SihengLi99 / SEALONG

Large Language Models Can Self-Improve in Long-context Reasoning

Advantages of MBR over Oracle #1

Open dle666 opened 3 hours ago

dle666 commented 3 hours ago

This is an inspiring and very interesting work! As I read through the paper, I had a few questions. In Figure 1, MBR seems to be less effective than Oracle, and I'm curious what its specific advantages are. In the training phase, would it also be feasible to use the majority-voted answer as a positive example and treat the remaining samples as negative examples? I would appreciate any clarification on this. Thanks!

SihengLi99 commented 2 hours ago

Thank you very much for your interest in our work! We sincerely appreciate your thoughtful questions and have provided the following explanations. We will also incorporate your feedback into the next version of our work for further improvement.

  1. In Figure 1, the gap between MBR and Oracle is understandable, as Oracle directly leverages ground-truth information, which MBR does not. However, it is worth noting that despite this gap, MBR significantly outperforms Greedy Search. This shows that the responses selected via MBR are better than what the model produces under greedy decoding, so training on them improves the model. Moreover, MBR scores enable Preference Optimization, allowing the model to also learn from negative samples. You can refer to the comparison between SEALONG-SFT and SEALONG in Table 5, which further highlights the effectiveness of Preference Optimization.
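To make the selection step concrete, here is a minimal sketch of MBR-style scoring and preference-pair construction. The similarity function here is a simple bag-of-words F1 stand-in (the paper's actual metric may differ), and `preference_pair` is a hypothetical helper name, not the repository's API:

```python
from collections import Counter

def token_overlap(a: str, b: str) -> float:
    """Bag-of-words F1 as a stand-in similarity metric."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ta & tb).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(ta.values())
    r = overlap / sum(tb.values())
    return 2 * p * r / (p + r)

def mbr_scores(outputs):
    """Score each sampled output by its average similarity to the others:
    outputs that agree with the consensus get high scores."""
    scores = []
    for i, o in enumerate(outputs):
        others = [x for j, x in enumerate(outputs) if j != i]
        scores.append(sum(token_overlap(o, x) for x in others) / len(others))
    return scores

def preference_pair(outputs):
    """Highest-scoring sample -> chosen, lowest-scoring -> rejected,
    yielding a pair usable for preference optimization."""
    scores = mbr_scores(outputs)
    best = max(range(len(outputs)), key=scores.__getitem__)
    worst = min(range(len(outputs)), key=scores.__getitem__)
    return outputs[best], outputs[worst]
```

Note that scoring operates on whole responses, so no answer needs to be parsed out of each output.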

  2. The idea of Majority Voting that you mentioned is entirely feasible. In fact, Meta’s recent work, released last week (https://arxiv.org/abs/2411.04109), employs a similar approach. Additionally, the work by UIUC and Google in 2022 (https://arxiv.org/pdf/2210.11610) also follows a comparable idea. These methods are discussed in the Related Work section. Fundamentally, these approaches aim to evaluate responses by leveraging consensus across multiple outputs, with differences mainly in their implementation details.
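For comparison, a minimal sketch of the Majority Voting alternative. The `Answer:` format and the helper names are illustrative assumptions, not the cited papers' exact implementations; the key point is that each output must first be parsed into a discrete answer:

```python
from collections import Counter

def extract_answer(output: str):
    """Hypothetical parser: expects an 'Answer: <value>' line.
    Returns None when the model did not follow the format."""
    for line in reversed(output.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None

def majority_vote(outputs):
    """Pick the most frequent extracted answer; outputs whose
    answer cannot be parsed are dropped entirely."""
    answers = [a for a in (extract_answer(o) for o in outputs) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

The dropped-output case in `majority_vote` is exactly the failure mode discussed below: if the model does not adhere to the output format, that sample contributes nothing to the vote.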

  3. We did not adopt Majority Voting in our work for two reasons. First, we formalized the problem as Minimum Bayes Risk, which may provide stronger theoretical support. Second, in practice, especially in long-context settings, models may struggle to adhere strictly to the required output format, making it difficult to reliably extract the corresponding answer from each output. Nonetheless, this is a minor implementation detail and does not undermine the potential effectiveness of Majority Voting.

Once again, we appreciate your feedback and the opportunity to clarify these points. Please feel free to reach out if you have any further questions or suggestions!