MARIO-Math-Reasoning / Super_MARIO

MIT License
254 stars, 16 forks

How to set B1 in Step level Beam Search #7

Closed: xiaolizh1 closed this 6 months ago

xiaolizh1 commented 6 months ago

Sorry to bother you. How do I set B1 in step-level beam search? Is it done by changing step_beam_width in sbs_sft.yaml? Changing step_beam_width doesn't seem to have much effect.

lovecambi commented 6 months ago

Yes. B1 is step_beam_width. We observed that increasing B1 from 1 to 3 has a significant effect, but increasing it further, e.g. to 4 or 5, yields only incremental improvement.
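
To illustrate why going from a beam width of 1 to a larger value can matter, here is a minimal sketch of step-level beam search on a toy problem. It is not the repository's implementation: `step_level_beam_search`, `expand`, `value`, `is_terminal`, and the toy scores are all hypothetical names and numbers chosen for illustration. Only the idea that `step_beam_width` (B1) is the number of partial solutions kept after each step comes from the discussion above.

```python
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # a partial solution: the steps taken so far


def step_level_beam_search(
    expand: Callable[[State], List[str]],   # hypothetical: proposes next steps
    value: Callable[[State], float],        # hypothetical: scores a partial solution
    is_terminal: Callable[[State], bool],   # hypothetical: True when a solution is complete
    step_beam_width: int,                   # B1: partial solutions kept per step
    max_steps: int = 10,
) -> State:
    if step_beam_width < 1:
        raise ValueError("step_beam_width must be at least 1")
    beams: List[State] = [()]
    for _ in range(max_steps):
        candidates: List[State] = []
        for state in beams:
            if is_terminal(state):
                candidates.append(state)  # carry finished paths forward unchanged
            else:
                candidates.extend(state + (step,) for step in expand(state))
        if not candidates:
            break
        # Keep only the step_beam_width highest-valued partial solutions.
        candidates.sort(key=value, reverse=True)
        beams = candidates[:step_beam_width]
        if all(is_terminal(s) for s in beams):
            break
    return max(beams, key=value)


# Toy scores (made up): the step that looks best first leads to a poor final result.
TOY_VALUES = {
    ("a",): 0.6, ("b",): 0.5,
    ("a", "a"): 0.1, ("a", "b"): 0.2,
    ("b", "a"): 0.9, ("b", "b"): 0.3,
}


def toy_expand(state: State) -> List[str]:
    return ["a", "b"]


def toy_value(state: State) -> float:
    return TOY_VALUES.get(state, 0.0)


def toy_is_terminal(state: State) -> bool:
    return len(state) == 2


greedy = step_level_beam_search(toy_expand, toy_value, toy_is_terminal, step_beam_width=1)
wide = step_level_beam_search(toy_expand, toy_value, toy_is_terminal, step_beam_width=2)
print(greedy, wide)
```

In this toy case, width 1 commits to the first step with the highest score and ends at a low-valued solution, while width 2 keeps the second-best first step alive and finds the better one. Once the beam is wide enough to retain the useful branch, widening it further changes nothing here.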

xiaolizh1 commented 6 months ago

Another question, where can I download the data used to train the value model in the paper?

lovecambi commented 6 months ago

Our work did not use annotated training data. The data was generated automatically; please refer to the training data generation section in the README. You can run the corresponding script to generate MCTS data for each round.

In particular, for each question we generate 4 to 10 trees and sample at most 4 correct solution paths and 4 incorrect solution paths.
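
The per-question cap described above can be sketched as follows. This is only an illustration and not the repository's script: `sample_paths`, the `(path, is_correct)` input format, and the use of uniform random sampling are assumptions, since the thread does not say how paths are chosen when more than 4 are available.

```python
import random
from typing import List, Sequence, Tuple


def sample_paths(
    paths: Sequence[Tuple[str, bool]],  # hypothetical format: (solution path, is_correct)
    max_per_class: int = 4,             # cap from the thread: at most 4 of each kind
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    correct = [path for path, ok in paths if ok]
    incorrect = [path for path, ok in paths if not ok]

    def pick(pool: List[str]) -> List[str]:
        # Assumption: uniform sampling without replacement, up to the cap.
        return rng.sample(pool, min(max_per_class, len(pool)))

    return pick(correct), pick(incorrect)
```

For example, a question whose trees yield 6 correct and 2 incorrect paths would contribute 4 correct and 2 incorrect paths under this sketch.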