OFA-Sys / gsm8k-ScRel

Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
https://arxiv.org/abs/2308.01825

Questions about RFT Inference #6

Closed waterhorse1 closed 1 year ago

waterhorse1 commented 1 year ago

Thanks for this great work. I have two questions. First, the generation code for the 7b/13b models seems to be missing. Second, regarding the hyperparameter settings: the defaults in single_inference_30b.py do not seem suitable for generating diverse reasoning paths.

Thank you for your help!

GanjinZero commented 1 year ago

You want to check group_7b_13b.sh. As we discussed in the paper, if you use temp=0.7 for 33b, you will get roughly 2 different reasoning paths per 100 samples; with temp=1.0, you will get roughly 4 different paths per 100 samples.
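
For reference, a minimal sketch of how a "distinct paths per N samples" count could be computed, assuming distinct paths are identified by the ordered sequence of equations extracted from each sampled completion (the repo's actual dedup criterion may differ):

```python
import re

def equation_signature(completion: str) -> tuple:
    # Treat the ordered list of "a op b = c" style equations as the path identity;
    # two samples with the same equation sequence count as the same reasoning path.
    return tuple(re.findall(r"[\d\.]+ ?[\+\-\*/] ?[\d\.]+ ?= ?[\d\.]+", completion))

def count_distinct_paths(samples: list[str]) -> int:
    # Number of unique reasoning paths among the sampled completions for one question.
    return len({equation_signature(s) for s in samples})

# e.g. sample 100 completions per question at temp=0.7 and at temp=1.0,
# then compare count_distinct_paths(samples) across the two temperatures.
```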

GanjinZero commented 1 year ago

I will upload gen_train.sh later.

waterhorse1 commented 1 year ago

@GanjinZero What kind of decoding strategy are you using, direct sampling or beam search?

GanjinZero commented 1 year ago

sampling
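
A minimal sketch of sampling-based decoding with HuggingFace transformers, assuming a LLaMA-style checkpoint at a hypothetical local path; the exact flags in the repo's inference scripts may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-sft"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

question = "Natalia sold clips to 48 of her friends in April, ..."
inputs = tokenizer(question, return_tensors="pt").to(model.device)

# Sampling (not beam search): do_sample=True with temperature around 0.7-1.0
# yields diverse reasoning paths across repeated draws.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
    max_new_tokens=512,
    num_return_sequences=8,  # draw several candidate reasoning paths per question
)
completions = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```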

waterhorse1 commented 1 year ago

Thanks for your answer! I will close the issue.