dvlab-research Step-DPO issues

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

241 stars, 6 forks

issues

Ablation between DPO and Step-DPO

#20 tqzhong opened 3 days ago
0
Does step-dpo work?

#19 hxdtest opened 1 week ago
0
question about StepDPOTrainer

#18 FlyingDutchman26 closed 3 weeks ago
1
eval_math:143 prompt_answer = remove_text(prompt_answer)

#17 Xalp opened 4 weeks ago
0
I followed the steps in the README file to train the model, but I got an error. Here is the error message.

#16 Claude121381011 opened 1 month ago
0
question about Data Construction

#15 hong-xl opened 1 month ago
1
Evaluation scripts for AIME and Odyssey-MATH

#14 bmanczak opened 1 month ago
0
Problem with the results after inference with deepseek-math-7b-rl-stepdpo

#13 wjn1996 opened 1 month ago
1
validation set

#12 kaishxu opened 2 months ago
0
appendix missing

#11 ChrisMii opened 2 months ago
1
During DPO training, will SFT loss be calculated?

#10 mohhao opened 2 months ago
0
question about Data Construction Pipeline

#9 yyht opened 2 months ago
0
questions about some parameters in config_full.yaml

#8 kaishxu closed 2 months ago
1
Question about the DPO vs. Step-DPO.

#7 flow3rdown closed 2 months ago
2
Great work, what about the computation resources needed for each experiment?

#6 yanghu819 closed 3 months ago
4
Request for Citation

#5 hbin0701 closed 3 months ago
1
Data Generation Pipeline

#4 yapdianang closed 2 months ago
2
About details of Step localization and Rectification

#3 ToheartZhang closed 3 months ago
1
Reproduction issue

#2 yyht closed 3 months ago
8
share sft-dataset

#1 yyht opened 3 months ago
4