dvlab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
241 stars · 6 forks
Issues (newest first)
#20 Ablation between DPO and Step-DPO — tqzhong, opened 3 days ago, 0 comments
#19 Does step-dpo work? — hxdtest, opened 1 week ago, 0 comments
#18 question about StepDPOTrainer — FlyingDutchman26, closed 3 weeks ago, 1 comment
#17 eval_math:143 prompt_answer = remove_text(prompt_answer) — Xalp, opened 4 weeks ago, 0 comments
#16 I followed the steps in the README file to train the model, but I got an error. Here is the error message. — Claude121381011, opened 1 month ago, 0 comments
#15 question about Data Construction — hong-xl, opened 1 month ago, 1 comment
#14 Evaluation scripts for AIME and Odyssey-MATH — bmanczak, opened 1 month ago, 0 comments
#13 Question about the results after inference with deepseek-math-7b-rl-stepdpo — wjn1996, opened 1 month ago, 1 comment
#12 validation set — kaishxu, opened 2 months ago, 0 comments
#11 appendix missing — ChrisMii, opened 2 months ago, 1 comment
#10 During DPO training, will SFT loss be calculated? — mohhao, opened 2 months ago, 0 comments
#9 question about Data Construction Pipeline — yyht, opened 2 months ago, 0 comments
#8 questions about some parameter in config_full.yaml — kaishxu, closed 2 months ago, 1 comment
#7 Question about the DPO vs. Step-DPO. — flow3rdown, closed 2 months ago, 2 comments
#6 Great work, what about the computation resources needed for each experiment — yanghu819, closed 3 months ago, 4 comments
#5 Request for Citation — hbin0701, closed 3 months ago, 1 comment
#4 Data Generation Pipeline — yapdianang, closed 2 months ago, 2 comments
#3 About details of Step localization and Rectification — ToheartZhang, closed 3 months ago, 1 comment
#2 Reproduction issue — yyht, closed 3 months ago, 8 comments
#1 share sft-dataset — yyht, opened 3 months ago, 4 comments