dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

share sft-dataset #1

Open yyht opened 1 month ago

yyht commented 1 month ago

Hello, nice work! Could you share the SFT dataset on Hugging Face?

X-Lai commented 1 month ago

Sure, it will be released soon. Please stay tuned.

yapdianang commented 2 weeks ago

Hi authors, following up on this thread to stay updated on when the SFT datasets are released. Thanks, and nice work!