dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Data Generation Pipeline #4

Closed yapdianang closed 2 months ago

yapdianang commented 3 months ago

Hi, will the data generation pipeline be released so that the results can be reproduced? Thanks!

X-Lai commented 3 months ago

We will release the scripts for data generation with GPT-4o soon. Please stay tuned.

X-Lai commented 2 months ago

Hi, we have released the scripts for constructing the Step-DPO data: https://github.com/dvlab-research/Step-DPO/tree/main?tab=readme-ov-file#data-construction-pipeline. Feel free to generate data on your own!