dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Questions about some parameters in config_full.yaml #8

Closed kaishxu closed 2 months ago

kaishxu commented 2 months ago

Hello!

I am confused about the parameters "model_name_or_path:" and "data_path:" in config_full.yaml. The paths they point to are missing from the repo. Do they play a role in training?

X-Lai commented 2 months ago

Sorry about that. I forgot to clean up these paths before releasing the code. It is fixed now.

You should set both parameters to your desired paths in the training command.
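
For later readers, here is a minimal sketch of the general pattern behind that advice: a value passed in the training command takes precedence over the one stored in the YAML file. This is not the repo's actual parsing code. The names `yaml_defaults` and `resolve_config`, and all the paths shown, are hypothetical placeholders chosen for illustration.

```python
import argparse

# Hypothetical stand-in for values loaded from config_full.yaml.
# The real file has more keys; these paths are placeholders, not real ones.
yaml_defaults = {
    "model_name_or_path": "/path/in/yaml/model",
    "data_path": "/path/in/yaml/data",
}


def resolve_config(defaults, argv):
    """Return a copy of `defaults` where any key passed on the command line wins."""
    parser = argparse.ArgumentParser()
    for key in defaults:
        # default=None lets us tell "flag not passed" apart from an explicit value.
        parser.add_argument(f"--{key}", default=None)
    args = parser.parse_args(argv)

    resolved = dict(defaults)
    for key, value in vars(args).items():
        if value is not None:
            resolved[key] = value
    return resolved


# Simulated command-line arguments (assumed values, for illustration only).
config = resolve_config(
    yaml_defaults,
    ["--model_name_or_path=my-org/my-sft-model", "--data_path", "my-org/my-pref-data"],
)
print(config)
```

With this pattern, whatever placeholder paths remain in the YAML file are ignored whenever the same keys are supplied in the command, so both parameters do matter for training and should point at your own model and dataset.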