PKUfreshman opened this issue 1 month ago
zhangdan0602 replied:

> +1 Would appreciate it if you could add some instructions to train the models.
We have updated README.md to illustrate the details. Specifically, `VALUE_BASE_MODEL_DIR` is the local path to the value model. Because the policy models require different versions of `transformers`, Mistral-7B is adopted as the backbone of the value model when the policy model is Llama3-8B-Instruct or MetaMath-Mistral-7B; when the policy model is SciGLM, we use ChatGLM3-6B as the backbone of the value model.
In addition, you can download [$D_{V_0}$] and put it in `PRM/data` to train Mistral-7B as the initial process reward model and obtain `VALUE_MODEL_STATE_DICT`. We also provide `PRM/train_VM_chatglm.py` and `PRM/train_VM_mistral.py`.
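To make the two settings concrete, here is a minimal sketch of how they might be resolved before loading the value model. The helper name and the environment-variable fallback are my own illustration, not code from this repo; `VALUE_BASE_MODEL_DIR` points at the local backbone checkout (e.g. Mistral-7B) and `VALUE_MODEL_STATE_DICT` at the trained value-model weights.

```python
import os

def resolve_value_model_paths(base_dir=None, state_dict=None):
    """Resolve the two paths the README refers to.

    Hypothetical helper: falls back to environment variables of the
    same names when no explicit arguments are given.
    """
    base_dir = base_dir or os.environ.get("VALUE_BASE_MODEL_DIR")
    state_dict = state_dict or os.environ.get("VALUE_MODEL_STATE_DICT")
    if not base_dir or not state_dict:
        raise ValueError(
            "Set VALUE_BASE_MODEL_DIR (local backbone, e.g. a Mistral-7B "
            "checkout) and VALUE_MODEL_STATE_DICT (trained value-model weights)."
        )
    return base_dir, state_dict
```

The backbone directory would then typically be passed to a `from_pretrained`-style loader, with the state dict applied on top of it.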
Thanks!
Sorry to interrupt! I really appreciate your work, but I can't do either inference or self-training based on the README. For inference, I followed the README but failed to run `evaluate.py`. What are `VALUE_BASE_MODEL_DIR` and `VALUE_MODEL_STATE_DICT`? What's more, the model you released on HF, `zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st`, seems to have some problems. I've tried many times, but it reports an error when loading the checkpoint shard:

```
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
```

Someone else has also raised this question on HF.
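For what it's worth, `MetadataIncompleteBuffer` usually indicates a truncated download rather than a broken upload. A quick local probe, based only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header), can confirm whether a shard is incomplete; the function name is my own, not part of this repo or the `safetensors` library:

```python
import json
import struct

def probe_safetensors(path):
    """Check whether a .safetensors file's header region is intact.

    Reads the 8-byte little-endian header-length prefix, then verifies
    that the JSON header is fully present and parseable. A truncated
    download typically fails one of these checks. Tensor data itself
    is not validated.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, "file shorter than the 8-byte length prefix"
        header_len = struct.unpack("<Q", prefix)[0]
        header = f.read(header_len)
        if len(header) < header_len:
            return False, "header truncated (incomplete download?)"
        try:
            json.loads(header.decode("utf-8"))
        except ValueError:
            return False, "header is not valid JSON"
        return True, "header looks intact"
```

If the probe reports truncation, re-downloading the shard (e.g. with `huggingface_hub`'s `snapshot_download` and `force_download=True`) is usually enough to fix the load error.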
For training, the README introduces nothing about it.
I hope you can update the README and maybe double-check the HF model. Thank you very much!