Open jt4n opened 3 weeks ago
Maybe you should check the training sequence produced after your `preprocess_value_dataset_instruct` step.
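For example, something along these lines can be used to spot-check one sample (just a rough sketch; the field names `input_ids`/`labels` and the `-100` masking convention are assumptions about the dataset format, not necessarily what the repo uses):

```python
from transformers import AutoTokenizer

# Sketch: decode one preprocessed example to see exactly what the model is
# trained on and which tokens actually receive loss.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def inspect_example(example):
    input_ids = example["input_ids"]
    labels = example["labels"]

    print("=== full training sequence ===")
    print(tokenizer.decode(input_ids))

    # Positions labelled -100 are masked out of the loss; what remains should
    # be only the assistant part you intend to supervise.
    supervised = [tok for tok, lab in zip(input_ids, labels) if lab != -100]
    print("=== tokens that receive loss ===")
    print(tokenizer.decode(supervised))
```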
We tried that but could not fix the problem, so we used the same data and training script to train Llama3-8B-Base instead. However, we found that the SFT model outputs every Q value as -1.0.
Hi, I’d like to ask for some advice on training the instruct model.
In your code, you used a vanilla template to train a base model (deepseek-math-7b-base), so there was no need to apply a role-playing chat template during data preprocessing.
If we want to train an instruct model, e.g. llama3-8b-instruct, we need to apply the llama3 template. That means encoding the `user_message` and `assistant_message` as one turn, with the "Observation" part placed in the `user_message`.
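Concretely, what I have in mind is roughly the following (only a sketch; `question`, `partial_solution`, `observation` and the exact message layout are my own assumptions, not your code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def build_one_turn_text(question, partial_solution, observation, assistant_step):
    # Everything the model should condition on (including the "Observation"
    # output) goes into the user message; only the step to be learned goes
    # into the assistant message.
    user_message = f"{question}\n{partial_solution}\nObservation: {observation}"
    messages = [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_step},
    ]
    # Render with the built-in Llama-3 chat template; tokenize later so the
    # assistant span can still be located for label masking.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )
```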
I tried to modify `preprocess_value_dataset` to support this, but the model's responses seem to repeat continuously.