Thanks for your interest in our work.

For the NaN problem, I guess it is caused by the `float16` training of the GRM. A potential solution is to adjust the hyper-parameters, e.g., set a smaller learning rate.

As for the training datasets, we will consider whether to release them.

Hope this helps.
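For anyone hitting the same NaNs, below is a minimal sketch of the usual `float16` stabilizers this suggestion points at: a smaller learning rate, dynamic loss scaling, and gradient clipping. It uses a toy model and synthetic data, not the repository's actual GRM training script.

```python
# Minimal sketch (toy model, synthetic data): common remedies for NaN losses
# when training in float16 -- smaller learning rate, dynamic loss scaling,
# and gradient clipping.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_fp16 = device == "cuda"

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)  # smaller than a typical 2e-5
scaler = torch.cuda.amp.GradScaler(enabled=use_fp16)  # skips steps whose grads are NaN/inf

for step in range(10):
    x = torch.randn(8, 16, device=device)
    y = torch.randn(8, 1, device=device)
    optimizer.zero_grad()
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=use_fp16):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale so clipping sees true gradient magnitudes
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)      # optimizer step is skipped if gradients are non-finite
    scaler.update()
```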
TY, I'll try it.
Hi Timothy,
Thanks for sharing your impressive work. Besides some minor bugs in your code, I'm facing a major obstacle while generating training data for RLMEC after rewriting the generated samples with a well-trained GRM.
I followed the steps in your README and eventually found that the rewards in the generated training data, e.g. in `rlmec_qa.jsonl`, are all `NaN`, which I think is abnormal.
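In case it helps with debugging, here is a minimal sketch for counting how many items carry NaN rewards; the `"rewards"` key and the file path are assumptions about the jsonl schema, not taken from the repository.

```python
# Minimal sketch: count items in the generated data whose rewards contain NaN.
# Assumption: each line is a JSON object with a "rewards" list of floats;
# adjust the key and path to match the actual schema.
import json
import math

path = "rlmec_qa.jsonl"  # hypothetical location of the generated training data
bad = total = 0
with open(path, encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)  # Python's json parses the literal NaN as float("nan")
        rewards = item.get("rewards", [])
        total += 1
        if any(isinstance(r, float) and math.isnan(r) for r in rewards):
            bad += 1
print(f"{bad}/{total} items contain NaN rewards")
```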
I used `vicuna-7b-1.5` and `gpt-4` as the base model and the teacher model to reproduce the process, and reduced the teacher-model data to 512 samples for debugging (also to save tokens xD). I didn't change the parameters in the shell scripts, except that instead of using `torchrun` for data parallelism with `bf16`, I trained the GRM with model parallelism in `float16` on 4 V100s (32 GB).

Is this related to the GRM being trained on the small teacher-model dataset? I would greatly appreciate it if you could share some datasets at your convenience.
Thanks again for open-sourcing your work. Looking forward to your prompt reply.