Open RishabhMaheshwary opened 1 month ago
@RishabhMaheshwary how many gpus and what gpu type? Thanks
@winglian 8 GPUs, A100 80GB.
I am able to run without any errors when I use examples/mistral/config.yml. But when I just replace the dataset and set the training method to DPO as shown below, it gives the above error.
rl: dpo
datasets:
  - path: Intel/orca_dpo_pairs
    split: train
    type: chatml.intel
It might be related to trl?
@RishabhMaheshwary I've narrowed this down to an issue with DPO full finetuning. DPO LoRA doesn't exhibit the same error.
@RishabhMaheshwary a workaround for now is to append --dataset_processes=1 to your launch command, e.g.:
accelerate launch --use_deepspeed -m axolotl.cli.train ../examples/mistral/config.yml --dataset_processes=1
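The same workaround can presumably be pinned in the config file rather than passed on the command line. This is a minimal sketch, assuming the `dataset_processes` key in the YAML config is read the same way as the CLI override; the thread itself only confirms the command-line form.

```yaml
# Assumption: `dataset_processes` set here behaves like --dataset_processes=1
# on the launch command. The other keys are the DPO settings from this thread.
rl: dpo
datasets:
  - path: Intel/orca_dpo_pairs
    split: train
    type: chatml.intel

# Workaround: preprocess the dataset in a single process.
dataset_processes: 1
```

Single-process preprocessing will be slower on large datasets, so this is best treated as a temporary setting until the fix lands.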
Thanks a lot! Will give it a try and let you know.
A fix for this should also land upstream in trl soon.
Expected Behavior
It should run without any errors.
Current behaviour
Throws the error:
Steps to reproduce
Running the latest pull (commit 219cd0d) with the following command and the config below results in the error below:
accelerate launch --use_deepspeed -m axolotl.cli.train ../examples/mistral/config.yml
Error:
Config yaml