eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0

Training process got stuck when loss=dpo sample_during_eval=true trainer=FSDPTrainer #85

Closed: kygguo closed this issue 2 months ago

kygguo commented 2 months ago

Thanks for sharing this repo.

When I run DPO with FSDPTrainer and sample_during_eval=true, the training process gets stuck with the following output:

Processing HH: 100%|█████████████████| 160800/160800 [00:04<00:00, 39073.07it/s]
Running evaluation after 0 train examples
Computing eval metrics: 100%|█████████████████████| 8/8 [00:04<00:00, 1.71it/s]
Warning: n_eval_model_samples (16) < eval_batch_size (32). Sampling from the first complete eval batch of prompts.
Generating samples...:   0%|          | 0/1 [00:00<?, ?it/s]
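For reference, a command along these lines reproduces the hang (substitute your own model config for the model= override; datasets=[hh] matches the "Processing HH" line above, and the remaining overrides are the ones from the issue title):

```
# model=pythia69 is a placeholder; the overrides that trigger the hang are
# loss=dpo, sample_during_eval=true, trainer=FSDPTrainer.
python -u train.py model=pythia69 datasets=[hh] loss=dpo \
    sample_during_eval=true trainer=FSDPTrainer
```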

Concretely, it gets stuck inside model.generate(). Several issues report problems with FSDP + HuggingFace generate (also mentioned in a comment in your code: https://github.com/pytorch/pytorch/issues/100069). I want to check whether you came across this situation and, if so, how you handled it.
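For context, here is a minimal sketch of how I understand sampling under FSDP is supposed to work, using the FSDP.summon_full_params workaround discussed around the linked PyTorch issue. Even with this, generate() can apparently deadlock on some torch versions, which would be consistent with the hang above. The function name, the policy/tokenizer arguments, and the generation kwargs are my own illustration, not the repo's exact code:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def sample_under_fsdp(policy, tokenizer, prompts, max_new_tokens=256):
    """Sample from an FSDP-wrapped HuggingFace causal LM (illustrative sketch).

    Assumes tokenizer.pad_token is set and that every rank calls this with
    the same prompts, so the collective ops inside FSDP stay in sync.
    """
    batch = tokenizer(prompts, return_tensors="pt", padding=True)
    batch = {k: v.to(torch.cuda.current_device()) for k, v in batch.items()}
    # Gather the full (unsharded) parameters on every rank before generating;
    # writeback=False since the params are only read, recurse=False to gather
    # the root module's flattened params in one shot.
    with FSDP.summon_full_params(policy, writeback=False, recurse=False):
        with torch.no_grad():
            out = policy.generate(
                batch["input_ids"],
                attention_mask=batch["attention_mask"],
                max_new_tokens=max_new_tokens,
                do_sample=True,
                pad_token_id=tokenizer.pad_token_id,
            )
    return tokenizer.batch_decode(out, skip_special_tokens=True)
```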

Looking forward to your response, thanks in advance!

kygguo commented 2 months ago

Just found the warning about this in the README... It seems there is no tractable way to handle it, so I'm closing this issue.
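For anyone who lands here later: given that warning, the practical options seem to be either keeping FSDP and disabling sampling during eval, or keeping sampling and falling back to the single-process trainer (assuming the trainer names in trainers.py; BasicTrainer is my understanding of the non-FSDP one):

```
# Option 1: keep FSDPTrainer, skip sampling during evaluation
python -u train.py loss=dpo trainer=FSDPTrainer sample_during_eval=false

# Option 2: keep sampling during eval, use the single-process trainer
python -u train.py loss=dpo trainer=BasicTrainer sample_during_eval=true
```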