eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0

Using Mistral 7B with transformers v4.38.1 on the MATH dataset, facing memory leaks #80

Open Jayant1234 opened 6 months ago

Jayant1234 commented 6 months ago

In both trainers, Basic and FSDP, there is an underlying pattern of GPU memory not being freed: allocation keeps increasing in steps while utilization remains roughly constant. (screenshot: GPU memory allocation climbing over training steps)

Does anyone have suggestions for what might be going wrong?
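A common cause of stepwise allocation growth (a hypothetical diagnosis, not confirmed for this repo) is keeping references to loss or metric tensors across steps: each retained tensor pins its whole autograd graph, so activations from old steps are never freed. A minimal sketch of the bug pattern and the fix, assuming a standard PyTorch training loop (all names here are illustrative, not from the DPO codebase):

```python
import torch

losses = []
for step in range(3):
    x = torch.randn(4, 4, requires_grad=True)
    loss = (x ** 2).mean()
    loss.backward()
    # Leak pattern: losses.append(loss) would keep the autograd graph
    # (and its activations) alive for every past step, so allocation
    # climbs each iteration even though utilization stays flat.
    # Fix: store a detached Python float instead of the live tensor.
    losses.append(loss.detach().item())

print(losses)
```

To confirm whether this is the issue, logging `torch.cuda.memory_allocated()` once per step should show whether allocation plateaus after the fix; `torch.cuda.memory_summary()` gives a more detailed breakdown.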