Open ehartford opened 1 year ago

Hello, when I train with multi-GPU like this:

then I get uneven VRAM utilization:

This means I have to use a smaller batch size than I otherwise could, which makes my training run take about 30% longer than it should.

I don't have this problem when doing a multi-GPU run in full weights (non-QLoRA) using accelerate or DeepSpeed.

What model / other parameters are you using with torchrun? I personally try to stay away from torchrun and use accelerate instead.

I'm having good success using this fork: https://github.com/ChrisHayduk/qlora-multi-gpu/
Models have about even VRAM usage: 40.97GB vs 40.95GB.
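For what it's worth, one way to nudge transformers toward even sharding is to pass an explicit `max_memory` cap per GPU alongside `device_map="auto"` when loading the quantized model. This is a minimal sketch, not the fork's method: the `balanced_max_memory` helper and its numbers are illustrative assumptions, and the reserved headroom values would need tuning for your hardware.

```python
# Hypothetical helper: build a per-GPU max_memory map so that
# transformers' device_map="auto" spreads layers evenly instead of
# filling GPU 0 first. The headroom values are illustrative guesses.
def balanced_max_memory(num_gpus, per_gpu_gib, headroom_gib=2):
    """Cap each GPU below its physical capacity so activations fit;
    GPU 0 gets extra headroom since inputs/gradients tend to land there."""
    mem = {i: f"{per_gpu_gib - headroom_gib}GiB" for i in range(num_gpus)}
    mem[0] = f"{per_gpu_gib - 2 * headroom_gib}GiB"
    return mem

# Usage sketch with transformers' real from_pretrained kwargs
# (device_map, max_memory, quantization_config); "model-name" is a placeholder:
#
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "model-name",
#     quantization_config=BitsAndBytesConfig(load_in_4bit=True),
#     device_map="auto",
#     max_memory=balanced_max_memory(num_gpus=2, per_gpu_gib=48),
# )
```

Capping GPU 0 slightly lower than the rest is the usual workaround when the auto device map leaves one card nearly full and forces a smaller batch size.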