axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0
7.48k stars, 808 forks

fixes to prevent vram spike when train starts #1742

Closed winglian closed 1 month ago

winglian commented 1 month ago

Fixes #1717

winglian commented 1 month ago
[Screenshot: 2024-07-13 at 9 45 31 AM]

Can confirm this trains on 2x4090s. It needs somewhere between 125GB and 250GB of system (CPU) RAM; it crashed with exit code -9 at 125GB RAM.
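
An exit code of -9 most likely means the process was killed by signal 9 (SIGKILL), which on Linux is typically what the kernel OOM killer sends when system RAM runs out. A minimal sketch of decoding such a code, assuming the launcher reports death-by-signal as a negative return code the way Python's `subprocess` does; `describe_exit_code` is a hypothetical helper, not part of axolotl:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a subprocess-style return code into a readable description."""
    if code < 0:
        # Negative codes mean "terminated by signal number -code".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {-code}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(-9))
```

If the description comes back as SIGKILL on a machine that was near its RAM limit, the system log (for example `dmesg`) is the usual place to confirm whether the OOM killer was responsible.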

Nero10578 commented 1 month ago

> Can confirm this trains on 2x4090s. It needs somewhere between 125GB and 250GB of system (CPU) RAM; it crashed with exit code -9 at 125GB RAM.

Was this using the example Llama 3 70B QLoRA FSDP config file?