Add `no_sync` support, fix gradient accumulation, and logging and argument improvements

AnswerDotAI / fsdp_qlora

Training LLMs with QLoRA + FSDP

Apache License 2.0


Add `no_sync` support, fix gradient accumulation, and logging and argument improvements #6

Closed warner-benjamin closed 9 months ago

warner-benjamin commented 9 months ago

This PR adds FSDP `no_sync` support, which defers gradient synchronization across ranks until the last micro-batch of each gradient accumulation window. It also fixes gradient accumulation by truncating the dataset length and correcting the modulus comparison. Finally, it improves logging compatibility with tqdm and updates the README and arguments.
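For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of how `no_sync` and gradient accumulation usually fit together. It is not the code from this PR: `FakeModel`, `train`, `num_batches`, and `accum_steps` are hypothetical names, the truncation to a multiple of `accum_steps` and the `(step + 1) % accum_steps == 0` check are assumptions about what the fixes might look like, and the fake `no_sync()` only mimics the shape of the context manager that PyTorch's `FullyShardedDataParallel` exposes.

```python
from contextlib import contextmanager, nullcontext


class FakeModel:
    """Hypothetical stand-in for an FSDP-wrapped model; records when a sync would occur."""

    def __init__(self):
        self.sync_enabled = True
        self.synced_on = []

    @contextmanager
    def no_sync(self):
        # Same shape as FSDP's no_sync(): gradient sync is disabled inside the block.
        previous = self.sync_enabled
        self.sync_enabled = False
        try:
            yield
        finally:
            self.sync_enabled = previous

    def backward(self, step):
        # A real backward pass would reduce gradients across ranks here when sync is on.
        if self.sync_enabled:
            self.synced_on.append(step)


def train(num_batches, accum_steps):
    model = FakeModel()
    # Assumed fix 1: truncate so the last accumulation window is never partial.
    usable = (num_batches // accum_steps) * accum_steps
    optimizer_steps = []
    for step in range(usable):
        # Assumed fix 2: compare (step + 1), so the boundary falls on the last
        # micro-batch of each window rather than on step 0.
        is_boundary = (step + 1) % accum_steps == 0
        context = nullcontext() if is_boundary else model.no_sync()
        with context:
            model.backward(step)
        if is_boundary:
            optimizer_steps.append(step)  # optimizer.step() / zero_grad() would go here
    return usable, model.synced_on, optimizer_steps
```

With 10 batches and 4 accumulation steps, this sketch uses only the first 8 batches and synchronizes on steps 3 and 7, which are also the steps where the optimizer would update.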

KeremTurgutlu commented 9 months ago

LGTM