foundation-model-stack / fms-acceleration

🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.

Address Incorrect Ignoring of Base Layer Modules for FSDP with Kernels #31

Closed fabianlim closed 3 weeks ago

fabianlim commented 3 weeks ago

This PR addresses an issue where, when using `lora_adapters_switch_ddp_from_fsdp` to ignore the LoRA modules, we previously also ignored the base layers.

The fix ignores only the LoRA modules, so the base layers remain sharded by FSDP.
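The intent is that only the LoRA adapter submodules (`lora_A` / `lora_B` on each peft `LoraLayer`) get handed to FSDP as ignored modules, while the quantized base layer stays sharded. Below is a minimal sketch of that selection logic; `collect_lora_modules_to_ignore` is a hypothetical helper for illustration (not the actual code in this PR), and the hand-off via accelerate's FSDP plugin `ignored_modules` is an assumption about the wiring.

```python
# Sketch only: hypothetical helper illustrating the selection the fix performs.
import torch.nn as nn

def collect_lora_modules_to_ignore(model: nn.Module):
    """Return only the LoRA adapter submodules (lora_A / lora_B), so FSDP
    keeps sharding the frozen base layer of each LoraLayer."""
    ignored = []
    for module in model.modules():
        # peft's LoraLayer exposes its adapters as ModuleDicts named
        # `lora_A` and `lora_B`; the `base_layer` is deliberately NOT
        # collected, which is the behavior this PR restores.
        for attr in ("lora_A", "lora_B"):
            adapters = getattr(module, attr, None)
            if isinstance(adapters, nn.ModuleDict):
                ignored.extend(adapters.values())
    return ignored

# These modules would then be passed to FSDP, e.g. (assumed wiring):
#   fsdp_plugin.ignored_modules = collect_lora_modules_to_ignore(model)
# so only the small trainable adapters are replicated DDP-style while the
# GPTQ/BNB base layers stay sharded across GPUs.
```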

Tests

General Benchmarks

Before Fix: No Sharding of Attention Base Layer

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput per Device (toks/sec) | Torch Memory Allocated (GiB) |
|---|---|---|---|---|---|
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 1 | 4 | 455 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 2 | 4 | 445 | 18.1 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 497 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 476 | 15.1 |
After Fix: Attention Base Layer Sharded

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) |
|---|---|---|---|---|---|
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 501 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 497 | 18.1 |

Before Fix

Nothing runs on 2 GPUs due to #3.

After Fix

QLoRA-FOAK is now compatible with FSDP, and applying FOAK yields roughly a 10% throughput increase (441 → 485 toks/sec).

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
|---|---|---|---|---|---|---|
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb | 2 | 4 | 441 | 19.2 | 0.922 |
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb-foak | 2 | 4 | 485 | 19.2 | 0.922 |

Llama3

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
|---|---|---|---|---|---|---|
| Meta-Llama3 | accelerated-peft-bnb | 2 | 2 | 398 | 20.9 | 0.922 |
| Meta-Llama3 | accelerated-peft-bnb-foak | 2 | 2 | 434 | 20.9 | 0.922 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq | 2 | 2 | 407 | 21.1 | 1.06 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq-foak | 2 | 2 | 448 | 21.1 | 1.06 |