This PR addresses the fact that when we use `lora_adapters_switch_ddp_from_fsdp` to ignore the LoRA modules, we previously also ignored the base layers. This fix addresses it properly by ignoring only the LoRA modules.

For `auto_gptq` we found that this works well. For `bnb`, this will cause the `quant_state` on the parameter to be destroyed; to address this, we now get the `quant_state` from the `base_layer`, thus also addressing https://github.com/foundation-model-stack/fms-acceleration/issues/3.
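A minimal sketch of the idea (not the implementation in this PR; the helper names are hypothetical, and the peft-style `lora_A` / `lora_B` / `base_layer` layout is an assumption): only the adapter submodules are collected for the distributed wrapper to ignore, and for `bnb` the `quant_state` is read off the `base_layer`'s weight rather than the adapter-side parameter.

```python
# Illustrative sketch only; assumes peft-style LoRA layers where each adapted
# module exposes `lora_A`, `lora_B` (ModuleDicts) and a quantized `base_layer`.
import torch.nn as nn


def collect_lora_only_ignored_modules(model: nn.Module):
    """Collect only the LoRA adapter submodules to be ignored,
    leaving the base layers to be handled normally."""
    ignored = []
    for module in model.modules():
        # Only the adapter weights (lora_A / lora_B) are ignored; the
        # base_layer is deliberately left out so it is not dropped too.
        if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
            ignored.extend(module.lora_A.modules())
            ignored.extend(module.lora_B.modules())
    return ignored


def get_bnb_quant_state(module):
    """Read the quant_state from the base_layer's weight, since the state
    attached to the adapter-side parameter may have been destroyed."""
    base_layer = getattr(module, "base_layer", module)
    return getattr(base_layer.weight, "quant_state", None)
```

With a list built this way, the quantized base layers remain under the wrapper's normal handling while only the adapter weights are excluded.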
## Tests

### General Benchmarks

(Benchmark table: Config, per-device throughput (toks/sec), Memory Allocated (GiB).)
**Before Fix**

Nothing runs for 2 GPUs due to #3.

**After Fix**

(Benchmark table: Config, throughput (toks/sec), Memory Allocated (GiB), Loss.)
**Llama3**

(Benchmark table: Config, throughput (toks/sec), Memory Allocated (GiB), Loss.)