foundation-model-stack / fms-acceleration

🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.

Address Incorrect Ignoring of Base Layer Modules for FSDP with Kernels #31

Closed fabianlim closed 3 weeks ago

fabianlim commented 3 weeks ago

This PR addresses an issue where, when using `lora_adapters_switch_ddp_from_fsdp` to ignore the LoRA modules, we previously also ignored the base layers.

The fix ignores only the LoRA modules, so the base layers remain sharded by FSDP.
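The intent is that only the LoRA adapter submodules (`lora_A` / `lora_B` on each peft `LoraLayer`) get handed to FSDP as ignored modules, while the quantized base layer stays sharded. Below is a minimal sketch of that selection logic; `collect_lora_modules_to_ignore` is a hypothetical helper for illustration (not the actual code in this PR), and the hand-off via accelerate's FSDP plugin `ignored_modules` is an assumption about the wiring.

```python
# Sketch only: hypothetical helper illustrating the selection the fix performs.
import torch.nn as nn

def collect_lora_modules_to_ignore(model: nn.Module):
    """Return only the LoRA adapter submodules (lora_A / lora_B), so FSDP
    keeps sharding the frozen base layer of each LoraLayer."""
    ignored = []
    for module in model.modules():
        # peft's LoraLayer exposes its adapters as ModuleDicts named
        # `lora_A` and `lora_B`; the `base_layer` is deliberately NOT
        # collected, which is the behavior this PR restores.
        for attr in ("lora_A", "lora_B"):
            adapters = getattr(module, attr, None)
            if isinstance(adapters, nn.ModuleDict):
                ignored.extend(adapters.values())
    return ignored

# These modules would then be passed to FSDP, e.g. (assumed wiring):
#   fsdp_plugin.ignored_modules = collect_lora_modules_to_ignore(model)
# so only the small trainable adapters are replicated DDP-style while the
# GPTQ/BNB base layers stay sharded across GPUs.
```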

Tests

General Benchmarks

Before Fix: No Sharding of Attention Base Layer

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput per Device (toks/sec) | Torch Memory Allocated (GiB) |
|---|---|---|---|---|---|
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 1 | 4 | 455 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq | 2 | 4 | 445 | 18.1 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 497 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 476 | 15.1 |
After Fix: Attention Base Layer Sharded

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) |
|---|---|---|---|---|---|
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 1 | 4 | 501 | 36.3 |
| TheBloke/Llama-2-70B-GPTQ | accelerated-peft-gptq-foak | 2 | 4 | 497 | 18.1 |

Before Fix

Nothing runs on 2 GPUs due to #3.

After Fix

QLoRA-FOAK is now compatible with FSDP, and applying FOAK yields roughly a 10% throughput increase (441 → 485 toks/sec).

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
|---|---|---|---|---|---|---|
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb | 2 | 4 | 441 | 19.2 | 0.922 |
| NousResearch/Llama-2-70b-hf | accelerated-peft-bnb-foak | 2 | 4 | 485 | 19.2 | 0.922 |

Llama3

| Model Name | Framework Config | No. GPUs | Per-Device Batch Size | Throughput (toks/sec) | Torch Memory Allocated (GiB) | Average Loss |
|---|---|---|---|---|---|---|
| Meta-Llama3 | accelerated-peft-bnb | 2 | 2 | 398 | 20.9 | 0.922 |
| Meta-Llama3 | accelerated-peft-bnb-foak | 2 | 2 | 434 | 20.9 | 0.922 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq | 2 | 2 | 407 | 21.1 | 1.06 |
| TechxGenus/Meta-Llama-3-70B-Instruct-GPTQ | accelerated-peft-autogptq-foak | 2 | 2 | 448 | 21.1 | 1.06 |