foundation-model-stack / fms-acceleration

🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models.
Apache License 2.0

Allow BNB Plugin to be Loaded Without PEFT Wrapping #10

Closed · fabianlim closed this 1 month ago

fabianlim commented 1 month ago

This issue concerns the warnings that, for QLoRA PEFT, `peft_config` should be passed directly to `SFTTrainer`.
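
For reference, a minimal sketch of the recommended pattern, assuming `trl`'s `SFTTrainer`; the model name is an arbitrary example and `train_dataset` is assumed to be defined elsewhere:

```python
# Sketch of the pattern the warnings recommend: hand peft_config to
# SFTTrainer and let it wrap the quantized model itself, rather than
# calling get_peft_model() beforehand.
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # example model, not prescribed by this issue
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
    ),
)

peft_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,  # assumed to be defined elsewhere
    peft_config=peft_config,      # SFTTrainer performs the PEFT wrapping itself
)
```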

`configs/bnb.yaml`

Add a new flag `no_peft_model`:


```yaml
# PEFT-related acceleration
peft:

  # quantization-related acceleration
  # e.g., kernels for quantized base weights
  quantization:

    # For loading BitsAndBytes quantized layers
    # to serve as 4bit base-weights for LoRA PEFT-tuning.
    # NOTE: currently AutoGPTQ is not properly integrated into huggingface /
    # bitsandbytes, thus recommended quant_type to be either "nf4"
    # or "fp4".
    bitsandbytes:
      quant_type: nf4

      # If True, then get_peft_model and prepare_model_for_kbit_training
      # will not be called.
      no_peft_model: False
```
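
To illustrate what the flag controls, here is a hypothetical sketch of how a plugin could branch on it; the function name `augment_model` and its signature are assumptions for this example, not the plugin's actual API:

```python
# Illustrative sketch only: `augment_model` and its signature are assumed
# for this example and are not the plugin's actual API.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

def augment_model(model, peft_config: LoraConfig, no_peft_model: bool):
    if no_peft_model:
        # Leave the model unwrapped and return peft_config so the caller
        # (e.g., SFTTrainer) can perform the PEFT wrapping itself.
        return model, peft_config
    # Legacy path: prepare and wrap the model here, and pass nothing
    # downstream for the trainer to wrap.
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)
    return model, None
```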
fabianlim commented 1 month ago

@achew010 note that we put a comment in the main README:

> Huggingface BNB QLoRA numbers were taken with legacy approaches, but we are aware of https://github.com/foundation-model-stack/fms-acceleration/issues/10 and will update our benchmarks. The above includes numbers using fusedOps-and-kernels; the actual implementation is coming soon, see below.

Now that this issue is closed, that comment should be removed.