Hello, my situation is as follows:
I implemented a QLoRA adapter to use with LLMs (currently bloom-560m). It works fine so far: after fine-tuning I get over 90% accuracy on my task. However, after saving and loading the adapter, the accuracy drops to slightly above chance.
I assume it is connected to the following behaviour, but I am clueless as to why it happens:
At creation of the adapter during fine-tuning, model.print_trainable_parameters() prints:
trainable params: 788,480 || all params: 560,005,120 || trainable%: 0.1407987126974839.
After saving and loading the adapter, the same call prints:
trainable params: 2,048 || all params: 560,005,120 || trainable%: 0.00036571094207138677.
If I save the loaded adapter again (without further fine-tuning) and reload it, it stays at this level.
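To make the symptom easier to pin down, here is a minimal round-trip check (a sketch; get_peft_model_state_dict is PEFT's helper for extracting adapter weights, and reloaded_model stands for the model after the loading code further down):

import torch
from peft import get_peft_model_state_dict

# Snapshot the adapter weights before saving.
before = {k: v.detach().cpu().clone() for k, v in get_peft_model_state_dict(model).items()}
# ... save the adapter and reload it into `reloaded_model` as shown below ...
after = get_peft_model_state_dict(reloaded_model)
for key in before:
    if key not in after:
        print('missing after reload:', key)
    elif not torch.allclose(before[key].float(), after[key].detach().cpu().float()):
        print('changed after reload:', key)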
Many thanks in advance,
Christian
Here is my code for creating the adapter...
import torch
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# n_labels and device are defined elsewhere in my script.
model_name_or_path = 'bigscience/bloom-560m'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Get model configuration.
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name_or_path, num_labels=n_labels)
# Get model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path)
# default to left padding
tokenizer.padding_side = "left"
# Use the EOS token as the PAD token.
tokenizer.pad_token = tokenizer.eos_token
# Get the actual model.
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=model_name_or_path,
    config=model_config,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    quantization_config=bnb_config,  # load_in_4bit is already set in bnb_config
)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
#PEFT
print("loading to LoRA")
peft_config = LoraConfig(task_type="SEQ_CLS", inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# resize model embedding to match new tokenizer and fix model padding token id
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = model.config.eos_token_id
model.to(device)
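For reference, the 788,480 trainable parameters at this point can be listed explicitly; a minimal sketch using the model object built above:

# List every parameter group that is still trainable after wrapping with PEFT.
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, tuple(param.shape))

As far as I understand, this should show the LoRA matrices plus the classification head that PEFT keeps trainable for task_type="SEQ_CLS".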
...and saving it.
global model
model.save_pretrained("adapters/lora_adapte_v2")
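To see what actually ends up on disk, the saved adapter checkpoint can be inspected (a sketch; this assumes a PEFT version that writes adapter_model.safetensors, while older versions write adapter_model.bin instead):

from safetensors import safe_open

# Print every tensor key stored in the adapter checkpoint.
with safe_open('adapters/lora_adapte_v2/adapter_model.safetensors', framework='pt') as f:
    for key in f.keys():
        print(key)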
And here is the code for loading it again:
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, prepare_model_for_kbit_training

# n_labels, device and load_lora_name are defined elsewhere in my script.
model_name_or_path = 'bigscience/bloom-560m'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_name_or_path, num_labels=n_labels)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name_or_path)
# default to left padding
tokenizer.padding_side = "left"
# Use the EOS token as the PAD token.
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=model_name_or_path,
    config=model_config,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    quantization_config=bnb_config,  # load_in_4bit is already set in bnb_config
)
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
adapter = load_lora_name  # the path the adapter was saved to, e.g. "adapters/lora_adapte_v2"
model = PeftModel.from_pretrained(model, adapter)  # is_trainable defaults to False
model.print_trainable_parameters()
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = model.config.eos_token_id
model.to(device)
print('Model loaded to `%s`' % device)
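One variant I have not ruled out (an assumption on my side, not a confirmed fix): PeftModel.from_pretrained takes an is_trainable flag that defaults to False, which freezes the adapter weights for inference. Judging from the sizes, the remaining 2,048 trainable parameters also look like just the classification head rather than the LoRA matrices. Loading the adapter for further training would look like this:

# Sketch: load the adapter with its weights unfrozen for continued training.
model = PeftModel.from_pretrained(model, adapter, is_trainable=True)
model.print_trainable_parameters()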