Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

Loading fine tuned model #855

Closed Chasapas closed 2 months ago

Chasapas commented 5 months ago

Hello, I was able to fine-tune Falcon 7B with the help of lit-gpt. I also merged the weights and converted them to a .ckpt file, thanks. Can I now load the model using AutoModelForCausalLM.from_pretrained for inference?

PS: Do I have to convert it to another format?

Andrei-Aksionov commented 5 months ago

Hello @Chasapas

Yes, if you converted the weights into HuggingFace format, you can use them with the transformers library. In this comment you can find a code snippet.
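
Roughly, the idea there is to load the base model with transformers and then overwrite its weights with your converted checkpoint, something like this (just a sketch; paths are placeholders):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct")
model.load_state_dict(torch.load("/path/fine-tuned.ckpt"))  # overwrite with the converted fine-tuned weights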

Chasapas commented 5 months ago

Hello @Andrei-Aksionov, thank you for your time. I have already tried this, but that code downloads the checkpoint shards of the base model, stores them in GPU memory, and then loads my checkpoint from the local .ckpt I created, if I'm not wrong. Besides that, I'm running out of GPU memory (16 GB) with this process, and I don't understand why it is necessary since I merged the LoRA weights with the base model before the conversion to HF.

Andrei-Aksionov commented 5 months ago

Yes, that's a good point. The easiest way to get rid of the unnecessary VRAM usage is to use the state_dict argument of the .from_pretrained method. From the docs:

state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.

I quickly checked it on my laptop and it looks like it works. I haven't measured VRAM consumption though, so I rely on your feedback here.


If the above doesn't work, you can also try to use the meta device. Here you can find more info.
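
Very roughly, with accelerate it could look something like this (an untested sketch, I haven't tried it with Falcon specifically):

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("tiiuae/falcon-7b-instruct")
with init_empty_weights():
    # the model skeleton is created on the meta device, so no real memory is allocated here
    model = AutoModelForCausalLM.from_config(config)

# stream the merged fine-tuned weights directly onto the available devices
model = load_checkpoint_and_dispatch(model, "/path/fine-tuned.ckpt", device_map="auto")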

Chasapas commented 5 months ago

Hello @Andrei-Aksionov, I did it like this:

checkpoint_file = '/path/fine-tuned.ckpt'
base_model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map='auto',
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
    state_dict=checkpoint_file,
)

The base shards are loaded (Loading checkpoint shards: 100%), but I'm not sure if my fine-tuned checkpoint is also loaded, as I can only see 4 GB of GPU memory usage. I don't know if it is possible to "create" a new model from the base model and the fine-tuned checkpoint and load it.

Chasapas commented 5 months ago

I also noticed that the results after LoRA tuning are exactly the same as with the base model. Any idea?

Andrei-Aksionov commented 5 months ago

The docs say state_dict (Dict[str, torch.Tensor], optional), so it expects a dictionary and you provided a string. The low VRAM usage might be explained by the fact that the weights weren't loaded at all. I'm surprised that it didn't raise an error. So it should be something like this:

import torch
from transformers import AutoModelForCausalLM

checkpoint_file = '/path/fine-tuned.ckpt'
base_model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model_name, state_dict=torch.load(checkpoint_file))

I also noticed that the results after lora tuning are exactly the same with the base model.

You mean that the generated output is identical for the base model and the base model with merged LoRA weights?

Chasapas commented 5 months ago

checkpoint_file = '/path/fine-tuned.ckpt'
base_model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model_name, state_dict=torch.load(checkpoint_file))

I just tried this and it resulted in Process finished with exit code 137 (interrupted by signal 9: SIGKILL), as the base model requires 4 GB in 4-bit loading and the checkpoint is 14 GB, so 18 GB in total while my T4 has 16 GB. So I can't load both, or I don't know the way to do it.

You mean that the generated output is identical for the base model and the base model with merged LoRA weights?

I mean that the response below to my instruction is exactly the same as in the base model, and the instruction is in the dataset I used for the LoRA tuning. So I'm not even sure if I tuned anything.

Seed set to 1337
{'eval_interval': 100, 'save_interval': 100, 'eval_iters': 100, 'eval_max_new_tokens': 100, 'log_interval': 1, 'devices': 1, 'learning_rate': 0.0001, 'batch_size': 256, 'micro_batch_size': 3, 'gradient_accumulation_iters': 85, 'max_seq_length': 250, 'max_iters': 100, 'weight_decay': 0.01, 'lora_r': 4, 'lora_alpha': 8, 'lora_dropout': 0.05, 'lora_query': True, 'lora_key': False, 'lora_value': True, 'lora_projection': False, 'lora_mlp': False, 'lora_head': False, 'warmup_steps': 100}
Loading model path/lit-gpt/checkpoints/tiiuae/falcon-7b-instruct/lit_model.pth' with {'name': 'falcon-7b-instruct', 'hf_config': {'org': 'tiiuae', 'name': 'falcon-7b-instruct'}, 'block_size': 2048, 'vocab_size': 65024, 'padding_multiple': 512, 'padded_vocab_size': 65024, 'n_layer': 32, 'n_head': 71, 'n_embd': 4544, 'rotary_percentage': 1.0, 'parallel_residual': True, 'bias': False, 'lm_head_bias': False, 'n_query_groups': 1, 'shared_attention_norm': True, '_norm_class': 'LayerNorm', 'norm_eps': 1e-05, '_mlp_class': 'GptNeoxMLP', 'gelu_approximate': 'none', 'intermediate_size': 18176, 'rope_condense_ratio': 1, 'rope_base': 10000, 'n_expert': 0, 'n_expert_per_token': 0, 'r': 4, 'alpha': 8, 'dropout': 0.05, 'to_query': True, 'to_key': False, 'to_value': True, 'to_projection': False, 'to_mlp': False, 'to_head': False, 'head_size': 64, 'rope_n_elem': 64}
Number of trainable parameters: 1,753,088
Number of non trainable parameters: 7,217,189,760
Seed set to 1337
The longest sequence length in the train data is 183, the model's maximum sequence length is 183 and context length is 2048
Validating ...
What do the default values in the CBAM Regulation signify and how are they applied?
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
**What do the default values in the CBAM Regulation signify and how are they applied?**

### Response:
**The default values in the CBAM Regulation represent the minimum acceptable level of management controls and business processes required to ensure the organization's ability to achieve its objectives and comply with applicable laws and regulations. These default values are applied to all CBAM-controlled entities to guide and ensure implementation of the necessary management controls and business processes.**

iter 1 step 0: loss 3.3856, iter time: 1288.77ms
iter 2 step 0: loss 3.4340, iter time: 1038.95ms
iter 3 step 0: loss 3.5436, iter time: 802.51ms
iter 4 step 0: loss 3.1400, iter time: 762.56ms
iter 5 step 0: loss 3.4491, iter time: 1135.74ms
iter 6 step 0: loss 3.6019, iter time: 772.42ms
iter 7 step 0: loss 3.2783, iter time: 761.54ms
iter 8 step 0: loss 3.6873, iter time: 930.91ms
iter 9 step 0: loss 3.5944, iter time: 745.20ms
iter 10 step 0: loss 3.8152, iter time: 931.94ms
..
iter 100 step 1: loss 3.7119, iter time: 810.90ms
Training time: 96.60s
Memory used: 8.53 GB
Saving LoRA weights to 'path/lit_model_lora_finetuned.pth'

Again, thank you for your help @Andrei-Aksionov.

Andrei-Aksionov commented 5 months ago

Oh, now I see. Then yes, you need to load the model in 4-bit precision, so maybe use the same command as you did above:

checkpoint_file = '/path/fine-tuned.ckpt'
base_model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map='auto',
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
    state_dict=checkpoint_file,
)

Only provide state_dict as an actual dict, not a string. The theory is that bitsandbytes will automatically quantize the state dict during the loading procedure, but I'm not sure how exactly this is implemented in the transformers library.
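
So something along these lines (an untested sketch on my side; map_location just makes sure the full-precision checkpoint stays in system RAM instead of being restored onto the GPU):

import torch
from transformers import AutoModelForCausalLM

checkpoint_file = "/path/fine-tuned.ckpt"
base_model_name = "tiiuae/falcon-7b-instruct"

# keep the full-precision checkpoint in CPU RAM; transformers should quantize it while dispatching
state_dict = torch.load(checkpoint_file, map_location="cpu")

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
    state_dict=state_dict,
)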

Chasapas commented 5 months ago

ok, any idea on this?

You mean that the generated output is identical for the base model and the base model with merged LoRA weights?

I mean that the response below to my instruction is exactly the same as in the base model, and the instruction is in the dataset I used for the LoRA tuning. So I'm not even sure if I tuned anything.

Andrei-Aksionov commented 5 months ago

ok, any idea on this?

No :)

The only possibility for it (LoRA not learning) is if the lora_B matrix stays unchanged from its initialization with zeros, because in that case lora_A @ lora_B is a matrix of zeros, and thus pretrained + 0 = pretrained.

So I did a quick check with a small model, Pythia-410m, for 1k iters (other parameters kept unchanged). When I checked the state_dict

torch.load("out/lora/alpaca/lit_model_lora_finetuned.pth")

I saw that the weights for lora_B weren't zeroed out, so the problem isn't on the Lit-GPT side; it's something specific to your training. Maybe try to play with the learning rate https://github.com/Lightning-AI/lit-gpt/blob/cd1c521d8dddbb07f7fc8b6a45b2a4f74b6892ea/finetune/lora.py#L38

You could even set it to 1; it might weaken the model, but the difference should be much more noticeable.
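
You can run the same sanity check on your checkpoint; a quick sketch (the path is a placeholder, and I'm assuming the parameter names contain lora_B, as they do in Lit-GPT):

import torch

state_dict = torch.load("path/lit_model_lora_finetuned.pth")
for name, tensor in state_dict.items():
    if "lora_B" in name:
        # if every lora_B tensor is still all zeros, the adapter effectively changed nothing
        print(name, "all zeros:", bool((tensor == 0).all()))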

Chasapas commented 5 months ago

I checked the lora_B matrix before and after the fine-tuning and everything is fine. Thank you @Andrei-Aksionov. I have another question if you can help me: is there any way to save the base model with the fine-tuned checkpoints as a new model?

Andrei-Aksionov commented 5 months ago

As I understand it, you are wondering whether it's possible to save the full model after LoRA fine-tuning, not only the LoRA weights. (From the first message in this thread, I assume that you're aware of the existence of scripts/merge_lora.py.) If so, you can remove the filter in the save function: https://github.com/Lightning-AI/lit-gpt/blob/ad6554911951768b1db275a9b911e767faaa9aa7/finetune/lora.py#L312-L314
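
For reference, a rough sketch of what that would look like (the exact code in finetune/lora.py may differ slightly from this):

def save_lora_checkpoint(fabric, model, file_path):
    # with the filter, only the LoRA tensors end up in the file;
    # dropping it writes the full model state dict (base weights plus LoRA tensors)
    # original: fabric.save(file_path, {"model": model}, filter={"model": lora_filter})
    fabric.save(file_path, {"model": model})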

Chasapas commented 5 months ago

Great, thanks @Andrei-Aksionov!! Is there any tutorial on how to then load the LoRA-tuned model (.pth) with AutoModelForCausalLM.from_pretrained? I understand that the tokenizer must be loaded from the base model: tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME). If the question is outside the scope of this repository, please ignore it.

I also tried this:

checkpoint_file = '/path/fine-tuned.ckpt'
base_model_name = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map='auto',
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
    state_dict=checkpoint_file,
)

It doesn't raise any errors, but the inference response is the same as the base model's response and not the response I got during the LoRA tuning phase, which is very weird.

Andrei-Aksionov commented 5 months ago

Is there any tutorial on how then to load the lora-tuned model (.pth) with the AutoModelForCausalLM.from_pretrained?

No, but I think that I might add something like this in the future.

About the second question, I think I've answered it before: since .from_pretrained expects state_dict to be an actual dict and you provided a string, it could have simply skipped it without raising any errors and used the original weights instead. That might explain why the response is the same as with the base (falcon-7b-instruct) model. So maybe try something like this:

import torch
from transformers import AutoModelForCausalLM

checkpoint_file = "/path/fine-tuned.ckpt"
base_model_name = "tiiuae/falcon-7b-instruct"
state_dict = torch.load(checkpoint_file)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    load_in_4bit=True,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
    state_dict=state_dict,
)
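
And yes, as you said, the tokenizer comes from the base repo. For a quick smoke test, something like this should do (the prompt is just a placeholder taken from your log):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
prompt = (
    "### Instruction:\n"
    "What do the default values in the CBAM Regulation signify and how are they applied?\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))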

Anyway, when I am working on the tutorial, I'll check this assumption and provide a proper way to load the weights.

rasbt commented 2 months ago

Just revisiting old issues to prune the issue list, and yes, the state_dict approach looks correct, thanks @Andrei-Aksionov. We also have something similar in the LitGPT conversion docs and the new Zero to LitGPT guide.

Closing this issue as addressed, but please feel free to reopen if you have any follow-up questions or concerns.