hengjiUSTC opened 6 months ago
However, I am able to run LoRA with fp16 in my other experiments (https://github.com/hengjiUSTC/learn-llm/blob/main/trl_finetune.py#L316), so I am not sure what the expected behavior is.
I found that the bug happens when I set:
lora_modules_to_save:
- embed_tokens
- lm_head
Why did I set it?
The reason I set it is the detection in https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/models.py#L153.
However, my special token only sets pad_token to <unk>,
which is already in the vocabulary, so I feel this detection shouldn't be triggered.
I am not sure why setting lora_modules_to_save
with fp16 leads to a crash.
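As a quick sanity check (my own snippet; the model name is just a placeholder, not necessarily my base model), adding a pad_token that already exists in the vocabulary should not grow the embedding matrix at all:

from transformers import AutoTokenizer

# Placeholder base model; the point is only the vocab check.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
num_added = tokenizer.add_special_tokens({"pad_token": "<unk>"})
# <unk> is already in the vocab, so nothing new is added and no embedding
# resize (and hence no trainable embed_tokens/lm_head) should be required.
print(num_added)  # expected: 0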
Another problem is at https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/models.py#L123: when flash_attention
is false and is_mistral_derived_model
is true, it does not set the padding side to left, which is incorrect for Mixtral training.
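To make the expectation concrete, here is an illustrative sketch of the behavior I am arguing for (my own code, not the actual axolotl logic; cfg and tokenizer stand in for what axolotl passes around):

def set_mistral_padding(tokenizer, cfg) -> None:
    # Sketch only: pad on the left for Mistral/Mixtral-derived models even
    # when flash_attention is disabled, instead of gating it on that flag.
    if cfg.is_mistral_derived_model:
        tokenizer.padding_side = "left"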
I'm wondering if we are even supposed to be recasting to fp16. The original qlora only recasts when bf16 is used: https://github.com/artidoro/qlora/blame/main/qlora.py#L396-L405
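For reference, this is roughly what I read those qlora lines as doing (my paraphrase under that reading, not a verified copy of the file):

import torch
from peft.tuners.lora import LoraLayer

def recast_modules(model, bf16: bool) -> None:
    # Paraphrase of the linked qlora.py logic as I understand it: LoRA layers
    # and embed/lm_head are only recast when bf16 is used; norms go to fp32.
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer) and bf16:
            module.to(torch.bfloat16)
        if "norm" in name:
            module.to(torch.float32)
        if ("lm_head" in name or "embed_tokens" in name) and hasattr(module, "weight"):
            if bf16 and module.weight.dtype == torch.float32:
                module.to(torch.bfloat16)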
@hengjiUSTC if you comment out these lines for your configuration above, does that fix the issue?
I am using LoRA instead of QLoRA, so these lines won't be triggered: https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/models.py#L554-L561
if (cfg.adapter == "lora" and load_in_8bit) or (
    cfg.adapter == "qlora" and cfg.load_in_4bit
):
load_in_8bit is false and load_in_4bit is also false
See the relevant discussion in https://github.com/huggingface/transformers/issues/23165 and https://github.com/huggingface/peft/issues/341.
Here are some experiments:
Breaks with raise ValueError("Attempting to unscale FP16 gradients."):
model = AutoModelForCausalLM.from_pretrained(
    ...
    torch_dtype=torch.float16,
)
training_args = TrainingArguments(
    fp16=True,
    ...
)
No error for the two configs below:
model = AutoModelForCausalLM.from_pretrained(
    ...
    torch_dtype=torch.float32,
)
training_args = TrainingArguments(
    fp16=True,
    ...
)
model = AutoModelForCausalLM.from_pretrained(
    ...
    torch_dtype=torch.float16,
)
training_args = TrainingArguments(
    fp16=False,
    ...
)
I am a bit new to these settings; does anyone know the reason? (I am using a T4 GPU, so I am not able to use bf16.) How should we handle this error in axolotl?
I got confirmation that we should not load the model in float16 when fp16 is enabled in the PEFT config: https://github.com/huggingface/peft/issues/341#issuecomment-1884911753. But I do see a lot of code (other finetune repos) doing this, and it is the reason the error is raised in Axolotl (when fp16 is true in config.yml, the model is loaded in float16 and fp16 is enabled in PEFT).
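To make the failure mode concrete, here is a minimal toy repro I put together (my own sketch, not axolotl code; it assumes a CUDA device): when the trainable parameters are fp16, their gradients are fp16 as well, and the GradScaler refuses to unscale them.

import torch

model = torch.nn.Linear(4, 4).half().cuda()   # fp16 trainable params
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

loss = model(torch.randn(2, 4, device="cuda", dtype=torch.float16)).sum()
scaler.scale(loss).backward()                  # grads are now fp16
scaler.unscale_(opt)  # raises ValueError: Attempting to unscale FP16 gradients.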
I also have these lines because I am using ChatML and adding new tokens to the base model:
lora_modules_to_save:
- embed_tokens
- lm_head
Based on what @hengjiUSTC linked, if I understand it correctly, fp16 adapter training must use fp32 for trainable parameters and fp16 for non-trainable ones. They provided a utility function cast_mixed_precision_params(peft_model, dtype)
for us to use, but since we also handle gate/norm layers, we may need to adapt it ourselves.
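A rough sketch of what an adjusted version could look like (assuming the helper is importable from peft.helpers as cast_mixed_precision_params; the gate/norm handling below is our own addition and the name matching is only illustrative):

import torch
from peft.helpers import cast_mixed_precision_params

def prepare_fp16_peft_model(peft_model):
    # Trainable adapter params go to fp32, frozen params to fp16, per the
    # linked peft comment.
    cast_mixed_precision_params(peft_model, dtype=torch.float16)
    # Extra step for our case: keep norm/gate modules in fp32; name-based
    # matching here is an assumption, not the final implementation.
    for name, module in peft_model.named_modules():
        if "norm" in name or "gate" in name:
            module.to(torch.float32)
    return peft_model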
Please check that this issue hasn't been reported before.
Expected Behavior
Should run correctly.
Current behaviour
The run crashes.
Steps to reproduce
I use the following config:
and run with
python3 -m axolotl.cli.train mix_tangshi/config.yml
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main commit 3678a6c41d051ca6376d013c11c948e55b4c8b4f
Acknowledgements