heart-and-soul opened 11 months ago
Hi, based on our experiment results, prefix tuning does not perform as well as the other adapters. You can try the following fine-tuning command, which may improve performance.
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-13b-hf' --data_path 'math_10k.json' --output_dir './trained_models/llama-13b-prefix-math-vt10/' --batch_size 8 --micro_batch_size 4 --num_epochs 5 --learning_rate 3e-2 --cutoff_len 256 --val_set_size 120 --eval_step 10 --save_step 10 --adapter_name prefix-tuning --num_virtual_tokens 10 --load_8bit --use_gradient_checkpointing
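For reference, those prefix flags plausibly map to a PEFT prefix-tuning setup like the sketch below. This is an assumption about what finetune.py does internally (the repo bundles its own copy of peft), not the exact code:

import torch
from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Sketch: what --adapter_name prefix-tuning --num_virtual_tokens 10
# plausibly configures inside finetune.py (assumption, not the repo's code).
base_model = AutoModelForCausalLM.from_pretrained(
    "yahma/llama-13b-hf", torch_dtype=torch.float16
)
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # causal language modeling
    num_virtual_tokens=10,         # matches --num_virtual_tokens 10
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # only the prefix parameters are trainable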
Hi, I just used the above command for prefix tuning (only changing 'yahma/llama-13b-hf' to 'yahma/llama-7b-hf' and removing "--load_8bit"), but got the following error. How can I resolve it?
Traceback (most recent call last):
File "/home/xxx/repo/llm/LLM-Adapters/finetune.py", line 347, in <module>
fire.Fire(train)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/xxx/repo/llm/LLM-Adapters/finetune.py", line 314, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 1542, in train
return inner_training_loop(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 1872, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 2773, in training_step
loss = self.compute_loss(model, inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 2796, in compute_loss
outputs = model(**inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/operations.py", line 687, in forward
return model_forward(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/operations.py", line 675, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/xxx/repo/llm/LLM-Adapters/peft/src/peft/peft_model.py", line 568, in forward
return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1035, in forward
attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa
expanded_4d_mask = attn_mask_converter.to_4d(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (266) must match the size of tensor b (256) at non-singleton dimension 3
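For what it's worth, the numbers line up with the prefix length: 266 = cutoff_len (256) + num_virtual_tokens (10). PEFT-style prefix tuning prepends one attention-mask column per virtual token before calling the base model, roughly as in this simplified sketch (not the repo's exact code):

import torch

# Simplified sketch: prefix tuning extends the 2D attention mask by
# num_virtual_tokens, while input_ids keep their original length.
batch_size, seq_len, num_virtual_tokens = 4, 256, 10
attention_mask = torch.ones(batch_size, seq_len)                  # (4, 256)
prefix_mask = torch.ones(batch_size, num_virtual_tokens)          # (4, 10)
attention_mask = torch.cat((prefix_mask, attention_mask), dim=1)  # (4, 266)

The error then looks like the installed transformers building its causal mask for the 256 input positions without counting the 10 prefix positions carried in past_key_values, i.e., a likely version incompatibility between the repo's bundled peft and a newer transformers; pinning transformers to the version the repo was developed against may help.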
@muliyangm Take a look at the discussion here. This might fix it.
I used evaluate.py to evaluate the prefix fine-tuned LLaMA, but the model output is very strange, such as ". 5. 1 and 5 of the 3 and 5. 5, 10000000000000000000000000000000000000000. " How can I solve this problem?
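One way to narrow this down is to bypass evaluate.py and run a minimal generation check with the adapter loaded through PEFT. A sketch, assuming the checkpoint directory is the --output_dir from the command above (substitute your own path and prompt):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Minimal generation sanity check for a prefix-tuned checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "yahma/llama-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "./trained_models/llama-13b-prefix-math-vt10/")
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")

inputs = tokenizer("What is 15 + 27?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If this minimal path also produces degenerate text, the adapter itself may have diverged during training rather than evaluate.py being at fault.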