nzomi opened this issue 4 months ago
Could you try with `transformers==4.33.2`?
> Could you try with `transformers==4.33.2`?
@YerongLi Yes, I use the same version for all dependencies mentioned in this document.
@YerongLi I have the same issue
@nzomi How did you change the source code to run fine-tuning? For me it fails before fine-tuning starts.
Also, I changed `finetune.py` a bit, as before I had `ValueError: The model you want to train is loaded in 8-bit precision. if you want to fine-tune an 8-bit model, please make sure that you have installed bitsandbytes>=0.37.0`.
`finetune.py`:

```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",  # or "nf4"
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model and tokenizer
print(f'Load model from: {model_args.model_name_or_path}')
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=training_args.cache_dir,
    device_map=device_map,
    trust_remote_code=True,
    quantization_config=bnb_config,
    # load_in_4bit=True,
)
```
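As a sanity check before changing any code, it may be worth confirming that the installed `bitsandbytes` actually meets the version floor the `ValueError` asks for. A minimal sketch; the helper functions below are hypothetical, not part of the repo:

```python
from importlib.metadata import version, PackageNotFoundError


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare dot-separated numeric versions, e.g. '0.41.1' >= '0.37.0'.

    Only the numeric components are compared, which is enough for a
    simple floor like bitsandbytes>=0.37.0.
    """
    as_tuple = lambda s: tuple(int(p) for p in s.split(".")[:3] if p.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)


def package_meets(pkg: str, minimum: str) -> bool:
    """True if `pkg` is installed and its version is at least `minimum`."""
    try:
        return version_at_least(version(pkg), minimum)
    except PackageNotFoundError:
        return False


# Example: package_meets("bitsandbytes", "0.37.0")
```

If this returns False, upgrading `bitsandbytes` may be enough; the `BitsAndBytesConfig` change above is only needed when the version is already fine but the script still passes `load_in_8bit`/`load_in_4bit` in a way the trainer rejects.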
@zhuraromdev I used the demo dataset to fine-tune, and I found an issue in `modeling_internlm_xcomposer2.py`: in the `img2emb` function `len(image) == 2`, indicating that `image` is already a list. So I modified the call from `self.vit([image]...)` to `self.vit(image...)`.
Additionally, in `build_mlp.py`, I observed that `img` is still a list. To resolve this, I added `img = img[0]` before calling `img.shape`. I believe this is a hacky solution, though, and may lead to other issues.
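The workaround described above can be sketched as a type check instead of hard-coding one path; the function and argument names here are illustrative, not the actual model code:

```python
def img2emb_patched(vit, image):
    """Mimic the workaround: avoid double-wrapping the image list.

    `vit` expects a list of image tensors. During fine-tuning, `image`
    already arrives as a list, so wrapping it again (`vit([image])`)
    produces a list of lists and later attribute errors on `img.shape`.
    """
    if isinstance(image, list):
        return vit(image)    # training path: already a list, pass through
    return vit([image])      # inference path: wrap the single tensor
```

Branching on the type once keeps both the training and the inference path working, which may avoid the "fine-tune works but inference breaks" trade-off of editing only one call site.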
If you load the original, not the 4 bit, does it work?
@YerongLi Nope, I didn't try it, as the goal is to fine-tune the 4-bit model. I have updated the code as @nzomi suggested, however I am still getting an issue:
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
warnings.warn(
0%| | 0/5 [00:00<?, ?it/s]Set seed 8 for rank 0
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 324, in <module>
train()
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 314, in train
trainer.train()
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1818, in forward
loss = self.module(*inputs, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/peft_model.py", line 1083, in forward
return self.base_model(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm_xcomposer2.py", line 450, in forward
outputs = self.model(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 956, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 952, in custom_forward
return module(*inputs, output_attentions, None, im_mask, infer_mode)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 663, in forward
hidden_states, self_attn_weights, present_key_value = self.attention(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 467, in forward
qkv_states = self.wqkv(hidden_states, im_mask, infer_mode)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/tuners/lora/bnb.py", line 311, in forward
result = self.base_layer(x, *args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given
0%| | 0/5 [00:09<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 70830) of binary: /home/ubuntu/miniconda3/envs/intern_clean/bin/python
I have tried to fix it as proposed in this issue: https://github.com/InternLM/InternLM-XComposer/issues/166, however it didn't work for me.
@YerongLi Do you have some suggestions, how can I resolve it?
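For context, the `TypeError` at the bottom of the trace is the generic Python error raised whenever a `forward(self, x)` that accepts a single input is called with extra positional arguments, as `self.wqkv(hidden_states, im_mask, infer_mode)` does here. A minimal, torch-free illustration (the class name is made up):

```python
class PlainLinear:
    """Stands in for a quantized base layer whose forward takes one input."""

    def __call__(self, *args, **kwargs):
        # nn.Module.__call__ similarly dispatches to forward()
        return self.forward(*args, **kwargs)

    def forward(self, x):
        return x


layer = PlainLinear()
layer("hidden_states")  # fine: one positional argument

try:
    # mirrors self.wqkv(hidden_states, im_mask, infer_mode)
    layer("hidden_states", "im_mask", "infer_mode")
except TypeError as e:
    print(e)  # ... takes 2 positional arguments but 4 were given
```

So the mismatch has to be resolved either at the call site in the remote modeling code or by using a layer class whose `forward` accepts the extra arguments, which is presumably why updating the model's `modeling_*.py` files can change the outcome.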
Hi @zhuraromdev, we have updated the `modeling_internlm_xcomposer2.py` of the 4-bit model. Can you re-try with the newest version?
@yuhangzang Does this change also support the original model? The new fine-tuning script supports fine-tuning from neither version 2.0 nor version 2.5.
@yuhangzang I have tried the updated code, however I am still getting the same issue:
....
orward.w2.weight', 'model.layers.12.attention.wo.weight', 'model.layers.21.feed_forward.w3.weight', 'model.layers.1.attention.wo.weight', 'model.layers.8.feed_forward.w2.weight', 'model.layers.12.feed_forward.w3.weight', 'model.layers.15.attention.wo.weight', 'model.layers.4.attention.wqkv.weight', 'model.layers.20.feed_forward.w3.weight', 'model.layers.22.attention.wqkv.weight', 'model.layers.12.feed_forward.w2.weight', 'model.layers.2.feed_forward.w1.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 151,003,136 || all params: 8,226,830,336 || trainable%: 1.835495930178862
Loading data...
Load 10 samples from ['data/single_turn_single_image_example.json', '0.01']
init mix data at rank 0
load 10 data
10samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
warnings.warn(
0%| | 0/5 [00:00<?, ?it/s]Set seed 8 for rank 0
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 324, in <module>
train()
File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 314, in train
trainer.train()
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1818, in forward
loss = self.module(*inputs, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/peft_model.py", line 1083, in forward
return self.base_model(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm_xcomposer2.py", line 487, in forward
outputs = self.model(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 956, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 952, in custom_forward
return module(*inputs, output_attentions, None, im_mask, infer_mode)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 663, in forward
hidden_states, self_attn_weights, present_key_value = self.attention(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b-4bit/modeling_internlm2.py", line 467, in forward
qkv_states = self.wqkv(hidden_states, im_mask, infer_mode)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/peft/tuners/lora/bnb.py", line 311, in forward
result = self.base_layer(x, *args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 4 were given
0%| | 0/5 [00:00<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3114) of binary: /home/ubuntu/miniconda3/envs/intern_clean/bin/python
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/intern_clean/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-08-05_13:08:41
host : ip-172-31-18-91.ec2.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 3114)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I was running `sh finetune_lora.sh` from the `~/InternLM-XComposer/finetune` folder. I also tried running fine-tuning both with and without the updates suggested by @nzomi, and in both cases I got the same issue.
> @yuhangzang Does this change also support the original model? The new finetuning script neither supports fine-tuning from version 2.0 nor from version 2.5.
@yuhangzang I am now able to fine-tune the 2d5 model. The issue arose because I downloaded the model from ModelScope, but the script you mentioned is only available on HuggingFace. Could you please update the script on ModelScope as well? Additionally, are there any plans to make the 2d5 `finetune.py` and `modeling_internlm_xcomposer2.py` scripts compatible with fine-tuning the 2.0 model?
I just hit the same problem when fine-tuning internlm-xcomposer2-4khd-7b using `finetune_lora.sh`. Is there any way to fix it?
Dear developers, I can perform inference using the script you provided, but I encounter an object-type mismatch during training. Specifically, I checked the data type: the image input is already a list, so the images are passed to the ViT as a list of lists. As a result, `image` is also a list, causing a type error (no attribute). If I change the source code, I can fine-tune the model but can no longer perform inference. I simply use the demo dataset `finetune/data/single_turn_single_image_example.json` for the fine-tuning test.