GaiZhenbiao / Phi3V-Finetuning

Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.
Apache License 2.0
43 stars, 11 forks

Avoiding runtime error #5

Closed 2U1 closed 2 months ago

2U1 commented 2 months ago

This patch avoids the following runtime error when running the code.

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

ParadoxZW commented 2 months ago

In which case does this error occur? Do you train img_projection or img_processor?

ayylemao commented 2 months ago

This error happened for me when training on one GPU (export WORLD_SIZE=1). Implementing this patch fixed the issue for me!

ParadoxZW commented 2 months ago

This error happened for me when training on one GPU (export WORLD_SIZE=1). Implementing this patch fixed the issue for me!

Hi, did you set img_projection or img_processor to be trained?

Could you uncomment https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L288-L290 in train_phi3v.py to check the status of the parameters? In my multi-GPU experiments, requires_grad is False for all parameters of img_projection and img_processor, so I cannot see why this error would occur.
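For anyone who wants to run this kind of check outside the training script, here is a minimal sketch. It is not the code at the linked lines; the helper name summarize_requires_grad and the toy submodule names are assumptions made for illustration only.

```python
import torch.nn as nn


def summarize_requires_grad(model: nn.Module) -> dict:
    """Map each top-level submodule name to (trainable, frozen) parameter counts."""
    summary = {}
    for name, param in model.named_parameters():
        top = name.split(".")[0]
        trainable, frozen = summary.get(top, (0, 0))
        if param.requires_grad:
            trainable += param.numel()
        else:
            frozen += param.numel()
        summary[top] = (trainable, frozen)
    return summary


# Toy stand-in for the real model; these submodule names are hypothetical.
toy = nn.ModuleDict({
    "img_projection": nn.Linear(4, 4),
    "lora_adapter": nn.Linear(4, 2),
})
toy["img_projection"].requires_grad_(False)  # freeze one submodule

status = summarize_requires_grad(toy)
for name, (trainable, frozen) in status.items():
    print(f"{name}: trainable={trainable}, frozen={frozen}")
```

On the real model, passing the loaded model to the same helper would show at a glance whether img_projection or img_processor has any trainable parameters.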

shlyahin commented 2 months ago

Hi, I had the same error when trying to fine-tune with my own Jupyter notebook, and I fixed it in the same way.

2U1 commented 2 months ago

In which case does this error occur? Do you train img_projection or img_processor?

I got this error using just train.sh. I don't know exactly why it occurs either.

ParadoxZW commented 2 months ago

Hi, @2U1 @ayylemao @shlyahin Can you give me the results of the following code (after uncommenting)? https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L288-L290

And can you try to add model.requires_grad_(False) right after the model is loaded by from_pretrained? Does the error still occur in this case?

2U1 commented 2 months ago

@ParadoxZW I've just uncommented the code I fixed and ran your code. Here are the txt files:

debug_0.txt debug_1.txt debug_2.txt debug_3.txt

2U1 commented 2 months ago

Hi, @2U1 @ayylemao @shlyahin Can you give me the results of the following code (after uncommenting)?

https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L288-L290

And can you try to add model.requires_grad_(False) right after the model is loaded by from_pretrained? Does the error still occur in this case?

Traceback (most recent call last):
  File "/home/workspace/Phi3V-Finetuning/train_phi3v.py", line 360, in <module>
    train()
  File "/home/workspace/Phi3V-Finetuning/train_phi3v.py", line 351, in train
    trainer.train()
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1852, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/modeling_phi3_v.py", line 1301, in forward
    outputs = self.model(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/modeling_phi3_v.py", line 1128, in forward
    inputs_embeds = self.vision_embed_tokens(input_ids, pixel_values=pixel_values, image_sizes=image_sizes)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/image_embedding_phi3_v.py", line 280, in forward
    hidden_states[positions[idx, 0], positions[idx, 1] : positions[idx, 1] + cnt] = (
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

It gives the same error when running with the suggestions you gave me.
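The last frame of the traceback is a slice assignment into hidden_states. A minimal sketch of how PyTorch produces this message, independent of Phi-3-vision: the tensor shapes and the clone workaround below are assumptions for illustration, and the workaround is not necessarily what the patch in this PR does.

```python
import torch

# A leaf tensor that requires grad, standing in for hidden_states.
hidden_states = torch.zeros(2, 6, 4, requires_grad=True)
image_features = torch.ones(3, 4)

try:
    # Same pattern as the failing line: in-place write into a slice (a view).
    hidden_states[0, 1:4] = image_features
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# One common workaround: write into a non-leaf copy instead of the leaf itself.
patched = hidden_states.clone()
patched[0, 1:4] = image_features
patched.sum().backward()

# Overwritten positions receive zero gradient; all others receive one.
print(hidden_states.grad.sum().item())
```

The in-place write is only rejected when the target is a leaf with requires_grad=True, which is why the requires_grad status of the tensor reaching that line matters more than the write itself.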

GaiZhenbiao commented 2 months ago

I’m investigating whether this is caused by VRAM capacity. Maybe deepspeed’s behavior varies with that.

GaiZhenbiao commented 2 months ago

First, could you paste the train.sh you ran and the package versions of your pytorch, transformers, accelerate and deepspeed?

Second, what happens if you comment out the following line? https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L275

2U1 commented 2 months ago

@GaiZhenbiao This is my train.sh

accelerate launch train_phi3v.py \
    --data_path /home/workspace/description/traffic_158k.json \
    --image_folder /home/workspace/dataset \
    --model_id microsoft/Phi-3-vision-128k-instruct \
    --output_dir output/train_lora_1ep \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --deepspeed_config scripts/zero2.json \
    --num_lora_modules 10 \
    --lora_namespan_exclude "['self_attn', 'lm_head']" \
    --max_seq_length 8192 \
    --quantization \
    --gradient_checkpointing \
    --learning_rate 5e-5 \
    --report_to wandb \
    --logging_dir ft-logs \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0.05 \
    --logging_steps 1 \
    --dataloader_num_workers 4 | tee logs/$(date +"%Y-%m-%d_%H_%M").log

Also, these are my versions:

accelerate == 0.29.2
pytorch == 2.2.0
transformers == 4.41.2
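A small sketch for collecting these versions in one go, using only the standard library. The package list is an assumption based on the libraries that appear in this thread and in the traceback; note that PyTorch is distributed under the name torch.

```python
from importlib import metadata


def report_versions(packages):
    """Return {package: installed version string, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Assumed list of relevant packages; adjust to your environment.
versions = report_versions(
    ["torch", "transformers", "accelerate", "deepspeed", "peft", "trl"]
)
for name, version in versions.items():
    print(f"{name} == {version}")
```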

2U1 commented 2 months ago

First, could you paste the train.sh you ran and the package versions of your pytorch, transformers, accelerate and deepspeed? Second, what happens if you comment out the following line? https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L275

Actually, commenting out the requires_grad call on the highlighted line (train_phi3v.py#L275) solved the problem. But I still don't understand why it happens.
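One possible explanation, offered as a guess rather than a confirmed diagnosis: if the embedding weights are frozen, the embedding output is a leaf tensor, and forcing requires_grad on it (as is commonly done to make gradient checkpointing work with frozen inputs) turns the later slice assignment into an in-place write on a leaf that requires grad. The sketch below reproduces that mechanism on a toy embedding; the hook is a hypothetical stand-in, not the actual line 275.

```python
import torch
import torch.nn as nn


def run_forward(force_input_grads: bool) -> str:
    """Embed some tokens, then overwrite a slice in place, as the model does."""
    embed = nn.Embedding(10, 4)
    embed.weight.requires_grad_(False)  # frozen base weights
    if force_input_grads:
        # Hypothetical stand-in for the requires_grad line discussed above:
        # mark the embedding output as requiring grad.
        embed.register_forward_hook(
            lambda module, inputs, output: output.requires_grad_(True)
        )
    hidden_states = embed(torch.tensor([[1, 2, 3, 4]]))
    try:
        hidden_states[0, 1:3] = torch.ones(2, 4)  # in-place image-feature write
        return "ok"
    except RuntimeError as exc:
        return str(exc)


print(run_forward(force_input_grads=False))
print(run_forward(force_input_grads=True))
```

Under this assumption, removing the requires_grad call leaves the embedding output as a plain leaf without grad, so the in-place write is allowed.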

ParadoxZW commented 2 months ago

It turns out that I had been using zero3 so far, so everything worked well. But when I changed back to zero2, this error occurred. To avoid this error, we now disable gradient_checkpointing for zero2 by default, and we recommend using zero3 (with gradient_checkpointing) in your experiments.
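A rough sketch of how such a default could be derived from a DeepSpeed config, assuming the usual zero_optimization.stage key. The helper name gradient_checkpointing_allowed is hypothetical and is not the repository's implementation; it simply encodes the one combination reported to work in this thread.

```python
import json


def gradient_checkpointing_allowed(deepspeed_config_text: str) -> bool:
    """Allow gradient checkpointing only for ZeRO stage 3 (assumed policy)."""
    config = json.loads(deepspeed_config_text)
    stage = config.get("zero_optimization", {}).get("stage", 0)
    return stage == 3


# Minimal config fragments; real zero2.json / zero3.json files contain more keys.
zero2 = json.dumps({"zero_optimization": {"stage": 2}})
zero3 = json.dumps({"zero_optimization": {"stage": 3}})

print("zero2:", gradient_checkpointing_allowed(zero2))
print("zero3:", gradient_checkpointing_allowed(zero3))
```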