Avoiding runtime error - Githubissues

This patch avoids the following runtime error when running the code.

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

In which case does this error occur? Do you train img_projection or img_processor?

This error happened for me when training on one GPU (export WORLD_SIZE=1). Applying this patch fixed the issue for me!

Hi, did you configure img_projection or img_processor to be trained?

Could you uncomment https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L288-L290 in train_phi3v.py to check the status of the parameters? In my experiments on multiple GPUs, requires_grad is False for every parameter of img_projection and img_processor, so I cannot understand why this error occurs.

Hi, I had the same error when trying to finetune from my own Jupyter notebook, and I fixed it in the same way.

I got this error running just train.sh. I don't know exactly why it happens either.

Hi, @2U1 @ayylemao @shlyahin Can you give me the output of the following code (after uncommenting it)? https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L288-L290

And can you try to add model.requires_grad_(False) right after the model is loaded by from_pretrained? Does the error still occur in this case?
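For readers following along, the two checks requested above can be sketched roughly as follows. This is a minimal illustration on a toy torch.nn model, not the code at the linked lines of train_phi3v.py; the helper name trainable_report and the toy model are assumptions made for the example.

```python
import torch.nn as nn


def trainable_report(model: nn.Module) -> dict:
    """Map each parameter name to its requires_grad flag."""
    return {name: param.requires_grad for name, param in model.named_parameters()}


# Toy stand-in for the real model (hypothetical; the thread loads it with from_pretrained).
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze every parameter right after loading, as suggested above.
model.requires_grad_(False)

report = trainable_report(model)
for name, flag in report.items():
    print(f"{name}: requires_grad={flag}")
```

On the real model, the same loop could be filtered to names containing img_projection or img_processor to answer the question above.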

@ParadoxZW I've just uncommented the code in my fixed version and run your code. Here are the txt files:

debug_0.txt debug_1.txt debug_2.txt debug_3.txt

Traceback (most recent call last):
  File "/home/workspace/Phi3V-Finetuning/train_phi3v.py", line 360, in <module>
    train()
  File "/home/workspace/Phi3V-Finetuning/train_phi3v.py", line 351, in train
    trainer.train()
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1852, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/modeling_phi3_v.py", line 1301, in forward
    outputs = self.model(
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/modeling_phi3_v.py", line 1128, in forward
    inputs_embeds = self.vision_embed_tokens(input_ids, pixel_values=pixel_values, image_sizes=image_sizes)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/travl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/workspace/Phi3V-Finetuning/model/image_embedding_phi3_v.py", line 280, in forward
    hidden_states[positions[idx, 0], positions[idx, 1] : positions[idx, 1] + cnt] = (
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

It gives the same error when running with the suggestions you gave me.
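The failing line in the traceback is a slice assignment into hidden_states. As a minimal sketch of what PyTorch is rejecting, the snippet below reproduces the same message on a small tensor and then shows one common workaround (writing into a clone). The tensor shapes and the name image_features are made up for the example, how hidden_states becomes a leaf that requires grad in the real model is not established in this thread, and the clone is not necessarily what the patch in this PR does.

```python
import torch

# A leaf tensor that requires grad, standing in for hidden_states.
hidden_states = torch.zeros(2, 6, 4).requires_grad_(True)
image_features = torch.ones(3, 4)  # hypothetical features to splice in

error_message = None
try:
    # Same pattern as the failing line: slice assignment writes into a view in place.
    hidden_states[0, 1:4] = image_features
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Workaround sketch: clone() returns a non-leaf tensor, so the in-place write is allowed.
patched = hidden_states.clone()
patched[0, 1:4] = image_features

# Gradients still flow back to the original leaf through the clone.
patched.sum().backward()
print(hidden_states.grad is not None)
```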

I’m investigating whether this is caused by limited VRAM capacity. DeepSpeed’s behavior may vary with that.

First, could you paste the train.sh you ran and the versions of your pytorch, transformers, accelerate and deepspeed packages?

Second, what happens if you comment out the following line? https://github.com/GaiZhenbiao/Phi3V-Finetuning/blob/ad9c3fe4cfd83fecf51f39484a1f29ce0f2cdd95/train_phi3v.py#L275

@GaiZhenbiao This is my train.sh

accelerate launch train_phi3v.py \
    --data_path /home/workspace/description/traffic_158k.json \
    --image_folder /home/workspace/dataset \
    --model_id microsoft/Phi-3-vision-128k-instruct \
    --output_dir output/train_lora_1ep \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --deepspeed_config scripts/zero2.json \
    --num_lora_modules 10 \
    --lora_namespan_exclude "['self_attn', 'lm_head']" \
    --max_seq_length 8192 \
    --quantization \
    --gradient_checkpointing \
    --learning_rate 5e-5 \
    --report_to wandb \
    --logging_dir ft-logs \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0.05 \
    --logging_steps 1 \
    --dataloader_num_workers 4 | tee logs/$(date +"%Y-%m-%d_%H_%M").log

Also, these are my versions:

accelerate == 0.29.2
pytorch == 2.2.0
transformers == 4.41.2
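For anyone else asked to report versions, a small standard-library sketch like the one below can collect them in one go. The helper name installed_versions is hypothetical, and the package list is simply the four packages requested earlier in this thread (using the distribution name torch for pytorch).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["torch", "transformers", "accelerate", "deepspeed"])
for name, version in report.items():
    print(f"{name} == {version if version else 'not installed'}")
```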

Actually, commenting out the requires_grad call in the highlighted line solved the problem. But I still don't understand why it happens.

It turns out that I had been using zero3 all along, which is why everything worked for me. When I switch back to zero2, this error occurs. To avoid it, we now disable gradient_checkpointing for zero2 by default, and we recommend using zero3 (with gradient_checkpointing) in your experiments.
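The rule described above can be sketched as a small check on the DeepSpeed config. This is only an illustration of the idea, not the code in this repository: the function name resolve_gradient_checkpointing is hypothetical, and the only thing assumed about the config files is the standard zero_optimization.stage key.

```python
import json


def resolve_gradient_checkpointing(ds_config: dict, requested: bool) -> bool:
    """Decide whether gradient checkpointing stays on for a given DeepSpeed config."""
    stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    if stage == 2 and requested:
        # Per the thread: zero2 plus gradient checkpointing triggers the in-place error.
        print("ZeRO stage 2 detected: disabling gradient checkpointing.")
        return False
    return requested


# Inline stand-ins for config files such as scripts/zero2.json (contents assumed).
zero2 = json.loads('{"zero_optimization": {"stage": 2}}')
zero3 = json.loads('{"zero_optimization": {"stage": 3}}')

zero2_result = resolve_gradient_checkpointing(zero2, requested=True)
zero3_result = resolve_gradient_checkpointing(zero3, requested=True)
print(zero2_result, zero3_result)
```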

GaiZhenbiao / Phi3V-Finetuning

Avoiding runtime error #5