dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0
3.22k stars 280 forks

Running the code raises AttributeError: 'list' object has no attribute 'to' at image_aux_features_raw = self.get_model().get_vision_tower_aux()(images_aux).to(dtype=image_features.dtype, device=image_features.device) #47

Closed: shidingz closed this issue 7 months ago

shidingz commented 7 months ago

Traceback (most recent call last):
  File "/checkpoint/binary/train_package/minigemini/train/train_mem.py", line 14, in <module>
    train(attn_implementation="flash_attention_2")
  File "/checkpoint/binary/train_package/minigemini/train/train.py", line 1262, in train
    trainer.train()
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2902, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2925, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1833, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/checkpoint/binary/train_package/minigemini/model/language_model/mini_gemini_gemma.py", line 87, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/checkpoint/binary/train_package/minigemini/model/mini_gemini_arch.py", line 328, in prepare_inputs_labels_for_multimodal
    image_features = self.encode_images(images, images_aux)
  File "/checkpoint/binary/train_package/minigemini/model/mini_gemini_arch.py", line 255, in encode_images
    image_aux_features_raw = self.get_model().get_vision_tower_aux()(images_aux).to(
AttributeError: 'list' object has no attribute 'to'

yanwei-li commented 7 months ago

Hi, this error occurs because images_aux is passed in as a list rather than a tensor. Please check your input. If you want to pass in a sequence of images, you will need to modify the implementation.

shidingz commented 7 months ago

I later found out that the cause was my batch size (bs) being set to 1. With bs = 1, images_aux stays a list in the dataset code, because torch.stack is only called when bs is greater than 1.
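
The failure mode described here can be reproduced in isolation. The sketch below is only an illustration: `collate_aux` is a hypothetical, simplified stand-in for the dataset collator (not the repository's actual code), and it assumes the collator stacks samples only when the batch holds more than one of them.

```python
import torch

def collate_aux(samples):
    """Hypothetical collator: stacks only when the batch has more than one sample."""
    if len(samples) > 1:
        return torch.stack(samples)  # Tensor of shape (B, C, H, W)
    return samples                   # plain Python list when B == 1

batch_of_two = collate_aux([torch.zeros(3, 8, 8), torch.zeros(3, 8, 8)])
batch_of_one = collate_aux([torch.zeros(3, 8, 8)])

print(type(batch_of_two).__name__)  # Tensor
print(type(batch_of_one).__name__)  # list

# A list has no .to() method, so any later .to(dtype=..., device=...) call
# on it fails with the AttributeError reported in this issue.
print(hasattr(batch_of_one, "to"))  # False
```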

yanwei-li commented 7 months ago

Thanks for the report. We have fixed this in the current version by also applying torch.stack when the batch size is 1.
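
For readers on an older checkout, a fix along these lines might look like the sketch below. It is an assumption about the general shape of such a change, not the code actually committed to the repository; `collate_aux_fixed` is a hypothetical name.

```python
import torch

def collate_aux_fixed(samples):
    """Hypothetical collator: stack whenever all samples share a shape, even for B == 1."""
    if all(s.shape == samples[0].shape for s in samples):
        return torch.stack(samples)
    # Samples with differing shapes cannot be stacked; the caller must handle a list.
    return samples

images_aux = collate_aux_fixed([torch.zeros(3, 8, 8)])

# With a single-sample batch the result is now a tensor, so .to() works.
images_aux = images_aux.to(dtype=torch.float16)
print(images_aux.shape)  # torch.Size([1, 3, 8, 8])
```

Stacking a single sample adds the leading batch dimension, so downstream code sees the same (B, C, H, W) layout regardless of batch size.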