MaverickRen / PixelLM

PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding, accepted to CVPR 2024.
Apache License 2.0

An error was encountered during training #20

Open clevercaicai opened 2 months ago

clevercaicai commented 2 months ago

Hello, thank you very much for your contribution. An error was encountered during training:

```
Traceback (most recent call last):
  File "/home/zhangc/PixelLM/train_ds.py", line 974, in <module>
    main(sys.argv[1:])
  File "/home/zhangc/PixelLM/train_ds.py", line 522, in main
    train_iter = train(
  File "/home/zhangc/PixelLM/train_ds.py", line 618, in train
    output_dict = model(**input_dict)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1829, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/zhangc/PixelLM/model/PixelLM.py", line 324, in forward
    return self.model_forward(**kwargs)
  File "/home/zhangc/PixelLM/model/PixelLM.py", line 420, in model_forward
    output = super().forward(  # concatenate all expanded image features along the first dimension to form the complete image feature sequence
  File "/home/zhangc/PixelLM/model/llava/model/language_model/llava_llama.py", line 98, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/home/zhangc/PixelLM/model/llava/model/llava_arch.py", line 158, in prepare_inputs_labels_for_multimodal
    image_features, vit_attention_mask, pre_image_features = self.encode_images(images, clip_resize_list)
  File "/home/zhangc/PixelLM/model/llava/model/llava_arch.py", line 110, in encode_images
    vit_attention_mask = F.interpolate(vit_attention_mask[:, None], size=(patch_num, patch_num), mode="nearest")[:, 0]
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/functional.py", line 3922, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: "upsample_nearest2d_out_frame" not implemented for 'BFloat16'
```

I continued training after resolving this issue with the following change:

```python
vit_attention_mask = vit_attention_mask.to(torch.float32)
```

But then another error occurred:

```
Traceback (most recent call last):
  File "/home/zhangc/PixelLM/train_ds.py", line 974, in <module>
    main(sys.argv[1:])
  File "/home/zhangc/PixelLM/train_ds.py", line 522, in main
    train_iter = train(
  File "/home/zhangc/PixelLM/train_ds.py", line 618, in train
    output_dict = model(**input_dict)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1829, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/peft/peft_model.py", line 918, in forward
    return self.base_model(
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangc/miniconda3/envs/pixel/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/zhangc/PixelLM/model/PixelLM.py", line 324, in forward
    return self.model_forward(**kwargs)
  File "/home/zhangc/PixelLM/model/PixelLM.py", line 592, in model_forward
    gt_mask.shape[0] == pred_mask.shape[0]
AssertionError: gt_mask.shape: torch.Size([17, 480, 640]), pred_mask.shape: torch.Size([0, 480, 640])
```
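For reference, the first error can be worked around without leaving the mask in float32 for the rest of the forward pass: cast to float32 only for the `F.interpolate` call, then cast back. Older torch builds do not implement `upsample_nearest2d` for bfloat16, which is what the `RuntimeError` reports. This is only a sketch; the helper name is mine, not from the PixelLM codebase:

```python
import torch
import torch.nn.functional as F


def nearest_resize_any_dtype(mask: torch.Tensor, patch_num: int) -> torch.Tensor:
    """Nearest-neighbor resize of a [B, H, W] mask to [B, patch_num, patch_num].

    Some torch versions lack a bfloat16 kernel for upsample_nearest2d,
    so interpolate in float32 and cast back to the original dtype.
    """
    orig_dtype = mask.dtype
    resized = F.interpolate(
        mask[:, None].float(),      # add a channel dim, promote to float32
        size=(patch_num, patch_num),
        mode="nearest",
    )[:, 0]                         # drop the channel dim again
    return resized.to(orig_dtype)
```

This mirrors the call at `llava_arch.py` line 110 but keeps `vit_attention_mask` in bf16 afterwards, so downstream dtype assumptions are unaffected (nearest-neighbor resizing only copies values, so the round-trip through float32 is lossless for a 0/1 mask).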

zhixuanli commented 1 month ago

I also encountered this bug. Could you please give some suggestions? @MaverickRen