baaivision / EVE

[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
MIT License

RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952 #8

Closed: HYZ17 closed this issue 4 months ago

HYZ17 commented 4 months ago

Nice work! When I try to run the training code, I encounter the following error:

File "/ssddata/yuzhen/EVE/eve/model/language_model/eve_llama.py", line 96, in forward
    clip_loss = self.get_clip_loss()(_input_ids,
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/ssddata/yuzhen/EVE/eve/model/multimodal_encoder/vision_tokenizer.py", line 202, in forward
    i_features = i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]
RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952

This error is triggered by this line of code, and I think it is caused by the line that sets idx_end to idx_str + min(N, H * (W + 1) + 1). When N < H * (W + 1) + 1, i_features cannot be reshaped to (L, D, H, W + 1).
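
For reference, here is a minimal sketch (using only the numbers from the error message, with dummy data) that reproduces the mismatch: the flattened features hold 62717952 = 33 * 4096 * 464 elements, while a (33, 4096, 24, 20) view would need 33 * 4096 * 480 elements, so the reshape must fail.

    import torch

    L, D, H, W = 33, 4096, 24, 19    # shapes from the error message (W + 1 = 20)
    numel = 62717952                 # reported input size = 33 * 4096 * 464

    # dummy tensor with the reported number of elements; the last dim is 464, not H * (W + 1) = 480
    i_features = torch.zeros(L, D, numel // (L * D))

    try:
        i_features.reshape(L, D, H, W + 1)
    except RuntimeError as e:
        print(e)  # shape '[33, 4096, 24, 20]' is invalid for input of size 62717952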

Paranioar commented 4 months ago

N is the length of all tokens, while H * (W + 1) is the number of image tokens. Only a batch consisting entirely of text-only data would give N < H * (W + 1) + 1, and in that case the reshape operation is not activated (if_exist_image=False). For batches with only image-text data, or with mixed image-text and text-only data, N > H * (W + 1) + 1 always holds, and the reshape operation is activated for the image-text samples (if_exist_image=True).
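
As a rough illustration of this gating (a hypothetical sketch with names taken from the discussion above, not the exact code in vision_tokenizer.py): the reshape is only reached for samples that actually contain an image, so a mismatch there means the sliced image features no longer hold H * (W + 1) tokens.

    import torch

    # Hypothetical sketch of the branch described above; names follow the issue
    # discussion, not the exact implementation in vision_tokenizer.py.
    def fold_image_features(i_features, H, W, if_exist_image):
        """i_features: (L, D, H * (W + 1)) flattened image-token features."""
        if not if_exist_image:
            # text-only batch: N < H * (W + 1) + 1, so the reshape is skipped
            return i_features
        L, D, _ = i_features.shape
        # fold the flat token axis into H rows of W + 1 tokens,
        # then drop the per-row separator column
        return i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]  # (L, D, H, W)

    print(fold_image_features(torch.zeros(2, 4096, 24 * 20), 24, 19, True).shape)
    # torch.Size([2, 4096, 24, 19])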

HYZ17 commented 4 months ago

Thank you for your reply. Is there any other possible reason for this kind of error, perhaps something about the dataset, such as the image sizes? I am running it on the LLaVA pretraining dataset: https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain

Paranioar commented 4 months ago

I ran some ablation studies with the LLaVA pretraining dataset and did not encounter this problem.

Note that if you change the stride of the patch-embedding or downsampling layer, you should also update the corresponding settings (patch_stride, conv_stride) in openai/eve-patch14-anypixel-1344/preprocessor_config.json:

  "image_size": 1344,
  "patch_stride": 14,
  "conv_stride": 2,
  "image_size_clip": 336,
  "patch_stride_clip": 14

Please ensure that image_size // patch_stride // conv_stride is divisible by image_size_clip // patch_stride_clip.
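
As a quick sanity check on the default values above (a hypothetical snippet, not part of the repo): 1344 // 14 // 2 = 48 and 336 // 14 = 24, and 48 % 24 == 0, so the constraint holds.

    # Hypothetical sanity check for the divisibility constraint above; the values
    # come from openai/eve-patch14-anypixel-1344/preprocessor_config.json.
    cfg = {
        "image_size": 1344,
        "patch_stride": 14,
        "conv_stride": 2,
        "image_size_clip": 336,
        "patch_stride_clip": 14,
    }

    eve_grid = cfg["image_size"] // cfg["patch_stride"] // cfg["conv_stride"]   # 48
    clip_grid = cfg["image_size_clip"] // cfg["patch_stride_clip"]              # 24

    assert eve_grid % clip_grid == 0, (
        f"{eve_grid} is not divisible by {clip_grid}; adjust patch_stride/conv_stride"
    )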

HYZ17 commented 4 months ago

I just found that I had mistakenly deleted a line of code, which caused the above error. I am very sorry for the inconvenience, and thank you for your assistance. I will close this issue. Again, very nice work.