Closed HYZ17 closed 4 months ago
Nice work! When I try to run the training code, I encounter the following error:
```
File "/ssddata/yuzhen/EVE/eve/model/language_model/eve_llama.py", line 96, in forward
    clip_loss = self.get_clip_loss()(_input_ids,
File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/home/data/yuzhenh17/miniconda3/envs/eve_envs_2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
File "/ssddata/yuzhen/EVE/eve/model/multimodal_encoder/vision_tokenizer.py", line 202, in forward
    i_features = i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]
RuntimeError: shape '[33, 4096, 24, 20]' is invalid for input of size 62717952
```
This error is triggered by this line of code, and I think the cause is the line that sets `idx_end` as `idx_str + min(N, H * (W + 1) + 1)`. When `N < H * (W + 1) + 1`, `i_features` cannot be reshaped to `(L, D, H, W + 1)`.
`N` is the total number of tokens, while `H * (W + 1)` is the number of image tokens.
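As a sketch of why the truncation breaks the reshape (using hypothetical toy sizes, not the real EVE shapes):

```python
import torch

# Hypothetical toy sizes for illustration, not the real EVE configuration
L, D, H, W = 2, 8, 4, 3  # batch, hidden dim, feature-grid height/width

# When N < H * (W + 1) + 1, idx_end = idx_str + min(N, H * (W + 1) + 1)
# truncates the slice, so i_features carries fewer than H * (W + 1)
# positions per sample and the reshape below must fail.
N = 12                   # fewer tokens than H * (W + 1) = 16
i_features = torch.randn(L, D, N)

try:
    i_features.reshape(L, D, H, W + 1)[:, :, :, :-1]
except RuntimeError as e:
    print("reshape failed:", e)
```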
Only a batch consisting entirely of text-only data would result in `N < H * (W + 1) + 1`, and in that case the reshape operation is not activated (`if_exist_image=False`). For batches of image-text data, or mixed image-text and text-only data, `N > H * (W + 1) + 1` always holds, and the reshape operation is activated for the image-text data (`if_exist_image=True`).
Thank you for your reply. Is there any possible reason for this kind of error, e.g. something about the dataset, such as the image size? I am running it on the LLaVA pretraining dataset: https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain
I did some ablation studies with the LLaVA pretraining dataset and didn't encounter such a problem.
Note that if you change the stride of the patch embedding and downsampling layer, you should also change the settings (`patch_stride`, `conv_stride`) in `openai/eve-patch14-anypixel-1344/preprocessor_config.json`:

```
"image_size": 1344,
"patch_stride": 14,
"conv_stride": 2,
"image_size_clip": 336,
"patch_stride_clip": 14
```

Please ensure that `image_size // patch_stride // conv_stride` is divisible by `image_size_clip // patch_stride_clip`.
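As a quick sanity check of this constraint with the default values shown above:

```python
# Default values from preprocessor_config.json shown above
image_size, patch_stride, conv_stride = 1344, 14, 2
image_size_clip, patch_stride_clip = 336, 14

grid = image_size // patch_stride // conv_stride  # 1344 // 14 // 2 = 48
grid_clip = image_size_clip // patch_stride_clip  # 336 // 14 = 24

# The feature grid must be an integer multiple of the CLIP grid
assert grid % grid_clip == 0
print(grid, grid_clip)  # 48 24
```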
I just found that I mistakenly deleted a line of code, which caused the above error. I am very sorry for any inconvenience and thank you for your assistance. I will close this issue. And again, very nice work.