Closed fybgogogo closed 3 months ago
Hello! Thank you for your interest in our work. I think I met this bug before as well, and it should be related to the torch version. I personally used `pip install --pre torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html`, and it works nicely on my 3090 server.
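As a quick sanity check after installing (a sketch, assuming the build from the pip command above), one can verify which torch build actually ended up in the environment:

```python
import torch

# Confirm the build recommended in this thread actually got installed;
# the version strings below come from the pip command above.
print(torch.__version__)           # the thread expects 1.7.1+cu110
print(torch.cuda.is_available())   # the thread expects True on a 3090
```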
Thank you for your reply. I installed the required version, but the problem changed into `ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1, 1])`. I tried changing `drop_last=True` in the "engine" part, but I still can't train the model.
My server has a 3090 Ti.
Well, I did not try it on a 3090 Ti. However, it should definitely be related to the torch environment setting. It is likely caused by the dimension reduction at each 3D conv stage. One thing you can do is start from an empty env and, instead of `pip install -r requirements.txt`, pip install the required packages one at a time. See if that solves it.
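The dimension-reduction point can be illustrated with simple arithmetic (a sketch with assumed numbers, not the repo's actual architecture): each stride-2 3D downsampling stage halves every spatial side, so an assumed 128-voxel patch reaches a side of 1 after seven halvings, leaving feature maps shaped like (N, C, 1, 1, 1):

```python
# Sketch: repeated stride-2 downsampling in a 3D encoder.
# The patch side (128) and stage count (7) are assumed for illustration.
side = 128
sizes = []
for stage in range(7):
    side = side // 2  # each stride-2 stage halves the spatial side
    sizes.append(side)

print(sizes)  # [64, 32, 16, 8, 4, 2, 1]
```

With all three spatial dims collapsed to 1, only the batch dimension is left to estimate statistics from, which is why batch size 1 becomes a problem.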
I've just uploaded an anaconda env file, `environment_shaspec.yml`; hopefully it provides more information.
I tried it, but it failed. The problem remains unsolved.
Hi @fybgogogo , I uninstalled my torch and reinstalled it to reproduce the bug, and I think I remember the solution. According to this page https://discuss.pytorch.org/t/error-expected-more-than-1-value-per-channel-when-training/26274, it is caused by batch size = 1. And I used batch size = 1 to fit a single 3090's memory.
In your pasted traceback, you should see something like `File "/home/anaconda3/envs/shaspec/lib/python3.9/site-packages/torch/nn/functional.py", line 2077, in instance_norm _verify_batch_size(input.size())`. Inside `instance_norm`, this check basically rejects batch size = 1. So we just need to comment out that line in the `functional.py` file, changing it to `# _verify_batch_size(input.size())`, and your bug will be solved.
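For context, the check being commented out behaves roughly like the sketch below (a reimplementation for illustration, not the actual torch source): it multiplies the batch dimension by all spatial dimensions and raises when the product is 1, which is exactly what happens with an input of `torch.Size([1, 256, 1, 1, 1])`:

```python
def verify_batch_size(size):
    # Rough sketch of torch's internal _verify_batch_size check:
    # the batch dim times all spatial dims must exceed 1 in training
    # mode, otherwise per-channel statistics cannot be estimated.
    prod = size[0]
    for d in size[2:]:  # skip the channel dimension at index 1
        prod *= d
    if prod == 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            "got input size {}".format(size)
        )

# The failing case from this thread: batch 1 with all-ones spatial dims.
try:
    verify_batch_size((1, 256, 1, 1, 1))
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Batch size 2 passes the same check, which is why a larger batch
# (memory permitting) would also avoid the error.
verify_batch_size((2, 256, 1, 1, 1))
```

Note that commenting out the line in site-packages silences this check globally for that environment, so it is worth keeping a note of the edit (or using batch size > 1 where memory allows).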
I will also update the solution in the Readme, thanks for reporting it.
@billhhh Thank you very much! The model can be trained successfully.
Thank you for your reply, but my issue is different from that one.
How can I resolve this problem?