Closed fybgogogo closed 3 months ago
Hello! Thank you for your interest in our work. I think I met this bug before as well, and it should be related to the torch version. I personally used `pip install --pre torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html`, and it works nicely on my 3090 server.
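As a quick sanity check after installing (a sketch, assuming the build from the pip command above), one can verify which torch build actually ended up in the environment:

```python
import torch

# Confirm the build recommended in this thread actually got installed;
# the version strings below come from the pip command above.
print(torch.__version__)           # the thread expects 1.7.1+cu110
print(torch.cuda.is_available())   # the thread expects True on a 3090
```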
Thank you for your reply. I installed the required version, but the problem changed into `ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1, 1])`. I tried changing `drop_last=True` in the "engine" part, but I still can't train the model.
My server has a 3090 Ti.
Well, I did not try it on a 3090 Ti. However, it should definitely be related to the torch environment setting. It is likely caused by the dimension reduction at each 3D conv stage. One thing you can do is start from an empty env and, instead of `pip install -r requirements.txt`, pip install the required packages one at a time. See if that solves it.
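The dimension-reduction point can be illustrated with simple arithmetic (a sketch with assumed numbers, not the repo's actual architecture): each stride-2 3D downsampling stage halves every spatial side, so an assumed 128-voxel patch reaches a side of 1 after seven halvings, leaving feature maps shaped like (N, C, 1, 1, 1):

```python
# Sketch: repeated stride-2 downsampling in a 3D encoder.
# The patch side (128) and stage count (7) are assumed for illustration.
side = 128
sizes = []
for stage in range(7):
    side = side // 2  # each stride-2 stage halves the spatial side
    sizes.append(side)

print(sizes)  # [64, 32, 16, 8, 4, 2, 1]
```

With all three spatial dims collapsed to 1, only the batch dimension is left to estimate statistics from, which is why batch size 1 becomes a problem.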
I've just uploaded an anaconda env file, `environment_shaspec.yml`; hopefully it provides more information.
I tried it, but it failed. The problem remains unsolved.
Hi @fybgogogo , I uninstalled my torch and reinstalled it to reproduce the bug, and I think I remember the solution. According to this page https://discuss.pytorch.org/t/error-expected-more-than-1-value-per-channel-when-training/26274, it is caused by batch size = 1. And I used batch size = 1 to fit a single 3090's memory.
In your pasted traceback, you should see something like `File "/home/anaconda3/envs/shaspec/lib/python3.9/site-packages/torch/nn/functional.py", line 2077, in instance_norm _verify_batch_size(input.size())`. Inside `instance_norm`, this check basically rejects batch size = 1. So we just need to comment out that line in the `functional.py` file, changing it to `# _verify_batch_size(input.size())`, and your bug will be solved.
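For context, the check being commented out behaves roughly like the sketch below (a reimplementation for illustration, not the actual torch source): it multiplies the batch dimension by all spatial dimensions and raises when the product is 1, which is exactly what happens with an input of `torch.Size([1, 256, 1, 1, 1])`:

```python
def verify_batch_size(size):
    # Rough sketch of torch's internal _verify_batch_size check:
    # the batch dim times all spatial dims must exceed 1 in training
    # mode, otherwise per-channel statistics cannot be estimated.
    prod = size[0]
    for d in size[2:]:  # skip the channel dimension at index 1
        prod *= d
    if prod == 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            "got input size {}".format(size)
        )

# The failing case from this thread: batch 1 with all-ones spatial dims.
try:
    verify_batch_size((1, 256, 1, 1, 1))
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Batch size 2 passes the same check, which is why a larger batch
# (memory permitting) would also avoid the error.
verify_batch_size((2, 256, 1, 1, 1))
```

Note that commenting out the line in site-packages silences this check globally for that environment, so it is worth keeping a note of the edit (or using batch size > 1 where memory allows).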
I will also update the solution in the Readme, thanks for reporting it.
@billhhh Thank you very much! The model can be trained successfully.
Thank you for your reply, but my issue is different from that one.
How can I resolve this problem?