erdongsanshi / FFNet

The Effectiveness of a Simplified Structure for Crowd Counting

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 16, 1, 1]) #1

Closed neverstoplearn closed 6 months ago

neverstoplearn commented 6 months ago

```
2024-04-29 00:26:44,217 - INFO - data_dir         : ./QNRF
2024-04-29 00:26:44,218 - INFO - dataset          : qnrf
2024-04-29 00:26:44,218 - INFO - arch             : FFNet
2024-04-29 00:26:44,218 - INFO - lr               : 1e-05
2024-04-29 00:26:44,218 - INFO - eta_min          : 1e-05
2024-04-29 00:26:44,218 - INFO - weight_decay     : 0
2024-04-29 00:26:44,218 - INFO - resume           :
2024-04-29 00:26:44,218 - INFO - max_epoch        : 2000
2024-04-29 00:26:44,218 - INFO - val_epoch        : 1
2024-04-29 00:26:44,218 - INFO - val_start        : 500
2024-04-29 00:26:44,218 - INFO - batch_size       : 4
2024-04-29 00:26:44,218 - INFO - device           : 0
2024-04-29 00:26:44,218 - INFO - num_workers      : 16
2024-04-29 00:26:44,218 - INFO - crop_size        : 512
2024-04-29 00:26:44,218 - INFO - wot              : 0.1
2024-04-29 00:26:44,218 - INFO - wtv              : 0.01
2024-04-29 00:26:44,218 - INFO - reg              : 10.0
2024-04-29 00:26:44,218 - INFO - num_of_iter_in_ot: 100
2024-04-29 00:26:44,218 - INFO - norm_cood        : 0
2024-04-29 00:26:44,218 - INFO - run_name         : FFNet-16-1e-5_1e-5-4_1-21
2024-04-29 00:26:44,218 - INFO - wandb            : 0
2024-04-29 00:26:44,218 - INFO - seed             : 21
2024-04-29 00:26:45,299 - INFO - using 1 gpus
number of img: 1081
number of img: 120
2024-04-29 00:26:48,228 - INFO - random initialization
2024-04-29 00:26:48,229 - INFO - -----Epoch 0/2000-----
/home/user/zx/FFNet/losses/bregman_pytorch.py:173: UserWarning: An output with one or more elements was resized since it had shape [4096], which does not match the required output shape [1, 4096]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:28.)
  torch.matmul(u, K, out=KTu)
Traceback (most recent call last):
  File "/home/user/zx/FFNet/train.py", line 93, in <module>
    trainer.train()
  File "/home/user/zx/FFNet/train_helper_FFNet.py", line 181, in train
    self.train_epoch()
  File "/home/user/zx/FFNet/train_helper_FFNet.py", line 205, in train_epoch
    outputs, outputs_normed = self.model(inputs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/zx/FFNet/Networks/FFNet.py", line 159, in forward
    pool1 = self.ccsm1(pool1)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/zx/FFNet/Networks/FFNet.py", line 115, in forward
    x = self.conv1(x)
  File "/home/user/anaconda3/envs/internLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/internLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/internLM/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/home/user/anaconda3/envs/internLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/zx/FFNet/Networks/ODConv2d.py", line 141, in forward
    return self._forward_impl(x)
  File "/home/user/zx/FFNet/Networks/ODConv2d.py", line 119, in _forward_impl_common
    channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/zx/FFNet/Networks/ODConv2d.py", line 81, in forward
    x = self.bn(x)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/functional.py", line 2476, in batch_norm
    _verify_batch_size(input.size())
  File "/home/user/anaconda3/envs/intern/lib/python3.10/site-packages/torch/nn/functional.py", line 2444, in _verify_batch_size
    raise ValueError(f"Expected more than 1 value per channel when training, got input size {size}")
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 16, 1, 1])
```
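For context, the error is easy to reproduce in isolation: `BatchNorm2d` in training mode needs more than one value per channel to compute batch statistics, and a `[1, 16, 1, 1]` input offers exactly one. A minimal sketch:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)
bn.train()  # training mode: statistics are computed from the current batch

x = torch.randn(1, 16, 1, 1)  # one sample, one value per channel
try:
    bn(x)
except ValueError as e:
    print(e)  # Expected more than 1 value per channel when training, ...

# In eval mode the stored running statistics are used, so the same input passes
bn.eval()
out = bn(x)
print(out.shape)  # torch.Size([1, 16, 1, 1])
```

This is also why the crash appears only partway through an epoch: every full batch passes, and only a stranded single-image batch trips the check.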

neverstoplearn commented 6 months ago

When I train, I get this error. How can I fix it? Thanks.

erdongsanshi commented 6 months ago

> When I train, I get this error. How can I fix it? Thanks.

@neverstoplearn

When training, Batch Normalization (BN) needs more than one sample per batch to compute batch statistics. The QNRF training split here has 1081 images, so with a batch size of 4 the last batch contains only a single image, and BN cannot normalize over one sample. The easy fix is to delete one image so the set divides evenly into batches of 4. Not elegant, a bit sloppy maybe, but it gets the job done.
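A less manual alternative (a sketch, assuming the training loader is a standard `torch.utils.data.DataLoader` rather than the repo's exact setup): pass `drop_last=True` so the incomplete final batch is discarded at loading time, instead of deleting an image from the dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-in for the 1081-image QNRF training split
dataset = TensorDataset(torch.zeros(1081, 1))

# drop_last=True discards the final batch of 1081 % 4 == 1 samples,
# so BatchNorm never sees a single-sample batch during training
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)

print(len(loader))                              # 270 full batches (1081 // 4)
print({batch[0].shape[0] for batch in loader})  # {4}
```

The dataset itself stays intact; only one randomly chosen sample per epoch is skipped, which changes nothing for validation or testing.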

erdongsanshi commented 6 months ago

@neverstoplearn

```
2024-05-04 15:59:02,926 - INFO - data_dir         : /datasets/QNRF-Train-Val-Test
2024-05-04 15:59:02,926 - INFO - dataset          : qnrf
2024-05-04 15:59:02,926 - INFO - arch             : FFNet
2024-05-04 15:59:02,926 - INFO - lr               : 1e-05
2024-05-04 15:59:02,926 - INFO - eta_min          : 1e-05
2024-05-04 15:59:02,926 - INFO - weight_decay     : 0
2024-05-04 15:59:02,926 - INFO - resume           :
2024-05-04 15:59:02,926 - INFO - max_epoch        : 2000
2024-05-04 15:59:02,926 - INFO - val_epoch        : 1
2024-05-04 15:59:02,926 - INFO - val_start        : 500
2024-05-04 15:59:02,926 - INFO - batch_size       : 4
2024-05-04 15:59:02,926 - INFO - device           : 0
2024-05-04 15:59:02,926 - INFO - num_workers      : 16
2024-05-04 15:59:02,926 - INFO - crop_size        : 512
2024-05-04 15:59:02,926 - INFO - wot              : 0.1
2024-05-04 15:59:02,926 - INFO - wtv              : 0.01
2024-05-04 15:59:02,926 - INFO - reg              : 10.0
2024-05-04 15:59:02,926 - INFO - num_of_iter_in_ot: 100
2024-05-04 15:59:02,926 - INFO - norm_cood        : 0
2024-05-04 15:59:02,926 - INFO - run_name         : FFNet-16-1e-5_1e-5-4_1-21
2024-05-04 15:59:02,926 - INFO - wandb            : 0
2024-05-04 15:59:02,926 - INFO - seed             : 21
2024-05-04 15:59:02,976 - INFO - using 1 gpus
number of img: 1080
number of img: 120
2024-05-04 15:59:04,014 - INFO - random initialization
2024-05-04 15:59:04,014 - INFO - -----Epoch 0/2000-----
/home/deeplearn/JupyterlabRoot/erdongsanshi/FFNet/losses/bregman_pytorch.py:173: UserWarning: An output with one or more elements was resized since it had shape [4096], which does not match the required output shape [1, 4096]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:17.)
  torch.matmul(u, K, out=KTu)
2024-05-04 15:59:46,367 - INFO - Epoch 0 Train, Loss: 142.83, OT Loss: -1.48e-07, Wass Distance: 234.76, OT obj value: 67.98, Count Loss: 140.64, TV Loss: 2.19, MSE: 239.89, MAE: 140.64, Cost 42.4 sec
```

erdongsanshi commented 6 months ago

@neverstoplearn I analyzed the structure of my model: the neck uses dynamic convolution, which also requires every batch to contain more than one sample, so that part is relevant too. But the root cause is still that the last batch holds only a single image, and removing one image as described above solves the problem.
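The chain of events can be sketched with a simplified ODConv-style attention branch (the `AttentionStem` below is hypothetical, not the repo's exact code): global average pooling collapses the feature map to 1×1, so the following BatchNorm sees exactly one value per channel per sample, and a last batch of size 1 leaves it nothing to normalize over.

```python
import torch
import torch.nn as nn

class AttentionStem(nn.Module):
    """Simplified sketch of a dynamic-convolution attention branch:
    pooling collapses H and W, so the BN input is [N, C, 1, 1]."""
    def __init__(self, in_planes=64, attention_channel=16):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2d(1)       # -> [N, in_planes, 1, 1]
        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)
        self.bn = nn.BatchNorm2d(attention_channel)  # needs N > 1 in train mode

    def forward(self, x):
        return torch.relu(self.bn(self.fc(self.avgpool(x))))

stem = AttentionStem().train()
out = stem(torch.randn(4, 64, 32, 32))    # full batch: fine
print(out.shape)                          # torch.Size([4, 16, 1, 1])

try:
    stem(torch.randn(1, 64, 32, 32))      # lone last-batch sample: BN fails
except ValueError as e:
    print(e)                              # Expected more than 1 value per channel ...
```

For ordinary convolutional layers a batch of 1 is fine, because BN can still average over the spatial positions; it is the pooled 1×1 attention path that makes the single-sample batch fatal.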

neverstoplearn commented 6 months ago

Thanks, I solved it.