Error when setting batch size to 4

Tsachi321 commented 5 years ago

Namespace(augmentation_prob=0.16924156139739033, batch_size=4, beta1=0.5, beta2=0.999, cuda_idx=3, image_size=224, img_ch=1, log_step=2, lr=0.00017304986946574825, mode='train', model_path='./models', model_type='R2U_Net', num_epochs=250, num_epochs_decay=186, num_workers=8, output_ch=1, result_path='./result/R2U_Net', t=3, test_path='/home/Data/DC_disk2/tsachi_dataset/dataset/test/', train_path='/home/Data/DC_disk2/tsachi_dataset/dataset/train/', val_step=2, valid_path='/home/Data/DC_disk2/tsachi_dataset/dataset/valid/')
image count in train path :2400
image count in valid path :211
image count in test path :200
Traceback (most recent call last):
  File "main.py", line 101, in <module>
    main(config)
  File "main.py", line 61, in main
    solver.train()
  File "/home/imagry/tsachi/solver.py", line 140, in train
    for i, (images, GT) in enumerate(self.train_loader):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 187, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 187, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 164, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 192 and 320 in dimension 2 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fca6e95b7b8>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

LeeJunHyun commented 5 years ago

Dear @Tsachi321, can you check your data? I think there is some bad data in your path. Here is a reference for the "Sizes of tensors must match except in dimension 0" error.
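One quick way to check the data is to count how many images of each size a folder contains; more than one distinct size would explain why the default collate step cannot stack a batch. This is a minimal sketch, not code from the repository: the helper names `pil_size` and `count_image_sizes`, the extension list, and the use of Pillow to read sizes are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def pil_size(path):
    """Read (width, height) with Pillow. Assumes Pillow is installed."""
    from PIL import Image  # imported lazily so the helper below works without Pillow

    with Image.open(path) as img:
        return img.size


def count_image_sizes(folder, read_size=pil_size,
                      extensions=(".png", ".jpg", ".jpeg")):
    """Return a Counter mapping each image size to the number of files with that size.

    `read_size` is any callable that takes a path and returns a size tuple;
    it defaults to the Pillow-based reader above (a hypothetical helper).
    """
    sizes = Counter()
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in extensions:
            sizes[read_size(path)] += 1
    return sizes
```

Calling `count_image_sizes` on the train, valid, and test folders and printing the result shows at a glance whether all images share one size. If several sizes appear, the images need to be resized or cropped to a common size before batching.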

Tsachi321 commented 5 years ago

I think my data is fine, but I can't set batch_size to anything greater than 1. Any idea how to fix this other than cropping the images at the end? @LeeJunHyun

LeeJunHyun commented 5 years ago

Oh, I apologize, I forgot to mention a setting for this experiment.

You should change the code at data_loader.py#L82 to Transform.append(T.Resize((224,224)))

LeeJunHyun commented 5 years ago

If you have no further questions, I will close the issue. I hope my answer was helpful.

caijinyue commented 5 years ago

Oh, I apologize, I forgot to mention a setting for this experiment.

You should change the code at data_loader.py#L82 to Transform.append(T.Resize((224,224)))

When I change it to Transform.append(T.Resize((224,224))), I get poor training results:

python main.py

Namespace(augmentation_prob=0.41373431961219886, batch_size=5, beta1=0.5, beta2=0.999, cuda_idx=1, image_size=224, img_ch=3, log_step=2, lr=0.00020562507429540996, mode='train', model_path='./models', model_type='U_Net', num_epochs=250, num_epochs_decay=45, num_workers=8, output_ch=1, result_path='./result/U_Net', t=3, test_path='./dataset/test/', train_path='./dataset/train/', val_step=2, valid_path='./dataset/valid/')
image count in train path :1200
image count in valid path :400
image count in test path :400
/home/caijinyue/.conda/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/home/caijinyue/.conda/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Epoch [1/250], Loss: 104.8183,
[Training] Acc: 0.1732, SE: 0.1005, SP: 0.1944, PC: 0.1622, F1: 0.1189, JS: 0.0883, DC: 0.1189
[Validation] Acc: 0.1807, SE: 0.1502, SP: 0.1886, PC: 0.1468, F1: 0.1451, JS: 0.1163, DC: 0.1451
Best U_Net model score : 0.2614
Epoch [2/250], Loss: 74.9950,
[Training] Acc: 0.1792, SE: 0.1280, SP: 0.1955, PC: 0.1716, F1: 0.1414, JS: 0.1122, DC: 0.1414
[Validation] Acc: 0.1864, SE: 0.1393, SP: 0.1975, PC: 0.1839, F1: 0.1560, JS: 0.1306, DC: 0.1560
Best U_Net model score : 0.2866
Epoch [3/250], Loss: 64.4737,
[Training] Acc: 0.1808, SE: 0.1363, SP: 0.1953, PC: 0.1721, F1: 0.1470, JS: 0.1188, DC: 0.1470
[Validation] Acc: 0.1868, SE: 0.1712, SP: 0.1915, PC: 0.1597, F1: 0.1629, JS: 0.1392, DC: 0.1629
Best U_Net model score : 0.3021
Epoch [4/250], Loss: 57.3487,
[Training] Acc: 0.1823, SE: 0.1425, SP: 0.1954, PC: 0.1730, F1: 0.1514, JS: 0.1249, DC: 0.1514
[Validation] Acc: 0.1885, SE: 0.1576, SP: 0.1961, PC: 0.1764, F1: 0.1642, JS: 0.1413, DC: 0.1642
Best U_Net model score : 0.3055
Epoch [5/250], Loss: 50.9049,
[Training] Acc: 0.1841, SE: 0.1506, SP: 0.1954, PC: 0.1746, F1: 0.1576, JS: 0.1327, DC: 0.1576
[Validation] Acc: 0.1891, SE: 0.1590, SP: 0.1967, PC: 0.1816, F1: 0.1677, JS: 0.1460, DC: 0.1677
Best U_Net model score : 0.3137
Epoch [6/250], Loss: 54.0966,
[Training] Acc: 0.1830, SE: 0.1466, SP: 0.1955, PC: 0.1735, F1: 0.1543, JS: 0.1285, DC: 0.1543
[Validation] Acc: 0.1853, SE: 0.1263, SP: 0.1994, PC: 0.1952, F1: 0.1508, JS: 0.1243, DC: 0.1508
Epoch [7/250], Loss: 48.4432,
[Training] Acc: 0.1846, SE: 0.1539, SP: 0.1955, PC: 0.1745, F1: 0.1595, JS: 0.1351, DC: 0.1595
[Validation] Acc: 0.1883, SE: 0.1548, SP: 0.1964, PC: 0.1785, F1: 0.1626, JS: 0.1404, DC: 0.1626
Epoch [8/250], Loss: 46.3072,
[Training] Acc: 0.1853, SE: 0.1548, SP: 0.1954, PC: 0.1745, F1: 0.1603, JS: 0.1366, DC: 0.1603
[Validation] Acc: 0.1834, SE: 0.1665, SP: 0.1877, PC: 0.1516, F1: 0.1556, JS: 0.1311, DC: 0.1556

ykeivn commented 5 years ago

Have you solved this problem? The loss is clearly decreasing, so I think the problem is in how the evaluation metrics are calculated.
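The thread does not confirm the cause, but one way an evaluation loop can report metrics that look too low once the batch size exceeds 1 is to add up one metric value per batch and then divide by the total number of images. The sketch below contrasts that with a sample-weighted average. Both function names are hypothetical and nothing here is taken from the repository's solver.py or evaluation code.

```python
def mismatched_average(batch_values, batch_sizes):
    """Sum one value per batch, but divide by the number of images.

    With batches of size B this shrinks the result by roughly a factor of B.
    """
    return sum(batch_values) / sum(batch_sizes)


def weighted_average(batch_values, batch_sizes):
    """Sample-weighted mean: each batch counts in proportion to its size."""
    total = sum(value * size for value, size in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


# Two batches of 5 images with per-batch accuracies 0.90 and 0.92.
values, sizes = [0.90, 0.92], [5, 5]
print(mismatched_average(values, sizes))  # roughly 0.18
print(weighted_average(values, sizes))    # roughly 0.91
```

If the logged accuracies near 0.18 with batch_size=5 came from this kind of mismatch, the underlying values would be about five times higher. That is only a guess, so it is worth checking how the metrics are accumulated and normalized before drawing conclusions.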

LeeJunHyun / Image_Segmentation, issue #4