bahetibhakti opened this issue 5 years ago
Same issue here.
Is this a bug?
Same issue here. Were you able to solve it?
That is because the `crop_size` argument is ignored during testing; it only takes effect during training. Please refer to https://github.com/fyu/drn/blob/d75db2ee7070426db7a9264ee61cf489f8cf178c/segment.py#L632-L640 and https://github.com/fyu/drn/blob/d75db2ee7070426db7a9264ee61cf489f8cf178c/segment.py#L360-L383
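For illustration, here is a minimal sketch of the pattern the linked lines follow (this is not the repo's exact code; the helper names and torchvision transforms are stand-ins): the random crop only appears in the training transform list, so changing `--crop-size` for the test phase has no effect on the test loader.

```python
import torchvision.transforms as T

# Hypothetical helpers illustrating why crop_size matters only for training.
def build_train_transforms(crop_size):
    # Training pipeline: random crop to crop_size, then tensor + normalize.
    return T.Compose([
        T.RandomCrop(crop_size),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

def build_test_transforms():
    # Test pipeline: full-resolution images, no crop, so crop_size is never read.
    return T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
```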
It's because newer PyTorch versions deprecated and removed `volatile`, which was used to disable gradient recording. The recommended replacement is `torch.no_grad()`.
In the last line of `segment.py`, wrap `main()` in `with torch.no_grad():`:

```python
if __name__ == "__main__":
    with torch.no_grad():
        main()
```
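Alternatively, you can disable gradient recording only around the inference loop instead of around all of `main()`. The snippet below is just a sketch of that pattern, not the repo's `test()` function; it assumes a loader that yields the image batch first in each tuple and a model whose first output is the prediction (as `final = model(image_var)[0]` in `segment.py` suggests):

```python
import torch

def run_inference(model, data_loader, device="cuda"):
    # eval() freezes dropout/batch-norm statistics; no_grad() stops autograd from
    # keeping activation buffers, which is what usually causes OOM at test time.
    model.eval()
    predictions = []
    with torch.no_grad():
        for image, *rest in data_loader:
            image = image.to(device)
            output = model(image)[0]              # logits, N x C x H x W
            predictions.append(output.argmax(dim=1).cpu())
    return predictions
```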
Is anyone able to run the test phase for the DRN-D-105 architecture on test data? I can train and validate, but testing fails with "RuntimeError: CUDA error: out of memory", even with a small crop size of 256x256 and a batch size of 1. I checked resources while testing and there is plenty free (both GPU memory and system RAM). I am using an NVIDIA P100 GPU with 16 GB of memory.
Any thoughts? The full command and output:
```
(bhakti) user@user:/mnt/komal/bhakti/anue$ python3 segment.py test -d dataset/ -c 26 --arch drn_d_105 --resume model_best.pth.tar --phase test --batch-size 1 -j2
segment.py test -d dataset/ -c 26 --arch drn_d_105 --resume model_best.pth.tar --phase test --batch-size 1 -j2
Namespace(arch='drn_d_105', batch_size=1, bn_sync=False, classes=26, cmd='test', crop_size=896, data_dir='dataset/', epochs=10, evaluate=False, list_dir=None, load_rel=None, lr=0.01, lr_mode='step', momentum=0.9, ms=False, phase='test', pretrained='', random_rotate=0, random_scale=0, resume='model_best.pth.tar', step=200, test_suffix='', weight_decay=0.0001, with_gt=False, workers=2)
classes : 26
batch_size : 1
pretrained :
momentum : 0.9
with_gt : False
phase : test
list_dir : None
lr_mode : step
weight_decay : 0.0001
epochs : 10
step : 200
bn_sync : False
ms : False
arch : drn_d_105
random_rotate : 0
random_scale : 0
workers : 2
crop_size : 896
lr : 0.01
load_rel : None
resume : model_best.pth.tar
evaluate : False
cmd : test
data_dir : dataset/
test_suffix :
[2019-09-14 19:14:23,173 segment.py:697 test_seg] => loading checkpoint 'model_best.pth.tar'
[2019-09-14 19:14:23,509 segment.py:703 test_seg] => loaded checkpoint 'model_best.pth.tar' (epoch 1)
segment.py:540: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  image_var = Variable(image, requires_grad=False, volatile=True)
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f15eff61160>>
Traceback (most recent call last):
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/multiprocessing/queues.py", line 337, in get
    return ForkingPickler.loads(res)
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/multiprocessing/reduction.py", line 181, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/multiprocessing/reduction.py", line 152, in recvfds
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "segment.py", line 789, in <module>
    main()
  File "segment.py", line 785, in main
    test_seg(args)
  File "segment.py", line 720, in test_seg
    has_gt=phase != 'test' or args.with_gt, output_dir=out_dir)
  File "segment.py", line 544, in test
    final = model(image_var)[0]
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "segment.py", line 142, in forward
    y = self.up(x)
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/bhakti/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 691, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: CUDA error: out of memory
```
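Note that the `UserWarning` in the log above shows `volatile=True` no longer disables autograd, so the test forward pass keeps activation buffers for backpropagation; with a deep model on full-resolution images that alone can exhaust 16 GB, regardless of `crop_size` or batch size. A quick way to see the effect (a hedged sketch, not the repo's code; the ResNet-50 and 896x896 input are stand-ins for the DRN-D-105 segmenter and its test images) is to compare allocated GPU memory with and without `torch.no_grad()`:

```python
import torch
import torchvision.models as models

model = models.resnet50().cuda().eval()          # stand-in for the real model
x = torch.randn(1, 3, 896, 896, device="cuda")   # stand-in for a test image

# Forward pass with gradient recording (what happens once volatile is ignored):
# intermediate activations are retained for a potential backward pass.
out = model(x)
print("with autograd:", torch.cuda.memory_allocated() / 2**20, "MiB")
del out
torch.cuda.empty_cache()

# Same forward pass with gradients disabled, as the no_grad fix above does.
with torch.no_grad():
    out = model(x)
print("with no_grad :", torch.cuda.memory_allocated() / 2**20, "MiB")
```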