2019-06-12 06:42:21,014 INFO [module_helper.py, 138] Loading pretrained model:/tmp/cars_segmentation/torchcv/pretrained_models/3x3resnet101-imagenet.pth
2019-06-12 06:42:28,858 INFO [controller.py, 28] Training start...
Traceback (most recent call last):
File "main.py", line 199, in <module>
Controller.train(runner)
File "/tmp/cars_segmentation/torchcv/methods/tools/controller.py", line 40, in train
runner.train()
File "/tmp/cars_segmentation/torchcv/methods/seg/fcn_segmentor.py", line 85, in train
out_dict = self.seg_net(data_dict)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/seg/nets/pspnet.py", line 84, in forward
x = self.backbone(data_dict['img'])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/backbones/resnet/resnet_backbone.py", line 94, in forward
x = self.prefix(x)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/syncbn.py", line 44, in forward
xsum, xsqsum = sum_square(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 19, in sum_square
return _sum_square.apply(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 27, in forward
xsum, xsqusum = gpu.sumsquare_forward(input)
RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at syncbn_kernel.cu:263, please report a bug to PyTorch. (Sum_Square_Forward_CUDA at syncbn_kernel.cu:263)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6cd6bd1441 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6cd6bd0d7a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: Sum_Square_Forward_CUDA(at::Tensor) + 0x281 (0x7f6cbdb615e2 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x1fc29 (0x7f6cbdb57c29 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x24095 (0x7f6cbdb5c095 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
I get this error as soon as training starts. Can you help me solve it? Thanks.