YoungXIAO13 / FewShotDetection

(ECCV 2020) PyTorch implementation of paper "Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild"
http://imagine.enpc.fr/~xiaoy/FSDetView/
MIT License

TypeError: 'NoneType' object is not iterable #11

Open twinkleShen opened 3 years ago

twinkleShen commented 3 years ago

@YoungXIAO13, hello. When I use 'bash run/train_voc_first.sh' to run 'train.py', I get the following problem: after training for a while, the run suddenly stops with the error below. I don't know how to solve it. Could you help me take a look?

.........
[session 1][epoch  4][iter 1100] loss: 0.4251, lr: 1.00e-03
                        fg/bg=(117/395), time cost: 82.185346
                        rpn_cls: 0.0298, rpn_box: 0.0210, rcnn_cls: 0.1314, rcnn_box 0.1463, meta_loss 0.0307
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCTensorCopy.c line=21 error=4 : unspecified launch failure
Traceback (most recent call last):
  File "train.py", line 446, in <module>
    num_boxes_list)
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hdd/shenfanshu/DAFSdetection/FewShotDetection-master/lib/model/faster_rcnn/faster_rcnn.py", line 77, in forward
    rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hdd/shenfanshu/DAFSdetection/FewShotDetection-master/lib/model/rpn/rpn.py", line 78, in forward
    im_info, cfg_key))
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hdd/shenfanshu/DAFSdetection/FewShotDetection-master/lib/model/rpn/proposal_layer.py", line 85, in forward
    shifts = shifts.contiguous().type_as(scores).float()
RuntimeError: cuda runtime error (4) : unspecified launch failure at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCTensorCopy.c:21
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7ffb6c0d8208>>
Traceback (most recent call last):
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
  File "/home/zhangwei/anaconda3/envs/sfsmtl35/lib/python3.5/multiprocessing/queues.py", line 337, in get
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 887, in _find_spec
TypeError: 'NoneType' object is not iterable

YoungXIAO13 commented 3 years ago

Hi @twinkleShen ,

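From my perspective, the underlying error is the RuntimeError: cuda runtime error (4) : unspecified launch failure at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/generic/THCTensorCopy.c:21. The TypeError: 'NoneType' object is not iterable in the title is reported as "Exception ignored in" _DataLoaderIter.__del__, so it looks like a side effect of the data loader shutting down after the CUDA failure rather than the root cause.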
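
Since CUDA kernels are launched asynchronously, the Python line shown in the traceback (proposal_layer.py, line 85) is not necessarily where the failure actually happened. A minimal sketch of one way to narrow it down, assuming you can afford a slower run: set the CUDA_LAUNCH_BLOCKING environment variable to 1 so that kernel launches become synchronous. The helper names blocking_env and rerun_for_debugging are made up for this illustration and are not part of the repository.

```python
import os
import subprocess


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    CUDA_LAUNCH_BLOCKING is a standard CUDA environment variable; with it set
    to "1", an error is reported at the call that caused it instead of at some
    later, unrelated operation.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_for_debugging(command):
    """Run the training command (hypothetical wrapper) and return its exit code."""
    return subprocess.call(command, env=blocking_env())
```

For example, rerun_for_debugging(["bash", "run/train_voc_first.sh"]) would repeat the run from the original report. Training will be noticeably slower, so this is only meant for locating the failing call.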

You could refer to this issue, which discusses a problem similar to the one you encountered. A possible solution is to resume training from the last checkpoint and to make sure that only one process is running on the GPU.
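
The two checks above can be sketched roughly as follows. This is only an illustration under assumptions: the checkpoint file name pattern (prefix_session_epoch_step.pth) is hypothetical, so adjust the regular expression to whatever your output directory actually contains, and the helper names are invented here. The nvidia-smi query requires the NVIDIA driver tools to be installed.

```python
import re
import subprocess

# Assumed (hypothetical) naming scheme: <prefix>_<session>_<epoch>_<step>.pth
CKPT_RE = re.compile(r"_(\d+)_(\d+)_(\d+)\.pth$")


def latest_checkpoint(names):
    """Return the file name with the highest (session, epoch, step), or None."""
    best_key, best_name = None, None
    for name in names:
        match = CKPT_RE.search(name)
        if match is None:
            continue  # skip files that do not look like checkpoints
        key = tuple(int(group) for group in match.groups())
        if best_key is None or key > best_key:
            best_key, best_name = key, name
    return best_name


def parse_compute_pids(nvidia_smi_csv):
    """Parse the output of the nvidia-smi query below into a list of PIDs."""
    lines = (line.strip() for line in nvidia_smi_csv.splitlines())
    return [int(line) for line in lines if line.isdigit()]


def gpu_compute_pids():
    """Ask nvidia-smi which processes currently have a compute context open."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_pids(out)
```

Before restarting, gpu_compute_pids() should come back empty (or list only processes you expect), and latest_checkpoint(os.listdir(your_output_dir)) tells you which file to resume from. How the resume itself is triggered depends on the options that train.py exposes.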