I ran the code on Pascal VOC 2007 only, with ResNet-50, and it worked.
When I then ran it on Pascal VOC 2007+2012 with ResNet-101, this bug appeared.
The environment I used (in Colab) is:
CUDA 9.0
Python=3.7
PyTorch=0.4.1.post2
torchvision=0.2.1.post2
I tried several fixes, such as deleting the -1 in pascal_voc.py:

    # Load object bounding boxes into a data frame.
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based
        x1 = float(bbox.find('xmin').text) #- 1
        y1 = float(bbox.find('ymin').text) #- 1
        x2 = float(bbox.find('xmax').text) #- 1
        y2 = float(bbox.find('ymax').text) #- 1

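To check whether the annotations themselves are involved, one option is to scan them for boxes that become invalid after the 0-based shift. This is a minimal sketch, assuming the standard VOC XML layout (`size/width`, `size/height`, `object/bndbox`); `find_bad_boxes` is a hypothetical helper, not part of the repository:

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, x1, y1, x2, y2, reasons) for suspicious boxes in one VOC annotation."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)

    bad = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        # Same 0-based conversion as the loader, with the "- 1" kept.
        x1 = float(bbox.find('xmin').text) - 1
        y1 = float(bbox.find('ymin').text) - 1
        x2 = float(bbox.find('xmax').text) - 1
        y2 = float(bbox.find('ymax').text) - 1

        reasons = []
        if x1 < 0 or y1 < 0:
            reasons.append('negative coordinate after the -1 shift')
        if x2 >= width or y2 >= height:
            reasons.append('box extends past the image')
        if x2 <= x1 or y2 <= y1:
            reasons.append('zero or negative width/height')
        if reasons:
            bad.append((obj.find('name').text, x1, y1, x2, y2, reasons))
    return bad
```

Running this over every XML file in the 2012 split would show whether any box starts at pixel 0 or is degenerate, which is something the 2007-only run may not have hit.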
Called with args:
Namespace(TFA=False, batch_size=4, checkepoch=10, checkpoint=21985, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc_0712', disp_interval=100, fix_encoder=False, log_dir='checkpoint', lr=0.001, lr_decay_gamma=0.1, lr_decay_step=4, max_epochs=21, meta_loss=True, meta_train=True, meta_type=1, net='metarcnn', num_workers=1, optimizer='sgd', phase=1, resume=False, save_dir='save_models/VOC_first', session=1, shots=1, start_epoch=1, use_tfboard=True)
Loaded dataset voc_2007_train_first_split for training
Set proposal method: gt
Appending horizontally-flipped training examples...
wrote gt roidb to /content/drive/MyDrive/few shot/data/cache/voc_2007_train_first_split_gt_roidb.pkl
done
Preparing training data...
done
Loaded dataset voc_2012_train_first_split for training
Set proposal method: gt
Appending horizontally-flipped training examples...
wrote gt roidb to /content/drive/MyDrive/few shot/data/cache/voc_2012_train_first_split_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 12330 images...
after filtering, there are 12330 images...
before class filtering, there are 12330 images...
after class filtering, there are 12330 images...
12330 roidb entries
Loading pretrained weights from data/resnet101.pth
[session 1][epoch 1][iter 0] loss: 17.7643, lr: 1.00e-03
fg/bg=(38/474), time cost: 1.222608
rpn_cls: 0.8498, rpn_box: 0.4404, rcnn_cls: 15.8508, rcnn_box 0.4173, meta_loss 0.2060
Traceback (most recent call last):
  File "train.py", line 477, in <module>
    rois_label, cls_prob, bbox_pred, meta_loss = fasterRCNN(im_data_list, im_info_list, gt_boxes_list, num_boxes_list)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/few shot/lib/model/faster_rcnn/faster_rcnn.py", line 84, in forward
    roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/few shot/lib/model/rpn/proposal_target_layer_cascade.py", line 47, in forward
    rois_per_image, self._num_classes)
  File "/content/drive/MyDrive/few shot/lib/model/rpn/proposal_target_layer_cascade.py", line 202, in _sample_rois_pytorch
    raise ValueError("bg_num_rois = 0 and fg_num_rois = 0, this should not happen!")
ValueError: bg_num_rois = 0 and fg_num_rois = 0, this should not happen!
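
For context, the error says that no RoI in an image was counted as either foreground or background. The sketch below shows one way that can happen, assuming RoIs are split by their maximum IoU with the ground-truth boxes; `count_fg_bg` and its threshold values are illustrative assumptions, not the repository's code or configuration. If the overlaps come out as NaN (for example from degenerate boxes), every comparison is false and both counts are zero:

```python
import math


def count_fg_bg(max_overlaps, fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0):
    """Count RoIs that fall in the foreground and background IoU ranges.

    Thresholds are assumed values for illustration only.
    """
    fg = sum(1 for iou in max_overlaps if iou >= fg_thresh)
    bg = sum(1 for iou in max_overlaps if bg_lo <= iou < bg_hi)
    return fg, bg


print(count_fg_bg([0.7, 0.2, 0.0]))        # (1, 2)
print(count_fg_bg([math.nan, math.nan]))   # (0, 0): NaN fails every comparison
```

Whether NaN overlaps are the actual cause here is not confirmed; printing `gt_boxes` for the failing batch would be one way to find out.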
Besides the change to pascal_voc.py shown above, I also deleted the -1 in imdb.py and changed TRAIN.RPN_MIN_SIZE from 8 to 0.
I've tried all the methods mentioned in https://github.com/jwyang/faster-rcnn.pytorch/issues/111, but none of them worked. Could you tell me how to fix the bug?