endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection
https://arxiv.org/pdf/1702.02138.pdf
MIT License
3.65k stars 1.58k forks

Use existing voc data to train on persons and background only #396

Closed Steffgroe closed 5 years ago

Steffgroe commented 5 years ago

Hi, I have been playing around with this implementation of Faster R-CNN for some time. I succeeded in training on the INRIA person data set myself, but I still can't get this implementation to classify persons and background using the PASCAL VOC data. I made the attached files for this. When I try to execute the code after adding the data set to the script, I get the following error:

    Preparing training data...
    Traceback (most recent call last):
      File "./tools/trainval_net.py", line 105, in <module>
        imdb, roidb = combined_roidb(args.imdb_name)
      File "./tools/trainval_net.py", line 76, in combined_roidb
        roidbs = [get_roidb(s) for s in imdb_names.split('+')]
      File "./tools/trainval_net.py", line 73, in get_roidb
        roidb = get_training_roidb(imdb)
      File "tf-faster-rcnn-master/tools/../lib/model/train_val.py", line 332, in get_training_roidb
        rdl_roidb.prepare_roidb(imdb)
      File "tf-faster-rcnn-master/tools/../lib/roi_data_layer/roidb.py", line 49, in prepare_roidb
        assert all(max_classes[nonzero_inds] != 0)
    AssertionError
    Command exited with non-zero status 1
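The failing line is `assert all(max_classes[nonzero_inds] != 0)`. Below is a minimal sketch of what a check of this form tests. It assumes (the traceback alone does not confirm this) that `max_classes` is the per-box argmax over the class columns of `gt_overlaps`, and that `nonzero_inds` selects the boxes whose maximum overlap is positive. Under that assumption, the check fails whenever a ground-truth box ends up labelled with class index 0, the background slot. `check_overlaps` is a hypothetical helper written for illustration, not a function from the repository.

```python
import numpy as np


def check_overlaps(gt_overlaps):
    """Return True if a (num_boxes, num_classes) overlap matrix passes
    the two sanity checks assumed here, False otherwise."""
    # Highest overlap per box, and the class column where it occurs.
    max_overlaps = gt_overlaps.max(axis=1)
    max_classes = gt_overlaps.argmax(axis=1)

    # Boxes with no overlap at all should map to class 0 (background).
    zero_inds = np.where(max_overlaps == 0)[0]
    zero_ok = bool(np.all(max_classes[zero_inds] == 0))

    # Boxes with a positive overlap must belong to a foreground class.
    nonzero_inds = np.where(max_overlaps > 0)[0]
    nonzero_ok = bool(np.all(max_classes[nonzero_inds] != 0))

    return zero_ok and nonzero_ok


# Assumed class layout: column 0 = '__background__', column 1 = 'person'.
good = np.array([[0.0, 1.0]], dtype=np.float32)  # box labelled 'person'
bad = np.array([[1.0, 0.0]], dtype=np.float32)   # box labelled with index 0

print(check_overlaps(good))
print(check_overlaps(bad))
```

If this reading is right, the error points at annotations that are kept in the roidb but mapped to index 0, rather than at the training code itself.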

How can I change the code so that Faster R-CNN detects only the person class, using the PASCAL VOC data set provided?

Attachment: pascal_voc_person.zip

Steffgroe commented 5 years ago

Fixed by the following code:

```python
# Assumes the module-level imports this method relies on:
# os, numpy as np, scipy.sparse, xml.etree.ElementTree as ET
def _load_pascal_annotation(self, index):
  """
  Load image and bounding boxes info from XML file in the PASCAL VOC
  format.
  """
  filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
  tree = ET.parse(filename)
  objs = tree.findall('object')
  if not self.config['use_diff']:
    # Exclude the samples labeled as difficult
    non_diff_objs = [
      obj for obj in objs if int(obj.find('difficult').text) == 0]
    # if len(non_diff_objs) != len(objs):
    #     print 'Removed {} difficult objects'.format(
    #         len(objs) - len(non_diff_objs))
    objs = non_diff_objs

  # Keep only the objects whose class name is in self._classes
  cls_objs = [obj for obj in objs if obj.find('name').text in self._classes]
  objs = cls_objs
  num_objs = len(objs)

  boxes = np.zeros((num_objs, 4), dtype=np.uint16)
  gt_classes = np.zeros((num_objs), dtype=np.int32)
  overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
  # "Seg" area for pascal is just the box area
  seg_areas = np.zeros((num_objs), dtype=np.float32)

  # Load object bounding boxes into a data frame.
  for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # Make pixel indexes 0-based
    x1 = float(bbox.find('xmin').text) - 1
    y1 = float(bbox.find('ymin').text) - 1
    x2 = float(bbox.find('xmax').text) - 1
    y2 = float(bbox.find('ymax').text) - 1
    cls = self._class_to_ind[obj.find('name').text.lower().strip()]
    boxes[ix, :] = [x1, y1, x2, y2]
    gt_classes[ix] = cls
    overlaps[ix, cls] = 1.0
    seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)

  overlaps = scipy.sparse.csr_matrix(overlaps)

  return {'boxes': boxes,
          'gt_classes': gt_classes,
          'gt_overlaps': overlaps,
          'flipped': False,
          'seg_areas': seg_areas}
```
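
The class filter in the method above can be tried in isolation. The following is a small standalone sketch, not code from the repository: `SAMPLE_XML`, `CLASSES`, and `load_person_boxes` are hypothetical names, and the annotation is a made-up, stripped-down VOC-style snippet. It parses the XML, drops every object whose name is not in the class list, and converts the remaining boxes to 0-based coordinates as the method does.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal VOC-style annotation with one person and one dog.
SAMPLE_XML = """
<annotation>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""

# Assumed two-class setup: index 0 is background, index 1 is person.
CLASSES = ('__background__', 'person')
CLASS_TO_IND = {name: i for i, name in enumerate(CLASSES)}


def load_person_boxes(xml_text):
    """Return a list of (class_index, [x1, y1, x2, y2]) for kept objects."""
    root = ET.fromstring(xml_text)
    kept = []
    for obj in root.findall('object'):
        name = obj.find('name').text.lower().strip()
        # Drop objects of classes that are not being trained on.
        if name not in CLASS_TO_IND:
            continue
        bbox = obj.find('bndbox')
        # Make pixel indexes 0-based, as in the method above.
        box = [float(bbox.find(tag).text) - 1
               for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        kept.append((CLASS_TO_IND[name], box))
    return kept


print(load_person_boxes(SAMPLE_XML))
```

With this filter, objects of other classes (the dog here) are absent from the ground truth altogether, so no box is ever assigned a class index outside the two-class list.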
ngthanhvinh commented 5 years ago

Hey @Steffgroe, did you succeed in detecting only 'person' and 'background' based on the existing VOC data? I changed my code to match yours above and finished training. However, in the demo the model detected cats and dogs as 'person' with very high confidence. Do you know why?

(Attached images: dog_and_cat, cat)