NVlabs / wetectron

Weakly-supervised object detection.

Question about training with own data #81

Open ghZHM opened 2 years ago

ghZHM commented 2 years ago

I followed the instructions to install the model and trained it on VOC2007 for 30,000 iterations with 'IMS_PER_BATCH: 2'. The result seems normal:

mAP: 0.3279
aeroplane : 0.5760
bicycle : 0.5919
bird : 0.2792
boat : 0.1606
bottle : 0.2195
bus : 0.4927
car : 0.7099
cat : 0.0822
chair : 0.1200
cow : 0.3114
diningtable : 0.0533
dog : 0.1096
horse : 0.0607
motorbike : 0.6474
person : 0.3175
pottedplant : 0.2002
sheep : 0.3854
sofa : 0.2155
train : 0.4436
tvmonitor : 0.5824

It did not reach the best reported result, but at least it shows that the code works properly.

But when I train the network on my own data, the mAP is 0.0005%. My dataset has only 2 classes and about 12,000 images in total. One class exists in every image, and this class cannot be recognized: its AP is 0.

I modified some of the training settings in the config file:

    ROI_BOX_HEAD:
      NUM_CLASSES: 3

Since I have only one GPU, I ran into an OOM problem, so I followed the comment and changed the solver settings to:

    SOLVER:
      IMS_PER_BATCH: 1
      BASE_LR: 0.0025
      WEIGHT_DECAY: 0.0001
      WARMUP_ITERS: 200
      STEPS: (0, 30000, 40000)
      MAX_ITER: 60000
      CHECKPOINT_PERIOD: 1000

I changed "CLASSES" in datasets/voc.py to the classes of my dataset, as follows:

    CLASSES = (
        "__background__ ",
        "trafficePolice",
        "pedestrian",
    )

The spelling is copied from the annotation files. I also removed the lower() call in name = obj.find("name").text.lower().strip() on line 136 of voc.py, because the unrecognized class has an upper-case letter in its name.

Apart from pointing the code to the location of my region proposals and dataset, I did not make any other changes to the code. The overall training process seems normal: the data is fed into the model, the loss keeps decreasing, and every component of the loss has a value. Could you give me some advice?
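Because the lower() call was removed and the class names are case-sensitive, one cheap sanity check is to confirm that every object name in the annotation files matches an entry in CLASSES exactly. The following is a minimal sketch under stated assumptions, not part of wetectron: the helper names count_object_names and unknown_names and the inline sample XML are hypothetical, and it assumes VOC-style annotations with object/name elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Foreground class names as quoted in the post (background entry left out,
# since it never appears in annotation files).
CLASSES = ("trafficePolice", "pedestrian")


def count_object_names(xml_texts, lowercase=False):
    """Count <object><name> values across VOC-style annotation strings.

    Set lowercase=True to mimic the original .lower() call in voc.py.
    """
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for obj in root.iter("object"):
            name = obj.find("name").text.strip()
            if lowercase:
                name = name.lower()
            counts[name] += 1
    return counts


def unknown_names(counts, classes):
    """Return annotation names that have no exact match in `classes`."""
    return sorted(name for name in counts if name not in classes)


# Hypothetical in-memory annotations standing in for real XML files.
SAMPLE = [
    "<annotation><object><name>trafficePolice</name></object>"
    "<object><name> pedestrian </name></object></annotation>",
    "<annotation><object><name>trafficePolice</name></object></annotation>",
]

counts = count_object_names(SAMPLE)
print(counts)
print("unmatched names:", unknown_names(counts, CLASSES))
```

With real data, the strings in SAMPLE would be replaced by the contents of the annotation files. Any name reported as unmatched would be silently mislabeled or dropped, depending on how the dataset class maps names to indices.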

ghZHM commented 2 years ago

I made some visualisations and found that almost all the bounding boxes are predicted on the background and concentrated on the left side of the image. It seems that on my dataset the loss cannot supervise the model to learn. By the way, could you please tell me why weakly supervised object detection methods perform badly on the class "person"?

jason718 commented 2 years ago

Several things I noticed here:

  1. One class exists in every image. I assume this is person. WSOD won't work for this class since there are no negative samples in the entire set. The network can easily cheat.
  2. Your training settings are very different from the recommended ones. This code is not optimized for that setup.
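The first point can be checked directly from image-level labels: a class present in 100% of the images has no negative examples. Below is a minimal sketch of such a check. The function names and the toy labels (class_a, class_b) are hypothetical placeholders, not taken from wetectron or from the dataset in question.

```python
def presence_fraction(image_labels, classes):
    """Fraction of images in which each class appears at least once.

    image_labels: one set of class names per image (image-level labels).
    """
    n_images = len(image_labels)
    return {
        cls: sum(cls in labels for labels in image_labels) / n_images
        for cls in classes
    }


def classes_without_negatives(image_labels, classes):
    """Classes that appear in every image, i.e. have no negative images."""
    fractions = presence_fraction(image_labels, classes)
    return [cls for cls in classes if fractions[cls] == 1.0]


# Toy image-level labels for four hypothetical images.
toy_labels = [
    {"class_a", "class_b"},
    {"class_a"},
    {"class_a", "class_b"},
    {"class_a"},
]
toy_classes = ("class_a", "class_b")

print(presence_fraction(toy_labels, toy_classes))
print(classes_without_negatives(toy_labels, toy_classes))
```

If a class shows up in the second list, adding images that do not contain it (or dropping it as a target class) gives the image-level classification loss something to discriminate against.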

ghZHM commented 2 years ago

Thank you for your reply. My hardware cannot run the recommended settings since I only have one GPU. Maybe I should use another base network.
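A common heuristic for adapting a multi-GPU schedule to a single GPU is the linear scaling rule: when the batch size is divided by k, divide the learning rate by k and multiply the iteration counts by k. The sketch below only illustrates that arithmetic. The function name scale_schedule and the reference values are hypothetical placeholders, not wetectron's recommended settings, and whether the rule holds well for WSOD training is an assumption that would need to be verified.

```python
def scale_schedule(ref_batch, ref_lr, ref_steps, ref_max_iter, new_batch):
    """Rescale a solver schedule with the linear scaling rule.

    The learning rate scales with the batch size; the step milestones and
    the total iteration count scale inversely, so the number of images
    seen before each milestone stays the same.
    """
    factor = ref_batch / new_batch
    return {
        "IMS_PER_BATCH": new_batch,
        "BASE_LR": ref_lr / factor,
        "STEPS": tuple(int(step * factor) for step in ref_steps),
        "MAX_ITER": int(ref_max_iter * factor),
    }


# Hypothetical reference schedule (placeholder numbers for illustration).
scaled = scale_schedule(
    ref_batch=8,
    ref_lr=0.01,
    ref_steps=(20000, 26000),
    ref_max_iter=30000,
    new_batch=1,
)
print(scaled)
```

The returned keys mirror the SOLVER field names quoted earlier in this thread, so the values can be copied into a config by hand once the real reference schedule is substituted.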