Training on a dataset with 2 classes (Pascal VOC style) gives AP=0

BestSongEver commented 6 years ago

Thanks for your excellent work! @ruinmessi @yuyijie1995 I need a little help, please. I am trying to train RFBNet on a dataset with only 2 classes, in Pascal VOC style. I changed ~/data/voc0712.py like this:

    VOC_CLASSES = ( 'background', # always index 0
        'person')
    ### 'aeroplane', 'bicycle', 'bird', 'boat',
    ### 'bottle', 'bus', 'car', 'cat', 'chair',
    ### 'cow', 'diningtable', 'dog', 'horse',
    ### 'motorbike', 'person', 'pottedplant',
    ### 'sheep', 'sofa', 'train', 'tvmonitor')

and changed train_RFB.py like this:

    num_classes = (2, 2)[args.dataset == 'COCO']
    ###num_classes = (21, 81)[args.dataset == 'COCO']
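One way to keep the class tuple and the class count from drifting apart is to derive one from the other. The sketch below is hypothetical and not part of the repository: it assumes `VOC_CLASSES` keeps the background entry at index 0, as in the snippet above, and uses an illustrative `dataset` variable in place of `args.dataset`. It also shows which element the tuple-indexing idiom selects.

```python
# Hypothetical sketch, not the repository's code.
VOC_CLASSES = ('background',  # always index 0
               'person')

# Map each class name to its integer label; background stays at 0.
class_to_ind = dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))

dataset = 'VOC'  # illustrative stand-in for args.dataset

# (a, b)[cond] returns a when cond is False (index 0) and b when cond is True (index 1).
num_classes = (len(VOC_CLASSES), 81)[dataset == 'COCO']

print(class_to_ind)
print(num_classes)
```

With this arrangement, the count passed to the model always includes the background class, so one foreground class gives `num_classes == 2`.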

I ran train_RFB.py but got AP=0.0000 after 120 epochs.

I looked at the loss during training, and it did not go well. (screenshot)

I trained my dataset on Faster R-CNN once and got a normal AP of 49%. This time I set the right path to my data. (Last time I set the wrong path, so I could not run train_RFB.py successfully, as @Excalibur0214 said.) So maybe there is nothing wrong with my dataset. Should I change more scripts? Could you please tell me how I can fix it? Thanks again!

left4back commented 6 years ago

Are you sure you set up the dataset correctly? The error message told you that your dataset annotations may contain the class 'sofa'.
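A quick way to test this is to count every object name that appears in the annotation files and compare the result with the class tuple. The following is a minimal sketch using only the standard library. The function names are hypothetical, the sample XML is made up for illustration, and lower-casing the names is an assumption about how the data loader normalizes them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_object_names(xml_strings):
    """Count <object><name> values across VOC-style annotation documents."""
    counts = Counter()
    for text in xml_strings:
        root = ET.fromstring(text.strip())
        for obj in root.iter('object'):
            # Assumption: names are compared lower-cased and stripped.
            name = obj.findtext('name', default='').strip().lower()
            counts[name] += 1
    return counts


def unexpected_names(counts, classes):
    """Return annotation names that are missing from the class tuple."""
    return sorted(name for name in counts if name not in classes)


# Made-up sample: one valid class and one leftover VOC class.
sample = [
    "<annotation>"
    "<object><name>person</name></object>"
    "<object><name>sofa</name></object>"
    "</annotation>"
]

counts = count_object_names(sample)
print(counts)
print(unexpected_names(counts, ('background', 'person')))
```

For a real dataset, the strings could come from reading each `.xml` file in the annotation directory. Any name reported by `unexpected_names` would need to be removed from the annotations or added to the class tuple.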

BestSongEver commented 6 years ago

@Excalibur0214 Could you please help me with my issue? I am new to this and I want to train the program on my dataset. Thanks!

left4back commented 6 years ago

I've run into your earlier problem before, but I have never seen your present one. Maybe you should carefully examine your output boxes and draw them on your test image. It is worth a try.

GOATmessi8 commented 6 years ago

@BestSongEver Your location loss is abnormal, so I suggest you check the annotations of your own data. For training, the box format is [xmin, ymin, xmax, ymax].
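A minimal sketch of such a check, assuming VOC-style XML with a `size` element and `bndbox` children named `xmin`, `ymin`, `xmax`, `ymax`. The function name `find_bad_boxes` and the sample annotation are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, reason) for boxes that violate [xmin, ymin, xmax, ymax]."""
    root = ET.fromstring(xml_text.strip())
    size = root.find('size')
    width = int(size.findtext('width'))
    height = int(size.findtext('height'))

    problems = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ('xmin', 'ymin', 'xmax', 'ymax')
        )
        if xmin >= xmax or ymin >= ymax:
            problems.append((obj.findtext('name'), 'empty or inverted box'))
        elif xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append((obj.findtext('name'), 'outside image'))
    return problems


# Made-up annotation: the second box has xmin > xmax.
sample = """
<annotation>
  <size><width>500</width><height>375</height></size>
  <object><name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name>
    <bndbox><xmin>300</xmin><ymin>100</ymin><xmax>120</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""

print(find_bad_boxes(sample))
```

Swapped corners or boxes stored as [x, y, width, height] would show up here as inverted or out-of-image boxes.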

BestSongEver commented 6 years ago

@ruinmessi Thanks for your reply. I made my dataset strictly follow Pascal VOC2007, and it can be trained successfully by other programs. Still, I tried another Pascal VOC-like dataset with only one class (bird), but the code still didn't work.

I adjusted NUM_CLASSES and num_classes in data/voc0712.py, models/*.py, train_RFB.py and test_RFB.py.

I just found that the location loss and class loss went well at the beginning. But after a few epochs (maybe 4 or 7), the class loss suddenly dropped from around 2 to nearly 0.0. After that, the location loss stayed above 1.0000 and never dropped again, while the class loss stayed below 0.1.

For my person dataset:

    Epoch:6 || epochiter: 735/769|| Totel iter 4580 || L: 1.9189 C: 2.5317||Batch time: 1.3623 sec. ||LR: 0.00400000
    Epoch:6 || epochiter: 745/769|| Totel iter 4590 || L: 1.7877 C: 2.2604||Batch time: 1.3614 sec. ||LR: 0.00400000
    Epoch:6 || epochiter: 755/769|| Totel iter 4600 || L: 1.5482 C: 1.7651||Batch time: 1.3544 sec. ||LR: 0.00400000
    Epoch:6 || epochiter: 765/769|| Totel iter 4610 || L: 1.3797 C: 2.2506||Batch time: 1.3631 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 6/769|| Totel iter 4620 || L: 1.0155 C: 0.9434||Batch time: 1.3564 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 16/769|| Totel iter 4630 || L: 1.5007 C: 0.2787||Batch time: 1.3590 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 26/769|| Totel iter 4640 || L: 1.5300 C: 0.2775||Batch time: 1.3682 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 36/769|| Totel iter 4650 || L: 1.8310 C: 0.5066||Batch time: 1.3597 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 46/769|| Totel iter 4660 || L: 1.5827 C: 0.1721||Batch time: 1.3624 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 56/769|| Totel iter 4670 || L: 2.2631 C: 0.0913||Batch time: 1.3968 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 66/769|| Totel iter 4680 || L: 1.2228 C: 0.0428||Batch time: 1.3537 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 76/769|| Totel iter 4690 || L: 1.5303 C: 0.0263||Batch time: 1.3642 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 86/769|| Totel iter 4700 || L: 2.2687 C: 0.0883||Batch time: 1.3423 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 96/769|| Totel iter 4710 || L: 1.1470 C: 0.0194||Batch time: 1.3320 sec. ||LR: 0.00400000
    Epoch:7 || epochiter: 106/769|| Totel iter 4720 || L: 1.8459 C: 0.0782||Batch time: 1.3386 sec. ||LR: 0.00400000

For a VOC-like bird dataset:

    Epoch:3 || epochiter: 12/44|| Totel iter 100 || L: 3.3108 C: 3.3798||Batch time: 1.2181 sec. ||LR: 0.00181873
    Epoch:3 || epochiter: 22/44|| Totel iter 110 || L: 3.2515 C: 3.1052||Batch time: 1.2168 sec. ||LR: 0.00200050
    Epoch:3 || epochiter: 32/44|| Totel iter 120 || L: 2.8687 C: 3.1402||Batch time: 3.3242 sec. ||LR: 0.00218227
    Epoch:3 || epochiter: 42/44|| Totel iter 130 || L: 2.6957 C: 3.2901||Batch time: 1.2092 sec. ||LR: 0.00236405
    Epoch:4 || epochiter: 8/44|| Totel iter 140 || L: 2.9657 C: 2.9876||Batch time: 30.9130 sec. ||LR: 0.00254582
    Epoch:4 || epochiter: 18/44|| Totel iter 150 || L: 3.1143 C: 0.8904||Batch time: 1.2268 sec. ||LR: 0.00272759
    Epoch:4 || epochiter: 28/44|| Totel iter 160 || L: 2.9572 C: 0.1551||Batch time: 2.3035 sec. ||LR: 0.00290936
    Epoch:4 || epochiter: 38/44|| Totel iter 170 || L: 3.0392 C: 0.1138||Batch time: 1.2104 sec. ||LR: 0.00309114
    Epoch:5 || epochiter: 4/44|| Totel iter 180 || L: 2.9200 C: 45.1219||Batch time: 1.2046 sec. ||LR: 0.00327291
    Epoch:5 || epochiter: 14/44|| Totel iter 190 || L: 5.1481 C: 2.3512||Batch time: 1.2625 sec. ||LR: 0.00345468
    Epoch:5 || epochiter: 24/44|| Totel iter 200 || L: 3.3059 C: 0.1716||Batch time: 27.4927 sec. ||LR: 0.00363645
    Epoch:5 || epochiter: 34/44|| Totel iter 210 || L: 3.5403 C: 0.0787||Batch time: 27.3811 sec. ||LR: 0.00381823
    Epoch:6 || epochiter: 0/44|| Totel iter 220 || L: 3.3519 C: 0.0618||Batch time: 58.6951 sec. ||LR: 0.00400000
    Epoch:6 || epochiter: 10/44|| Totel iter 230 || L: 3.8552 C: 0.0889||Batch time: 1.2170 sec. ||LR: 0.00400000
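To pinpoint where the collapse begins without reading the log by eye, the lines can be parsed with a regular expression. This is a rough sketch, not a tool from the repository: `parse_losses` and `first_collapse` are hypothetical names, and the thresholds (class loss below 0.1 while location loss is above 1.0) are taken from the description above rather than from any established rule.

```python
import re

# Matches the "Totel iter N || L: x C: y" part of each log entry
# (the spelling "Totel" is what the training script prints).
LOG_PATTERN = re.compile(r'Totel iter (\d+) \|\| L: ([\d.]+) C: ([\d.]+)')


def parse_losses(log_text):
    """Return a list of (iteration, loc_loss, conf_loss) tuples."""
    return [
        (int(it), float(loc), float(conf))
        for it, loc, conf in LOG_PATTERN.findall(log_text)
    ]


def first_collapse(records, conf_threshold=0.1, loc_threshold=1.0):
    """Return the first iteration where class loss is tiny but loc loss is not."""
    for it, loc, conf in records:
        if conf < conf_threshold and loc > loc_threshold:
            return it
    return None


# Three entries copied from the person-dataset log.
log = """
Epoch:7 || epochiter: 36/769|| Totel iter 4650 || L: 1.8310 C: 0.5066||Batch time: 1.3597 sec. ||LR: 0.00400000
Epoch:7 || epochiter: 46/769|| Totel iter 4660 || L: 1.5827 C: 0.1721||Batch time: 1.3624 sec. ||LR: 0.00400000
Epoch:7 || epochiter: 56/769|| Totel iter 4670 || L: 2.2631 C: 0.0913||Batch time: 1.3968 sec. ||LR: 0.00400000
"""

records = parse_losses(log)
print(records)
print(first_collapse(records))
```

Knowing the iteration makes it easier to compare the collapse point against the learning rate logged at that step.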

I am still confused. Please help me if you can. Thanks again.

GOATmessi8 commented 6 years ago

@BestSongEver You could try a smaller lr, like 0.001 or 0.0005, in your case.
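To see what a smaller base rate means in practice, the LR column of the bird-dataset log can be reproduced with a linear warm-up. The sketch below is inferred from those logged values only and is not taken from the repository: `warmup_lr` is a hypothetical helper, and `warmup_epochs=5` and `start_lr=1e-6` are assumptions chosen because they match the logged numbers (for example 0.00181873 at iteration 100 with 44 iterations per epoch).

```python
def warmup_lr(iteration, epoch_size, base_lr, warmup_epochs=5, start_lr=1e-6):
    """Linear warm-up from start_lr to base_lr, then constant (decay steps omitted)."""
    warmup_iters = epoch_size * warmup_epochs
    if iteration >= warmup_iters:
        return base_lr
    return start_lr + (base_lr - start_lr) * iteration / warmup_iters


# Compare the logged base rate with the two smaller suggestions.
for base_lr in (0.004, 0.001, 0.0005):
    schedule = [warmup_lr(it, 44, base_lr) for it in (100, 150, 220)]
    print(base_lr, ['%.8f' % lr for lr in schedule])
```

Under these assumptions, the class loss in the bird run first drops sharply around iteration 150, while the rate is still ramping toward 0.004, so a lower base rate keeps every step of the ramp proportionally smaller.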