Open DuongVanKhoa0811 opened 2 months ago
Hi Khoa,
Thank you for your interest in our work.
Our implementation is based on Deformable-DETR, which sets the number of classes in COCO to 91 by default. The original COCO paper mentions 91 classes, but the commonly used dataset only includes 80 object detection classes, as explained in this reference.
The extra classes do not affect the final performance: they never appear as ground-truth labels, so the corresponding weights in the linear classifier are effectively ignored. Using only 80 of the 91 class slots does not hurt performance, because the unused slots do not influence training.
Additionally, class ID 0 is reserved for foreground classification: in the encoder head, which is used in two-stage training, all objects are classified as class ID 0.
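To make the point about unused class slots concrete, here is a minimal pure-Python sketch. It assumes a one-hot style classification target with 91 columns; the `one_hot` helper and the example labels are hypothetical illustrations, not code from the repository.

```python
NUM_CLASSES = 91  # classifier width mentioned above (IDs 0..90)

def one_hot(label, num_classes):
    """Return a one-hot row for a single category ID (hypothetical helper)."""
    row = [0] * num_classes
    row[label] = 1
    return row

# Hypothetical labels of a few matched boxes, using raw COCO category IDs.
labels = [1, 18, 44, 90]
targets = [one_hot(label, NUM_CLASSES) for label in labels]

# IDs 12, 26, 29 and 30 are among those that never occur in COCO 2017.
unused_ids = [12, 26, 29, 30]
column_sums = {i: sum(row[i] for row in targets) for i in unused_ids}
print(column_sums)  # every unused column contains only zeros
```

Because no ground-truth box carries an unused ID, those columns never hold a positive target, which is why the extra slots are harmless.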
I hope this addresses your concerns. Please feel free to reach out if you have further questions.
I really appreciate your fast response.
In addition, I changed num_classes to 21 and trained MS-DETR on VOC0712 (16551 samples). Unfortunately, the accuracy is very low. Could you give me some guidance on this?
Below are the logs after the first epoch and after the 34th epoch:
After the first epochs: {"train_lr": 0.00019999999999998193, "train_class_error": 71.80660301738635, "train_grad_norm": 219.8104621555294, "train_loss": 85.37451796701667, "train_loss_ce": 0.8044946488008758, "train_loss_bbox": 1.0721628802924719, "train_loss_giou": 0.8975641289173296, "train_loss_ce_0": 0.800403747000363, "train_loss_bbox_0": 1.1586866971931429, "train_loss_giou_0": 0.9399518402699617, "train_loss_ce_1": 0.7995713390324412, "train_loss_bbox_1": 1.108939633432472, "train_loss_giou_1": 0.918134202881522, "train_loss_ce_2": 0.8033310748695247, "train_loss_bbox_2": 1.084095978409142, "train_loss_giou_2": 0.9034838193817081, "train_loss_ce_3": 0.8039973848932099, "train_loss_bbox_3": 1.0761547603599975, "train_loss_giou_3": 0.8998438039313991, "train_loss_ce_4": 0.8051173802192838, "train_loss_bbox_4": 1.0749586773116784, "train_loss_giou_4": 0.898583639301022, "train_loss_ce_o2m": 4.249423578981187, "train_loss_bbox_o2m": 4.182974482029226, "train_loss_giou_o2m": 2.202238660737467, "train_loss_ce_0_o2m": 4.220417819664197, "train_loss_bbox_0_o2m": 4.371570605464572, "train_loss_giou_0_o2m": 2.2772238857220306, "train_loss_ce_1_o2m": 4.24689982175827, "train_loss_bbox_1_o2m": 4.332502252947528, "train_loss_giou_1_o2m": 2.2689856534926194, "train_loss_ce_2_o2m": 4.2797206614240775, "train_loss_bbox_2_o2m": 4.232159276113049, "train_loss_giou_2_o2m": 2.234157325036576, "train_loss_ce_3_o2m": 4.257017854563779, "train_loss_bbox_3_o2m": 4.220445931501619, "train_loss_giou_3_o2m": 2.2258202193871006, "train_loss_ce_4_o2m": 4.2416134119898175, "train_loss_bbox_4_o2m": 4.226132402157135, "train_loss_giou_4_o2m": 2.2206091794434633, "train_loss_ce_enc": 1.3789663734032669, "train_loss_bbox_enc": 1.6490743674846575, "train_loss_giou_enc": 1.0070887521602596, "train_loss_ce_unscaled": 0.8044946488008758, "train_class_error_unscaled": 71.80660301738635, "train_loss_bbox_unscaled": 0.21443257589354617, "train_loss_giou_unscaled": 0.4487820644586648, 
"train_cardinality_error_unscaled": 295.1966163141994, "train_loss_ce_0_unscaled": 0.800403747000363, "train_loss_bbox_0_unscaled": 0.23173733942188166, "train_loss_giou_0_unscaled": 0.46997592013498085, "train_cardinality_error_0_unscaled": 294.42712990936553, "train_loss_ce_1_unscaled": 0.7995713390324412, "train_loss_bbox_1_unscaled": 0.22178792655783117, "train_loss_giou_1_unscaled": 0.459067101440761, "train_cardinality_error_1_unscaled": 294.5300906344411, "train_loss_ce_2_unscaled": 0.8033310748695247, "train_loss_bbox_2_unscaled": 0.2168191956717442, "train_loss_giou_2_unscaled": 0.45174190969085404, "train_cardinality_error_2_unscaled": 294.8544410876133, "train_loss_ce_3_unscaled": 0.8039973848932099, "train_loss_bbox_3_unscaled": 0.21523095190119887, "train_loss_giou_3_unscaled": 0.44992190196569953, "train_cardinality_error_3_unscaled": 295.16036253776434, "train_loss_ce_4_unscaled": 0.8051173802192838, "train_loss_bbox_4_unscaled": 0.21499173537526967, "train_loss_giou_4_unscaled": 0.449291819650511, "train_cardinality_error_4_unscaled": 295.0172205438067, "train_loss_ce_o2m_unscaled": 2.1247117894905934, "train_class_error_o2m_unscaled": 72.04633336830716, "train_loss_bbox_o2m_unscaled": 0.8365948952036681, "train_loss_giou_o2m_unscaled": 1.1011193303687334, "train_cardinality_error_o2m_unscaled": 295.67510574018127, "train_loss_ce_0_o2m_unscaled": 2.1102089098320986, "train_loss_bbox_0_o2m_unscaled": 0.8743141208179408, "train_loss_giou_0_o2m_unscaled": 1.1386119428610153, "train_cardinality_error_0_o2m_unscaled": 294.9025981873112, "train_loss_ce_1_o2m_unscaled": 2.123449910879135, "train_loss_bbox_1_o2m_unscaled": 0.8665004491220788, "train_loss_giou_1_o2m_unscaled": 1.1344928267463097, "train_cardinality_error_1_o2m_unscaled": 295.22060422960726, "train_loss_ce_2_o2m_unscaled": 2.1398603307120387, "train_loss_bbox_2_o2m_unscaled": 0.8464318547202021, "train_loss_giou_2_o2m_unscaled": 1.117078662518288, "train_cardinality_error_2_o2m_unscaled": 
295.45728096676737, "train_loss_ce_3_o2m_unscaled": 2.1285089272818896, "train_loss_bbox_3_o2m_unscaled": 0.8440891853028915, "train_loss_giou_3_o2m_unscaled": 1.1129101096935503, "train_cardinality_error_3_o2m_unscaled": 295.09383685800606, "train_loss_ce_4_o2m_unscaled": 2.1208067059949087, "train_loss_bbox_4_o2m_unscaled": 0.8452264796895202, "train_loss_giou_4_o2m_unscaled": 1.1103045897217316, "train_cardinality_error_4_o2m_unscaled": 295.38205438066467, "train_loss_ce_enc_unscaled": 0.6894831867016334, "train_loss_bbox_enc_unscaled": 0.3298148731320106, "train_loss_giou_enc_unscaled": 0.5035443760801298, "train_cardinality_error_enc_unscaled": 13941.062658610272, "test_class_error": 69.26391205918615, "test_loss": 80.38840903835266, "test_loss_ce": 0.8019415270868141, "test_loss_bbox": 1.0441190655040817, "test_loss_giou": 0.9651208738518069, "test_loss_ce_0": 0.797229378332229, "test_loss_bbox_0": 1.1303567040188054, "test_loss_giou_0": 0.984277335063273, "test_loss_ce_1": 0.7925811146110633, "test_loss_bbox_1": 1.1044362741906153, "test_loss_giou_1": 0.9798792177429106, "test_loss_ce_2": 0.8055892890508802, "test_loss_bbox_2": 1.080018329805723, "test_loss_giou_2": 0.9593973084742592, "test_loss_ce_3": 0.8018067954574533, "test_loss_bbox_3": 1.0493565998995535, "test_loss_giou_3": 0.9639106923849691, "test_loss_ce_4": 0.7999479768975293, "test_loss_bbox_4": 1.0518214491486357, "test_loss_giou_4": 0.9687899208246987, "test_loss_ce_o2m": 3.8949552460517944, "test_loss_bbox_o2m": 3.8046848026250215, "test_loss_giou_o2m": 2.1255790529853886, "test_loss_ce_0_o2m": 3.9088794979265318, "test_loss_bbox_0_o2m": 3.9431229767439246, "test_loss_giou_0_o2m": 2.1382573213013, "test_loss_ce_1_o2m": 3.986596904466341, "test_loss_bbox_1_o2m": 3.9779614805693773, "test_loss_giou_1_o2m": 2.1537481622713255, "test_loss_ce_2_o2m": 3.693862157270481, "test_loss_bbox_2_o2m": 3.8494812220407226, "test_loss_giou_2_o2m": 2.1351816273371123, "test_loss_ce_3_o2m": 3.8790926972527497, 
"test_loss_bbox_3_o2m": 3.878890388363879, "test_loss_giou_3_o2m": 2.1402453360823324, "test_loss_ce_4_o2m": 3.875369939429502, "test_loss_bbox_4_o2m": 3.8640578079108083, "test_loss_giou_4_o2m": 2.144540162122115, "test_loss_ce_enc": 1.4438144798858485, "test_loss_bbox_enc": 1.4813275530163998, "test_loss_giou_enc": 0.9881803867841961, "test_loss_ce_unscaled": 0.8019415270868141, "test_class_error_unscaled": 69.26391205918615, "test_loss_bbox_unscaled": 0.20882381285767948, "test_loss_giou_unscaled": 0.48256043692590345, "test_cardinality_error_unscaled": 296.9757673667205, "test_loss_ce_0_unscaled": 0.797229378332229, "test_loss_bbox_0_unscaled": 0.22607134058108622, "test_loss_giou_0_unscaled": 0.4921386675316365, "test_cardinality_error_0_unscaled": 296.9757673667205, "test_loss_ce_1_unscaled": 0.7925811146110633, "test_loss_bbox_1_unscaled": 0.22088725464975217, "test_loss_giou_1_unscaled": 0.4899396088714553, "test_cardinality_error_1_unscaled": 296.9757673667205, "test_loss_ce_2_unscaled": 0.8055892890508802, "test_loss_bbox_2_unscaled": 0.21600366592924794, "test_loss_giou_2_unscaled": 0.4796986542371296, "test_cardinality_error_2_unscaled": 296.9757673667205, "test_loss_ce_3_unscaled": 0.8018067954574533, "test_loss_bbox_3_unscaled": 0.20987131979515075, "test_loss_giou_3_unscaled": 0.48195534619248454, "test_cardinality_error_3_unscaled": 296.9757673667205, "test_loss_ce_4_unscaled": 0.7999479768975293, "test_loss_bbox_4_unscaled": 0.21036428991277883, "test_loss_giou_4_unscaled": 0.48439496041234936, "test_cardinality_error_4_unscaled": 296.9757673667205, "test_loss_ce_o2m_unscaled": 1.9474776230258972, "test_class_error_o2m_unscaled": 69.29007412351199, "test_loss_bbox_o2m_unscaled": 0.7609369600634053, "test_loss_giou_o2m_unscaled": 1.0627895264926943, "test_cardinality_error_o2m_unscaled": 296.9757673667205, "test_loss_ce_0_o2m_unscaled": 1.9544397489632659, "test_loss_bbox_0_o2m_unscaled": 0.7886245945658119, "test_loss_giou_0_o2m_unscaled": 
1.06912866065065, "test_cardinality_error_0_o2m_unscaled": 296.9757673667205, "test_loss_ce_1_o2m_unscaled": 1.9932984522331705, "test_loss_bbox_1_o2m_unscaled": 0.7955922954602946, "test_loss_giou_1_o2m_unscaled": 1.0768740811356627, "test_cardinality_error_1_o2m_unscaled": 296.97495961227787, "test_loss_ce_2_o2m_unscaled": 1.8469310786352404, "test_loss_bbox_2_o2m_unscaled": 0.7698962439020106, "test_loss_giou_2_o2m_unscaled": 1.0675908136685561, "test_cardinality_error_2_o2m_unscaled": 296.9757673667205, "test_loss_ce_3_o2m_unscaled": 1.9395463486263749, "test_loss_bbox_3_o2m_unscaled": 0.7757780767212922, "test_loss_giou_3_o2m_unscaled": 1.0701226680411662, "test_cardinality_error_3_o2m_unscaled": 295.0654281098546, "test_loss_ce_4_o2m_unscaled": 1.937684969714751, "test_loss_bbox_4_o2m_unscaled": 0.7728115598188174, "test_loss_giou_4_o2m_unscaled": 1.0722700810610575, "test_cardinality_error_4_o2m_unscaled": 296.9757673667205, "test_loss_ce_enc_unscaled": 0.7219072399429243, "test_loss_bbox_enc_unscaled": 0.2962655103456993, "test_loss_giou_enc_unscaled": 0.49409019339209803, "test_cardinality_error_enc_unscaled": 21292.355411954766, "test_coco_eval_bbox": [0.0003855556643685181, 0.0014334943292030255, 0.00012571609138676404, 0.0, 1.2126271439774832e-06, 0.0006577175270891887, 0.012689536345213661, 0.024618102695313632, 0.031324099012440834, 0.0, 0.0007122497740957292, 0.05396739417499884], "epoch": 0, "n_parameters": 53445295}
After 34th epochs: {"train_lr": 1.999999999999602e-07, "train_class_error": 67.8085808187525, "train_grad_norm": 105.36065779994262, "train_loss": 81.56075509615896, "train_loss_ce": 0.7913178410198753, "train_loss_bbox": 0.8093621504576904, "train_loss_giou": 0.7775356226745329, "train_loss_ce_0": 0.7941225083500957, "train_loss_bbox_0": 0.8357652990173358, "train_loss_giou_0": 0.7933775600340791, "train_loss_ce_1": 0.7928290369863596, "train_loss_bbox_1": 0.8165828713081394, "train_loss_giou_1": 0.7829681597924665, "train_loss_ce_2": 0.7930453124003107, "train_loss_bbox_2": 0.8120622694294258, "train_loss_giou_2": 0.7788048408002651, "train_loss_ce_3": 0.793075495544157, "train_loss_bbox_3": 0.810473475981605, "train_loss_giou_3": 0.7776253341079838, "train_loss_ce_4": 0.7929168640954977, "train_loss_bbox_4": 0.8102477846909145, "train_loss_giou_4": 0.7778304856592435, "train_loss_ce_o2m": 4.550314342493017, "train_loss_bbox_o2m": 3.9090440303357346, "train_loss_giou_o2m": 2.110583157697833, "train_loss_ce_0_o2m": 4.560289402735558, "train_loss_bbox_0_o2m": 3.903098637870431, "train_loss_giou_0_o2m": 2.1083935507870875, "train_loss_ce_1_o2m": 4.550175091838549, "train_loss_bbox_1_o2m": 3.913905988078103, "train_loss_giou_1_o2m": 2.1159887936864377, "train_loss_ce_2_o2m": 4.528434634194273, "train_loss_bbox_2_o2m": 3.9097024515316203, "train_loss_giou_2_o2m": 2.1079230966380718, "train_loss_ce_3_o2m": 4.557783322002953, "train_loss_bbox_3_o2m": 3.935633503421916, "train_loss_giou_3_o2m": 2.11874213136575, "train_loss_ce_4_o2m": 4.54293273862395, "train_loss_bbox_4_o2m": 3.9340230178760978, "train_loss_giou_4_o2m": 2.1190018866933724, "train_loss_ce_enc": 1.3251384898975176, "train_loss_bbox_enc": 1.4947816726232224, "train_loss_giou_enc": 0.9249224884596476, "train_loss_ce_unscaled": 0.7913178410198753, "train_class_error_unscaled": 67.8085808187525, "train_loss_bbox_unscaled": 0.16187243002193932, "train_loss_giou_unscaled": 0.38876781133726646, 
"train_cardinality_error_unscaled": 296.8139577039275, "train_loss_ce_0_unscaled": 0.7941225083500957, "train_loss_bbox_0_unscaled": 0.16715305968353755, "train_loss_giou_0_unscaled": 0.39668878001703956, "train_cardinality_error_0_unscaled": 297.16561933534746, "train_loss_ce_1_unscaled": 0.7928290369863596, "train_loss_bbox_1_unscaled": 0.1633165741448496, "train_loss_giou_1_unscaled": 0.39148407989623324, "train_cardinality_error_1_unscaled": 296.6843504531722, "train_loss_ce_2_unscaled": 0.7930453124003107, "train_loss_bbox_2_unscaled": 0.16241245373543295, "train_loss_giou_2_unscaled": 0.38940242040013257, "train_cardinality_error_2_unscaled": 296.47389728096675, "train_loss_ce_3_unscaled": 0.793075495544157, "train_loss_bbox_3_unscaled": 0.162094695048795, "train_loss_giou_3_unscaled": 0.3888126670539919, "train_cardinality_error_3_unscaled": 296.5332930513595, "train_loss_ce_4_unscaled": 0.7929168640954977, "train_loss_bbox_4_unscaled": 0.16204955680132632, "train_loss_giou_4_unscaled": 0.38891524282962175, "train_cardinality_error_4_unscaled": 296.3341993957704, "train_loss_ce_o2m_unscaled": 2.2751571712465086, "train_class_error_o2m_unscaled": 69.79603332058541, "train_loss_bbox_o2m_unscaled": 0.7818088051118519, "train_loss_giou_o2m_unscaled": 1.0552915788489166, "train_cardinality_error_o2m_unscaled": 297.28574018126886, "train_loss_ce_0_o2m_unscaled": 2.280144701367779, "train_loss_bbox_0_o2m_unscaled": 0.7806197266366186, "train_loss_giou_0_o2m_unscaled": 1.0541967753935437, "train_cardinality_error_0_o2m_unscaled": 297.29141993957705, "train_loss_ce_1_o2m_unscaled": 2.2750875459192743, "train_loss_bbox_1_o2m_unscaled": 0.7827811969154911, "train_loss_giou_1_o2m_unscaled": 1.0579943968432188, "train_cardinality_error_1_o2m_unscaled": 297.28845921450153, "train_loss_ce_2_o2m_unscaled": 2.2642173170971365, "train_loss_bbox_2_o2m_unscaled": 0.7819404890367992, "train_loss_giou_2_o2m_unscaled": 1.0539615483190359, "train_cardinality_error_2_o2m_unscaled": 
297.2718429003021, "train_loss_ce_3_o2m_unscaled": 2.2788916610014764, "train_loss_bbox_3_o2m_unscaled": 0.7871266996275262, "train_loss_giou_3_o2m_unscaled": 1.059371065682875, "train_cardinality_error_3_o2m_unscaled": 297.2734743202417, "train_loss_ce_4_o2m_unscaled": 2.271466369311975, "train_loss_bbox_4_o2m_unscaled": 0.7868046024297659, "train_loss_giou_4_o2m_unscaled": 1.0595009433466862, "train_cardinality_error_4_o2m_unscaled": 297.2690634441088, "train_loss_ce_enc_unscaled": 0.6625692449487588, "train_loss_bbox_enc_unscaled": 0.2989563342361652, "train_loss_giou_enc_unscaled": 0.4624612442298238, "train_cardinality_error_enc_unscaled": 13923.343021148035, "test_class_error": 64.13844362483849, "test_loss": 78.79375580555018, "test_loss_ce": 0.7910687770808841, "test_loss_bbox": 0.8477691111300797, "test_loss_giou": 0.8638646393797702, "test_loss_ce_0": 0.7972862343747704, "test_loss_bbox_0": 0.8601595111829207, "test_loss_giou_0": 0.8724178837278259, "test_loss_ce_1": 0.792520616065127, "test_loss_bbox_1": 0.8504500792622085, "test_loss_giou_1": 0.8670524144199149, "test_loss_ce_2": 0.7915313722917837, "test_loss_bbox_2": 0.8500514249761193, "test_loss_giou_2": 0.8632380155325898, "test_loss_ce_3": 0.791762308754713, "test_loss_bbox_3": 0.8497003961192001, "test_loss_giou_3": 0.8640827322694712, "test_loss_ce_4": 0.7904652780063318, "test_loss_bbox_4": 0.8502327690866922, "test_loss_giou_4": 0.8640380405575663, "test_loss_ce_o2m": 4.221672767583696, "test_loss_bbox_o2m": 3.6643626122029804, "test_loss_giou_o2m": 2.116294725068359, "test_loss_ce_0_o2m": 4.243487062958793, "test_loss_bbox_0_o2m": 3.7035500733888975, "test_loss_giou_0_o2m": 2.122325596304818, "test_loss_ce_1_o2m": 4.200419696496261, "test_loss_bbox_1_o2m": 3.6978710593478166, "test_loss_giou_1_o2m": 2.13383673360159, "test_loss_ce_2_o2m": 4.159585401776919, "test_loss_bbox_2_o2m": 3.659527484201172, "test_loss_giou_2_o2m": 2.1043465435986373, "test_loss_ce_3_o2m": 4.227324989240659, 
"test_loss_bbox_3_o2m": 3.7232838946850504, "test_loss_giou_3_o2m": 2.1325583891500757, "test_loss_ce_4_o2m": 4.194914827005928, "test_loss_bbox_4_o2m": 3.6846929462160163, "test_loss_giou_4_o2m": 2.121341207926801, "test_loss_ce_enc": 1.4042750294643767, "test_loss_bbox_enc": 1.3278895811943612, "test_loss_giou_enc": 0.8925036950599589, "test_loss_ce_unscaled": 0.7910687770808841, "test_class_error_unscaled": 64.13844362483849, "test_loss_bbox_unscaled": 0.16955382225249618, "test_loss_giou_unscaled": 0.4319323196898851, "test_cardinality_error_unscaled": 296.9266962843296, "test_loss_ce_0_unscaled": 0.7972862343747704, "test_loss_bbox_0_unscaled": 0.1720319020325658, "test_loss_giou_0_unscaled": 0.43620894186391296, "test_cardinality_error_0_unscaled": 296.9757673667205, "test_loss_ce_1_unscaled": 0.792520616065127, "test_loss_bbox_1_unscaled": 0.1700900156240495, "test_loss_giou_1_unscaled": 0.43352620720995744, "test_cardinality_error_1_unscaled": 296.0860258481422, "test_loss_ce_2_unscaled": 0.7915313722917837, "test_loss_bbox_2_unscaled": 0.1700102848757618, "test_loss_giou_2_unscaled": 0.4316190077662949, "test_cardinality_error_2_unscaled": 296.0680533117932, "test_loss_ce_3_unscaled": 0.791762308754713, "test_loss_bbox_3_unscaled": 0.16994007912905276, "test_loss_giou_3_unscaled": 0.4320413661347356, "test_cardinality_error_3_unscaled": 296.93780290791597, "test_loss_ce_4_unscaled": 0.7904652780063318, "test_loss_bbox_4_unscaled": 0.17004655363408305, "test_loss_giou_4_unscaled": 0.43201902027878314, "test_cardinality_error_4_unscaled": 296.13327948303714, "test_loss_ce_o2m_unscaled": 2.110836383791848, "test_class_error_o2m_unscaled": 66.95682819594474, "test_loss_bbox_o2m_unscaled": 0.7328725220644561, "test_loss_giou_o2m_unscaled": 1.0581473625341795, "test_cardinality_error_o2m_unscaled": 296.9757673667205, "test_loss_ce_0_o2m_unscaled": 2.1217435314793964, "test_loss_bbox_0_o2m_unscaled": 0.7407100136197731, "test_loss_giou_0_o2m_unscaled": 
1.061162798152409, "test_cardinality_error_0_o2m_unscaled": 296.9757673667205, "test_loss_ce_1_o2m_unscaled": 2.1002098482481304, "test_loss_bbox_1_o2m_unscaled": 0.7395742110793685, "test_loss_giou_1_o2m_unscaled": 1.066918366800795, "test_cardinality_error_1_o2m_unscaled": 296.9757673667205, "test_loss_ce_2_o2m_unscaled": 2.0797927008884596, "test_loss_bbox_2_o2m_unscaled": 0.7319054962203556, "test_loss_giou_2_o2m_unscaled": 1.0521732717993186, "test_cardinality_error_2_o2m_unscaled": 296.9757673667205, "test_loss_ce_3_o2m_unscaled": 2.1136624946203293, "test_loss_bbox_3_o2m_unscaled": 0.7446567786011923, "test_loss_giou_3_o2m_unscaled": 1.0662791945750378, "test_cardinality_error_3_o2m_unscaled": 296.9757673667205, "test_loss_ce_4_o2m_unscaled": 2.097457413502964, "test_loss_bbox_4_o2m_unscaled": 0.7369385887274401, "test_loss_giou_4_o2m_unscaled": 1.0606706039634004, "test_cardinality_error_4_o2m_unscaled": 296.9757673667205, "test_loss_ce_enc_unscaled": 0.7021375147321883, "test_loss_bbox_enc_unscaled": 0.26557791618982357, "test_loss_giou_enc_unscaled": 0.44625184752997943, "test_cardinality_error_enc_unscaled": 21292.355411954766, "test_coco_eval_bbox": [0.001467092259092334, 0.00401862978679696, 0.0008434506276570325, 5.1407009859864495e-08, 1.8306128532202538e-05, 0.002004709937837929, 0.03928643692174168, 0.05601486538134469, 0.06196569759867778, 8.392077878482712e-06, 0.0013475545681088058, 0.09068425267611004], "epoch": 33, "n_parameters": 53445295}
The accuracy on the evaluation set after 34 epochs of training:
Epochs: 34
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.004
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.039
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.056
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.062
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.091
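Logs like the ones above are easier to compare when reduced to a few fields per epoch. The following is a small sketch, assuming each log entry is one JSON object per line and that the first two entries of `test_coco_eval_bbox` are AP@[0.50:0.95] and AP@0.50, which matches the summary printed above. The `summarize` helper is hypothetical, and the two log lines are abbreviated to fields quoted in this thread.

```python
import json

# Abbreviated copies of the two log entries; all other fields are omitted.
log_lines = [
    '{"epoch": 0, "test_class_error": 69.26391205918615, '
    '"test_coco_eval_bbox": [0.0003855556643685181, 0.0014334943292030255]}',
    '{"epoch": 33, "test_class_error": 64.13844362483849, '
    '"test_coco_eval_bbox": [0.001467092259092334, 0.00401862978679696]}',
]

def summarize(lines):
    """Extract (epoch, AP, AP50, class_error) from JSON log lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        stats = record["test_coco_eval_bbox"]
        rows.append((record["epoch"], stats[0], stats[1], record["test_class_error"]))
    return rows

summary = summarize(log_lines)
for epoch, ap, ap50, cls_err in summary:
    print(f"epoch {epoch:>2}: AP={ap:.4f} AP50={ap50:.4f} class_error={cls_err:.1f}%")
```

Seen side by side, AP stays near zero and the class error barely moves over 34 epochs, which points to a setup problem rather than slow convergence.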
Dear Zhao Chuyang,
I made some mistakes when adding some debugging code. I can now train on VOC0712. Sorry for bothering you.
Dear Authors,
I am currently configuring your model on the VOC0712 dataset (a combination of VOC 2007 and VOC 2012), which contains 20 object detection classes. I have converted the VOC annotations to the COCO annotation format using this link.
In the build(args) function (located in models/deformable_detr.py), I set the num_classes variable to 21. Although VOC0712 has only 20 classes, with IDs ranging from 1 to 20, setting the variable to 20 causes a 'cuda/IndexKernel.cu:92' error. As a workaround, I followed the suggestion in this link and increased the number of classes by 1 so that the IDs stay within the range [0, num_classes-1]. This resolves the error.
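The index error described above can be checked before training with a quick scan of the annotation file. This is a minimal sketch, assuming a COCO-format dictionary with `categories` and `annotations` keys; both helper functions and the toy `voc_like` data are hypothetical.

```python
def required_num_classes(coco):
    """Smallest num_classes that keeps every category ID in [0, num_classes-1]."""
    return max(cat["id"] for cat in coco["categories"]) + 1

def out_of_range_ids(coco, num_classes):
    """Category IDs used by annotations that would index past the classifier."""
    return sorted({
        ann["category_id"]
        for ann in coco["annotations"]
        if not 0 <= ann["category_id"] < num_classes
    })

# Toy stand-in for converted VOC annotations: 20 classes with IDs 1..20.
voc_like = {
    "categories": [{"id": i, "name": f"class_{i}"} for i in range(1, 21)],
    "annotations": [{"category_id": 1}, {"category_id": 20}, {"category_id": 7}],
}

print(required_num_classes(voc_like))   # 21
print(out_of_range_ids(voc_like, 20))   # [20]
print(out_of_range_ids(voc_like, 21))   # []
```

With IDs starting at 1, the largest ID (20) is a valid index only when num_classes is at least 21, which is consistent with the workaround described here.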
After reviewing the annotations, I noticed that the category IDs for COCO 2017 are as follows: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90], which corresponds to 80 classes.
Could you please clarify why you set num_classes to 91, despite there being only 80 categories? Additionally, could you explain why the category IDs are not converted to a range from 0 to 79 instead?
Thank you for your time and assistance.
Kind regards, Khoa
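For reference, the category ID list quoted in this post can be inspected with a few lines of Python. The sketch below counts the IDs, lists the gaps, and builds one possible mapping to a contiguous 0..79 range; the remapping is only an illustration of the alternative asked about, not something the repository is known to do, and predictions would need to be mapped back to the original IDs before COCO evaluation.

```python
# Category IDs present in the COCO 2017 annotations, as listed in this post.
coco_ids = [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21,
    22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
    43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
    62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84,
    85, 86, 87, 88, 89, 90,
]

num_used = len(coco_ids)                      # 80 categories in use
max_id = max(coco_ids)                        # 90, so indices 0..90 need 91 slots
missing = sorted(set(range(1, max_id + 1)) - set(coco_ids))

# Hypothetical remapping to contiguous labels, plus the inverse for evaluation.
to_contiguous = {cid: i for i, cid in enumerate(sorted(coco_ids))}
to_original = {i: cid for cid, i in to_contiguous.items()}

print(num_used, max_id + 1)  # 80 91
print(missing)               # [12, 26, 29, 30, 45, 66, 68, 69, 71, 83]
```

Using the raw IDs directly avoids maintaining the two mapping tables, at the cost of a few classifier columns that are never used.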