argusswift / YOLOv4-pytorch

This is a pytorch repository of YOLOv4, attentive YOLOv4 and mobilenet YOLOv4 with PASCAL VOC and COCO

Training mobilenetv2 (PyTorch) on VOC07+12 gives a final best_mAP of only 0.766, far below your reported 0.851. #92

Open xiaolangwork opened 3 years ago

xiaolangwork commented 3 years ago

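Hardware: an AMD 1400 CPU and a GTX 1660S GPU. The initialization parameters are set as follows: (image) I did not modify the code. After training for 200 epochs, the result is as follows: (image) The loss can hardly go any lower. Am I using the code incorrectly? I hope the author can reply.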

nsiakjdw commented 3 years ago

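I have the same problem. The highest mAP during training was 76.9%; when I ran eval_voc.py separately on the best weights, I got 79% mAP. I set 260 epochs and a batch size of 16, and left the other parameters unchanged.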

xiaolangwork commented 3 years ago

Using the original YOLOv4 network with VOC07+12 as the training set, the highest eval mAP was also only 0.766. Screenshot of the final result: (image)

zyx1996 commented 3 years ago

Have you solved the problem? My result is only 0.711.

johnjunjun7 commented 3 years ago

Did you all train from the pretrained weights provided by the author and still end up with an accuracy of only 0.711?

xiaolangwork commented 3 years ago

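> Have you solved the problem? My result is only 0.711.

I have not run any further tests.

> Did you all train from the pretrained weights provided by the author and still end up with an accuracy of only 0.711?

The pretrained weights I used are the mobilenetv2 weights provided by the author.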

johnjunjun7 commented 3 years ago


I found something strange. I kept the training parameters the same as yours:

    MODEL_TYPE = {
        "TYPE": "Mobilenet-YOLOv4"
    }  # YOLO type: YOLOv4, Mobilenet-YOLOv4 or Mobilenetv3-YOLOv4

    CONV_TYPE = {"TYPE": "GENERAL"}  # conv type: DO_CONV or GENERAL

    ATTENTION = {"TYPE": "NONE"}  # attention type: SEnet, CBAM or NONE

    # train
    TRAIN = {
        "DATA_TYPE": "VOC",  # DATA_TYPE: VOC, COCO or Customer
        "TRAIN_IMG_SIZE": 416,
        "AUGMENT": True,
        "BATCH_SIZE": 8,
        "MULTI_SCALE_TRAIN": False,
        "IOU_THRESHOLD_LOSS": 0.5,
        "YOLO_EPOCHS": 200,
        "Mobilenet_YOLO_EPOCHS": 200,
        "NUMBER_WORKERS": 4,
        "MOMENTUM": 0.9,
        "WEIGHT_DECAY": 0.0005,
        "LR_INIT": 1e-4,
        "LR_END": 1e-6,
        "WARMUP_EPOCHS": 2,  # or None
    }

    # val
    VAL = {
        "TEST_IMG_SIZE": 416,
        "BATCH_SIZE": 8,
        "NUMBER_WORKERS": 4,
        "CONF_THRESH": 0.005,
        "NMS_THRESH": 0.45,
        "MULTI_SCALE_VAL": True,
        "FLIP_VAL": True,
        "Visual": True,
    }

The final accuracy at the end of training was:

    ===== Validate =====
    [2020-12-06 08:53:21,792]-[train.py line:355]:val img size is 416
    [2020-12-06 09:01:49,587]-[train.py line:361]:aeroplane --> mAP : 0.8490289925189536
    [2020-12-06 09:01:49,587]-[train.py line:361]:bicycle --> mAP : 0.8616897356677506
    [2020-12-06 09:01:49,587]-[train.py line:361]:bird --> mAP : 0.7630168848608787
    [2020-12-06 09:01:49,587]-[train.py line:361]:boat --> mAP : 0.6620164658721044
    [2020-12-06 09:01:49,587]-[train.py line:361]:bottle --> mAP : 0.5173023027163647
    [2020-12-06 09:01:49,588]-[train.py line:361]:bus --> mAP : 0.8215678524088977
    [2020-12-06 09:01:49,588]-[train.py line:361]:car --> mAP : 0.8627548174990283
    [2020-12-06 09:01:49,588]-[train.py line:361]:cat --> mAP : 0.8840138936771906
    [2020-12-06 09:01:49,588]-[train.py line:361]:chair --> mAP : 0.554274743873003
    [2020-12-06 09:01:49,588]-[train.py line:361]:cow --> mAP : 0.8090681201419012
    [2020-12-06 09:01:49,588]-[train.py line:361]:diningtable --> mAP : 0.7523790312660907
    [2020-12-06 09:01:49,588]-[train.py line:361]:dog --> mAP : 0.8725108153538809
    [2020-12-06 09:01:49,588]-[train.py line:361]:horse --> mAP : 0.8785345702312647
    [2020-12-06 09:01:49,588]-[train.py line:361]:motorbike --> mAP : 0.8513608916847762
    [2020-12-06 09:01:49,588]-[train.py line:361]:person --> mAP : 0.8176911496220403
    [2020-12-06 09:01:49,588]-[train.py line:361]:pottedplant --> mAP : 0.5015688965902171
    [2020-12-06 09:01:49,588]-[train.py line:361]:sheep --> mAP : 0.8261363186159126
    [2020-12-06 09:01:49,588]-[train.py line:361]:sofa --> mAP : 0.7378369430929343
    [2020-12-06 09:01:49,588]-[train.py line:361]:train --> mAP : 0.8731558093496673
    [2020-12-06 09:01:49,588]-[train.py line:361]:tvmonitor --> mAP : 0.7680994040000935
    [2020-12-06 09:01:49,588]-[train.py line:364]:mAP : 0.7732003819521476
    [2020-12-06 09:01:49,588]-[train.py line:366]:inference time: 17.60 ms
    [2020-12-06 09:01:51,161]-[train.py line:370]:save weights done
    [2020-12-06 09:01:51,162]-[train.py line:371]: ===test mAP:0.773
    [2020-12-06 09:01:51,162]-[train.py line:387]: ===cost time:911.7744s

This is basically consistent with your result. However, during training I replaced each val evaluation with a val loss computation, using the following code:

    # VAL
    # Running means, in the order [loss_ciou, loss_conf, loss_cls, loss].
    mloss = torch.zeros(4)
    for i, (
        imgs,
        label_sbbox,
        label_mbbox,
        label_lbbox,
        sbboxes,
        mbboxes,
        lbboxes,
    ) in enumerate(self.test_dataloader):
        # Move the batch and its labels to the training device.
        imgs = imgs.to(self.device)
        label_sbbox = label_sbbox.to(self.device)
        label_mbbox = label_mbbox.to(self.device)
        label_lbbox = label_lbbox.to(self.device)
        sbboxes = sbboxes.to(self.device)
        mbboxes = mbboxes.to(self.device)
        lbboxes = lbboxes.to(self.device)

        # Forward pass without building the autograd graph.
        with torch.no_grad():
            p, p_d = self.yolov4(imgs)

        loss, loss_ciou, loss_conf, loss_cls = self.criterion(
            p,
            p_d,
            label_sbbox,
            label_mbbox,
            label_lbbox,
            sbboxes,
            mbboxes,
            lbboxes,
        )
        loss_items = torch.tensor(
            [loss_ciou, loss_conf, loss_cls, loss]
        )
        # Incremental mean over the batches seen so far.
        mloss = (mloss * i + loss_items) / (i + 1)
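
The update `mloss = (mloss * i + loss_items) / (i + 1)` in the snippet above is an incremental mean, so each logged value is the average of a loss component over the batches seen so far. A minimal pure-Python sketch of that identity follows; the function name `running_mean` and the sample numbers are made up for illustration and do not come from the repository or the logs.

```python
def running_mean(values):
    """Incrementally average a sequence, mirroring the mloss update."""
    mean = 0.0
    for i, v in enumerate(values):
        # Same form as: mloss = (mloss * i + loss_items) / (i + 1)
        mean = (mean * i + v) / (i + 1)
    return mean


# Hypothetical per-batch loss values, not taken from the logs.
batch_losses = [4.0, 2.0, 3.0, 7.0]
incremental = running_mean(batch_losses)
plain = sum(batch_losses) / len(batch_losses)
print(incremental, plain)
```

Because each value is a running average, a line logged after very few batches (such as step 0) is likely to be noisier than one logged later in the epoch.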

Looking at the log, I found that by the end of training the train loss had dropped quite low, but the val loss stayed high.

Training set:

    === Epoch:[200/200],step:[2020/2068],img_size:[416],total_loss:7.5140|loss_ciou:2.3174|loss_conf:2.2943|loss_cls:2.9024|lr:0.000001
    === Epoch:[200/200],step:[2030/2068],img_size:[416],total_loss:7.5165|loss_ciou:2.3172|loss_conf:2.2962|loss_cls:2.9031|lr:0.000001
    === Epoch:[200/200],step:[2040/2068],img_size:[416],total_loss:7.5094|loss_ciou:2.3161|loss_conf:2.2927|loss_cls:2.9006|lr:0.000001
    === Epoch:[200/200],step:[2050/2068],img_size:[416],total_loss:7.5160|loss_ciou:2.3179|loss_conf:2.2947|loss_cls:2.9034|lr:0.000001
    === Epoch:[200/200],step:[2060/2068],img_size:[416],total_loss:7.5202|loss_ciou:2.3186|loss_conf:2.2959|loss_cls:2.9056|lr:0.000001

Validation set:

    VAL=== Epoch:[200/200],step:[ 0/618],img_size:[416],total_loss:23.3341|loss_ciou:5.2564|loss_conf:10.6452|loss_cls:7.4325
    VAL=== Epoch:[200/200],step:[ 10/618],img_size:[416],total_loss:13.9700|loss_ciou:3.0701|loss_conf:5.9675|loss_cls:4.9324
    VAL=== Epoch:[200/200],step:[ 20/618],img_size:[416],total_loss:13.4902|loss_ciou:3.1029|loss_conf:5.7448|loss_cls:4.6425
    VAL=== Epoch:[200/200],step:[ 30/618],img_size:[416],total_loss:12.4934|loss_ciou:2.8427|loss_conf:5.2957|loss_cls:4.3550
    VAL=== Epoch:[200/200],step:[ 40/618],img_size:[416],total_loss:13.4257|loss_ciou:2.9976|loss_conf:5.7101|loss_cls:4.7180
    VAL=== Epoch:[200/200],step:[ 50/618],img_size:[416],total_loss:13.8640|loss_ciou:3.0959|loss_conf:5.8532|loss_cls:4.9149
    VAL=== Epoch:[200/200],step:[ 60/618],img_size:[416],total_loss:13.6994|loss_ciou:3.0098|loss_conf:5.8139|loss_cls:4.8757

I am not sure whether the network is overfitting.
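
One rough way to quantify the gap is to compare each loss component in the last train line and the last val line quoted in this comment. The sketch below parses those two lines (copied from the log) and prints the val/train ratio per component; the helper name `parse_losses` is hypothetical. Note the assumptions: the val line is a running average after only 60 of 618 steps, and the thread does not say whether the model was in training or evaluation mode while the val loss was computed, so the ratios are indicative at best and do not by themselves establish overfitting.

```python
import re

# Last train and val lines quoted in the comment, copied from the log.
TRAIN_LINE = ("=== Epoch:[200/200],step:[2060/2068],img_size:[416],"
              "total_loss:7.5202|loss_ciou:2.3186|loss_conf:2.2959|"
              "loss_cls:2.9056|lr:0.000001")
VAL_LINE = ("VAL=== Epoch:[200/200],step:[ 60/618],img_size:[416],"
            "total_loss:13.6994|loss_ciou:3.0098|loss_conf:5.8139|"
            "loss_cls:4.8757")

LOSS_RE = re.compile(r"(total_loss|loss_ciou|loss_conf|loss_cls):([0-9.]+)")


def parse_losses(line):
    """Return {component name: value} for one log line."""
    return {name: float(value) for name, value in LOSS_RE.findall(line)}


train = parse_losses(TRAIN_LINE)
val = parse_losses(VAL_LINE)

# Ratio > 1 means the component is higher on the validation set.
ratios = {name: val[name] / train[name] for name in train}
worst = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: train={train[name]:.4f} val={val[name]:.4f} ratio={ratio:.2f}")
print("largest gap:", worst)
```

On these two lines the confidence loss shows the largest relative gap, which may be a useful place to start looking before concluding that the whole network is overfitting.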

Anleeno-Xu commented 3 years ago

Is this 0.85 for mobilenet or for cspdarknet?