Open · xiaolangwork opened this issue 3 years ago
I have the same problem. The highest mAP during training was 76.9%, but when I ran eval_voc.py separately on the best checkpoint I got 79% mAP. I set 260 epochs with batchsize=16; no other parameters were changed.
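One plausible contributor to a gap like 76.9% vs. 79% is that the two evaluations may not compute AP the same way: VOC AP comes in two flavors (the VOC2007 11-point interpolation and the all-point area-under-curve version), and they can differ by a couple of points on identical detections. A minimal sketch of both, for illustration only (this is not the repo's actual eval code):

```python
import numpy as np

def voc_ap(recall, precision, use_07_metric=False):
    """Average precision from a recall/precision curve.

    use_07_metric=True  -> VOC2007 11-point interpolation
    use_07_metric=False -> all-point (area under the interpolated PR curve)
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    if use_07_metric:
        ap = 0.0
        for t in np.arange(0.0, 1.1, 0.1):
            mask = recall >= t
            p = precision[mask].max() if mask.any() else 0.0
            ap += p / 11.0
        return ap
    # pad the curve, then make precision monotonically non-increasing
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum rectangle areas where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

It is worth checking which flavor the training-time validation uses versus eval_voc.py before comparing the two numbers directly.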
Using the original YOLOv4 network with VOC07+12 as the training set, my highest eval mAP was also only 0.766. The final results are screenshotted below:
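Another knob that moves these eval numbers is test-time augmentation: the repo's VAL config enables MULTI_SCALE_VAL and FLIP_VAL, so boxes predicted on the horizontally flipped image have to be mapped back to original coordinates before merging and NMS. A minimal sketch of just that un-flipping step (a hypothetical helper, not the repo's code):

```python
def unflip_boxes(boxes, img_width):
    """Map [x1, y1, x2, y2] boxes predicted on a horizontally
    flipped image back into the original image's coordinates."""
    out = []
    for x1, y1, x2, y2 in boxes:
        # mirror the x-coordinates and swap them so x1 <= x2 still holds
        out.append((img_width - x2, y1, img_width - x1, y2))
    return out
```

Comparing runs with and without FLIP_VAL/MULTI_SCALE_VAL would tell how much of any mAP difference comes from TTA rather than the model itself.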
Have you solved the problem? My result is only 0.711.
Did you all train with the pretrained weights provided by the author and still only reach an accuracy of 0.711?
> Have you solved the problem? My result is only 0.711.

I haven't run any further tests.

> Did you all train with the pretrained weights provided by the author and still only reach an accuracy of 0.711?

The pretrained weights I used were the mobilenetv2 weights provided by the author.
I found something strange. I kept the training parameters the same as yours:

```python
MODEL_TYPE = {"TYPE": "Mobilenet-YOLOv4"}  # YOLO type: YOLOv4, Mobilenet-YOLOv4 or Mobilenetv3-YOLOv4
CONV_TYPE = {"TYPE": "GENERAL"}  # conv type: DO_CONV or GENERAL
ATTENTION = {"TYPE": "NONE"}  # attention type: SEnet, CBAM or NONE

TRAIN = {
    "DATA_TYPE": "VOC",  # DATA_TYPE: VOC, COCO or Customer
    "TRAIN_IMG_SIZE": 416,
    "AUGMENT": True,
    "BATCH_SIZE": 8,
    "MULTI_SCALE_TRAIN": False,
    "IOU_THRESHOLD_LOSS": 0.5,
    "YOLO_EPOCHS": 200,
    "Mobilenet_YOLO_EPOCHS": 200,
    "NUMBER_WORKERS": 4,
    "MOMENTUM": 0.9,
    "WEIGHT_DECAY": 0.0005,
    "LR_INIT": 1e-4,
    "LR_END": 1e-6,
    "WARMUP_EPOCHS": 2,  # or None
}

VAL = {
    "TEST_IMG_SIZE": 416,
    "BATCH_SIZE": 8,
    "NUMBER_WORKERS": 4,
    "CONF_THRESH": 0.005,
    "NMS_THRESH": 0.45,
    "MULTI_SCALE_VAL": True,
    "FLIP_VAL": True,
    "Visual": True,
}
```

The final validation accuracy:

```
===== Validate =====
[2020-12-06 08:53:21,792]-[train.py line:355]:val img size is 416
[2020-12-06 09:01:49,587]-[train.py line:361]:aeroplane --> mAP : 0.8490289925189536
[2020-12-06 09:01:49,587]-[train.py line:361]:bicycle --> mAP : 0.8616897356677506
[2020-12-06 09:01:49,587]-[train.py line:361]:bird --> mAP : 0.7630168848608787
[2020-12-06 09:01:49,587]-[train.py line:361]:boat --> mAP : 0.6620164658721044
[2020-12-06 09:01:49,587]-[train.py line:361]:bottle --> mAP : 0.5173023027163647
[2020-12-06 09:01:49,588]-[train.py line:361]:bus --> mAP : 0.8215678524088977
[2020-12-06 09:01:49,588]-[train.py line:361]:car --> mAP : 0.8627548174990283
[2020-12-06 09:01:49,588]-[train.py line:361]:cat --> mAP : 0.8840138936771906
[2020-12-06 09:01:49,588]-[train.py line:361]:chair --> mAP : 0.554274743873003
[2020-12-06 09:01:49,588]-[train.py line:361]:cow --> mAP : 0.8090681201419012
[2020-12-06 09:01:49,588]-[train.py line:361]:diningtable --> mAP : 0.7523790312660907
[2020-12-06 09:01:49,588]-[train.py line:361]:dog --> mAP : 0.8725108153538809
[2020-12-06 09:01:49,588]-[train.py line:361]:horse --> mAP : 0.8785345702312647
[2020-12-06 09:01:49,588]-[train.py line:361]:motorbike --> mAP : 0.8513608916847762
[2020-12-06 09:01:49,588]-[train.py line:361]:person --> mAP : 0.8176911496220403
[2020-12-06 09:01:49,588]-[train.py line:361]:pottedplant --> mAP : 0.5015688965902171
[2020-12-06 09:01:49,588]-[train.py line:361]:sheep --> mAP : 0.8261363186159126
[2020-12-06 09:01:49,588]-[train.py line:361]:sofa --> mAP : 0.7378369430929343
[2020-12-06 09:01:49,588]-[train.py line:361]:train --> mAP : 0.8731558093496673
[2020-12-06 09:01:49,588]-[train.py line:361]:tvmonitor --> mAP : 0.7680994040000935
[2020-12-06 09:01:49,588]-[train.py line:364]:mAP : 0.7732003819521476
[2020-12-06 09:01:49,588]-[train.py line:366]:inference time: 17.60 ms
[2020-12-06 09:01:51,161]-[train.py line:370]:save weights done
[2020-12-06 09:01:51,162]-[train.py line:371]: ===test mAP:0.773
[2020-12-06 09:01:51,162]-[train.py line:387]: ===cost time:911.7744s
```

This is basically consistent with yours, but during training I replaced the per-epoch val evaluation with a val-loss computation, as follows:
```python
mloss = torch.zeros(4)
for i, (
    imgs,
    label_sbbox,
    label_mbbox,
    label_lbbox,
    sbboxes,
    mbboxes,
    lbboxes,
) in enumerate(self.test_dataloader):
    imgs = imgs.to(self.device)
    label_sbbox = label_sbbox.to(self.device)
    label_mbbox = label_mbbox.to(self.device)
    label_lbbox = label_lbbox.to(self.device)
    sbboxes = sbboxes.to(self.device)
    mbboxes = mbboxes.to(self.device)
    lbboxes = lbboxes.to(self.device)

    with torch.no_grad():
        p, p_d = self.yolov4(imgs)
        loss, loss_ciou, loss_conf, loss_cls = self.criterion(
            p, p_d, label_sbbox, label_mbbox, label_lbbox, sbboxes, mbboxes, lbboxes
        )
        # collect the scalar loss values (detached, so no graph is kept alive)
        loss_items = torch.tensor(
            [loss_ciou.item(), loss_conf.item(), loss_cls.item(), loss.item()]
        )
        # running mean of the per-batch losses
        mloss = (mloss * i + loss_items) / (i + 1)
```
Looking through the log, I found that by the end of training the train loss had dropped very low, while the val loss stayed high.

Training set:

```
=== Epoch:[200/200],step:[2020/2068],img_size:[416],total_loss:7.5140|loss_ciou:2.3174|loss_conf:2.2943|loss_cls:2.9024|lr:0.000001
=== Epoch:[200/200],step:[2030/2068],img_size:[416],total_loss:7.5165|loss_ciou:2.3172|loss_conf:2.2962|loss_cls:2.9031|lr:0.000001
=== Epoch:[200/200],step:[2040/2068],img_size:[416],total_loss:7.5094|loss_ciou:2.3161|loss_conf:2.2927|loss_cls:2.9006|lr:0.000001
=== Epoch:[200/200],step:[2050/2068],img_size:[416],total_loss:7.5160|loss_ciou:2.3179|loss_conf:2.2947|loss_cls:2.9034|lr:0.000001
=== Epoch:[200/200],step:[2060/2068],img_size:[416],total_loss:7.5202|loss_ciou:2.3186|loss_conf:2.2959|loss_cls:2.9056|lr:0.000001
```

Validation set:

```
VAL=== Epoch:[200/200],step:[ 0/618],img_size:[416],total_loss:23.3341|loss_ciou:5.2564|loss_conf:10.6452|loss_cls:7.4325
VAL=== Epoch:[200/200],step:[10/618],img_size:[416],total_loss:13.9700|loss_ciou:3.0701|loss_conf:5.9675|loss_cls:4.9324
VAL=== Epoch:[200/200],step:[20/618],img_size:[416],total_loss:13.4902|loss_ciou:3.1029|loss_conf:5.7448|loss_cls:4.6425
VAL=== Epoch:[200/200],step:[30/618],img_size:[416],total_loss:12.4934|loss_ciou:2.8427|loss_conf:5.2957|loss_cls:4.3550
VAL=== Epoch:[200/200],step:[40/618],img_size:[416],total_loss:13.4257|loss_ciou:2.9976|loss_conf:5.7101|loss_cls:4.7180
VAL=== Epoch:[200/200],step:[50/618],img_size:[416],total_loss:13.8640|loss_ciou:3.0959|loss_conf:5.8532|loss_cls:4.9149
VAL=== Epoch:[200/200],step:[60/618],img_size:[416],total_loss:13.6994|loss_ciou:3.0098|loss_conf:5.8139|loss_cls:4.8757
```
I wonder whether the network has overfitted.
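If it is overfitting, one cheap guard (besides stronger augmentation or weight decay) is to stop on the val loss computed above rather than training a fixed 200 epochs. A minimal sketch, with the patience value purely as an example:

```python
class EarlyStopping:
    """Stop training when val loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's val loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `stopper.step(mloss[3].item())` once per epoch and breaking out of the training loop when it returns True would keep the checkpoint closest to the val-loss minimum.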
Is this 0.85 from the mobilenet backbone or from cspdarknet?
My hardware is an AMD 1400 CPU and a GTX 1660S GPU, with the initial parameters set as follows. The code was not modified; after training for 200 epochs, the results are as follows. The loss can hardly drop any further. Am I doing something wrong in how I'm using the code? I hope the author can reply.
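For context on the loss plateau: LR_INIT/LR_END/WARMUP_EPOCHS settings like those quoted earlier in this thread are commonly implemented as linear warmup followed by cosine decay, so by epoch 200 the learning rate is near LR_END=1e-6 and the loss is expected to flatten. A minimal sketch under that assumption (this may not match this repo's scheduler exactly):

```python
import math

def learning_rate(epoch, lr_init=1e-4, lr_end=1e-6, warmup_epochs=2, total_epochs=200):
    """Linear warmup to lr_init, then cosine decay down to lr_end."""
    if epoch < warmup_epochs:
        # ramp up linearly over the warmup epochs
        return lr_init * (epoch + 1) / warmup_epochs
    # cosine anneal over the remaining epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + math.cos(math.pi * t))
```

So a flat loss late in training does not by itself indicate a usage problem; the gap to the reported mAP is more likely about eval settings or overfitting, as discussed above.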