Error when training on a custom dataset

chaoshengzhe commented 3 years ago

I prepared my data the same way as for yolov5/v3 and started training, but training failed with an error:

from n params module arguments
0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 19904 models.common.BottleneckCSP [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 161152 models.common.BottleneckCSP [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 2614016 models.common.BottleneckCSP [256, 256, 15]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 10438144 models.common.BottleneckCSP [512, 512, 15]
9 -1 1 4720640 models.common.Conv [512, 1024, 3, 2]
10 -1 1 20728832 models.common.BottleneckCSP [1024, 1024, 7]
11 -1 1 7610368 models.common.SPPCSP [1024, 512, 1]
12 -1 1 131584 models.common.Conv [512, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 8 1 131584 models.common.Conv [512, 256, 1, 1]
15 [-1, -2] 1 0 models.common.Concat [1]
16 -1 1 2298880 models.common.BottleneckCSP2 [512, 256, 3]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 6 1 33024 models.common.Conv [256, 128, 1, 1]
20 [-1, -2] 1 0 models.common.Concat [1]
21 -1 1 576000 models.common.BottleneckCSP2 [256, 128, 3]
22 -1 1 295424 models.common.Conv [128, 256, 3, 1]
23 -2 1 295424 models.common.Conv [128, 256, 3, 2]
24 [-1, 16] 1 0 models.common.Concat [1]
25 -1 1 2298880 models.common.BottleneckCSP2 [512, 256, 3]
26 -1 1 1180672 models.common.Conv [256, 512, 3, 1]
27 -2 1 1180672 models.common.Conv [256, 512, 3, 2]
28 [-1, 11] 1 0 models.common.Concat [1]
29 -1 1 9185280 models.common.BottleneckCSP2 [1024, 512, 3]
30 -1 1 4720640 models.common.Conv [512, 1024, 3, 1]
31 [22, 26, 30] 1 50260 models.yolo.Detect [2, [[13, 17, 31, 25, 24, 51, 61, 45], [48, 102, 119, 96, 97, 189, 217, 184], [171, 384, 324, 451, 616, 618, 800, 800]], [256, 512, 1024]]
Model Summary: 476 layers, 7.0274e+07 parameters, 7.0274e+07 gradients

Transferred 935/943 items from weights/yolov4-p5.pt
Optimizer groups: 158 .bias, 163 conv.weight, 155 other
Traceback (most recent call last):
  File "/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/train.py", line 443, in <module>
    train(hyp, opt, device, tb_writer)
  File "/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/train.py", line 151, in train
    world_size=opt.world_size)
  File "/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/utils/datasets.py", line 60, in create_dataloader
    pad=pad)
  File "/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/utils/datasets.py", line 344, in __init__
    labels, shapes = zip(*[cache[x] for x in self.img_files])
  File "/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/utils/datasets.py", line 344, in <listcomp>
    labels, shapes = zip(*[cache[x] for x in self.img_files])
KeyError: '/media/lb/1a80f700-11af-4e7a-ace6-c34917dfdbb9/TEST/yolo/arm/images/train2017/000000001114.jpg'

I traced the error to line 344 of datasets.py:

    # Get labels
    labels, shapes = zip(*[cache[x] for x in self.img_files])  # line 344
    self.shapes = np.array(shapes, dtype=np.float64)
    self.labels = list(labels)

What is causing this?
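For anyone reading the traceback: the `KeyError` means the image path being looked up has no entry in `cache`. The sketch below reproduces that lookup with made-up data. The dictionary contents, the file names, and the helper `find_missing` are hypothetical stand-ins, not code from the repository.

```python
# Minimal reproduction of the lookup on line 344, using made-up data.
# `cache` maps an image path to (labels, shape); both entries are hypothetical.
cache = {
    "images/a.jpg": ([[0, 0.5, 0.5, 0.2, 0.2]], (640, 480)),
    "images/b.jpg": ([[1, 0.3, 0.4, 0.1, 0.1]], (640, 480)),
}

# The current image list contains a file that was never cached.
img_files = ["images/a.jpg", "images/b.jpg", "images/c.jpg"]


def find_missing(cache, img_files):
    """Return the image paths that have no entry in the cache."""
    return [path for path in img_files if path not in cache]


missing = find_missing(cache, img_files)

# The same expression as line 344 fails on the first uncached path.
failed_key = None
try:
    labels, shapes = zip(*[cache[x] for x in img_files])
except KeyError as err:
    failed_key = err.args[0]
```

If a check like `find_missing` reports paths on a real dataset, the cache was probably built from a different image list or from label files that could not be parsed.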

WongKinYiu commented 3 years ago

Try deleting the .cache file and running it again.
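A small sketch of that cleanup, assuming the cache files sit somewhere under the dataset folder and end in `.cache` (the exact location and file name are assumptions, so the helper searches recursively). `delete_cache_files` is a hypothetical helper, not part of the repository, and it defaults to a dry run that only lists what it finds.

```python
from pathlib import Path


def delete_cache_files(dataset_root, dry_run=True):
    """Find *.cache files under dataset_root and delete them unless dry_run is True.

    Returns the list of cache files that were found.
    """
    root = Path(dataset_root)
    found = sorted(p for p in root.rglob("*.cache") if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()  # remove the stale cache so it is rebuilt on the next run
    return found
```

For the dataset in this issue, one would first call `delete_cache_files("arm")` to see what would be removed, then repeat with `dry_run=False` and restart training.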

AlphaNext commented 3 years ago

I ran into the same problem recently. The format of the ground-truth txt label files may be wrong. See this link for details: yolov5

The *.txt file specifications are:

* One row per object
* Each row is class x_center y_center width height format.
* Box coordinates must be in normalized xywh format (from 0 - 1). If your boxes are in pixels, divide x_center and width by 
  image width, and y_center and height by image height.
* Class numbers are zero-indexed (start from 0).
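The rules above can be sketched as two small helpers: one converts a pixel-space box to normalized xywh, the other checks a single label row. Both function names are hypothetical, the corner-style pixel input is an assumption about how the original annotations are stored, and `num_classes` has to be supplied by the caller (the Detect layer in the log above was built for 2 classes).

```python
def pixels_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to normalized (x_center, y_center, width, height)."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def check_label_line(line, num_classes):
    """Return a list of problems found in one label row (empty if the row looks fine)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]

    problems = []
    cls, *coords = parts

    # Class must be a zero-indexed integer below num_classes.
    if not cls.isdigit() or int(cls) >= num_classes:
        problems.append(f"bad class id: {cls}")

    # Coordinates must be numbers in the normalized 0-1 range.
    try:
        values = [float(c) for c in coords]
    except ValueError:
        problems.append("non-numeric coordinate")
        return problems
    if any(v < 0 or v > 1 for v in values):
        problems.append("coordinate outside 0-1")
    return problems
```

Running `check_label_line` over every row of every label file is a quick way to spot rows that are still in pixels or that use one-based class ids.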


WongKinYiu / ScaledYOLOv4

Error when training on a custom dataset #33