AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

About fine-tuning #49

Closed kkkkkkb closed 8 months ago

kkkkkkb commented 9 months ago

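Hello, while debugging train.py, runner = Runner.from_cfg(cfg) hangs and never proceeds. Is there a way to fix this? My second question: when fine-tuning on my own dataset (10,000 images, a single class, baby), the localization loss stays at 0, and after a few epochs the classification loss also drops to 0. After training for 40 epochs, I tested with image_demo and it detects nothing (not a single object in the images is detected). Do you have any suggestions?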
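
One way to see where a call such as `Runner.from_cfg(cfg)` is stuck, without killing the process, is Python's standard `faulthandler` module. The sketch below is only an illustration: `run_with_hang_report` is a hypothetical helper name, and the 60-second default is an arbitrary assumption.

```python
import faulthandler
import sys


def run_with_hang_report(fn, timeout_s=60.0, file=sys.stderr):
    """Call fn(); if it has not returned after timeout_s seconds,
    dump the traceback of every thread to `file` and keep running.

    `file` must be backed by a real file descriptor (it needs fileno()).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file, exit=False)
    try:
        return fn()
    finally:
        # Cancel the pending dump once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

In a training script this could wrap the suspect call, for example `runner = run_with_hang_report(lambda: Runner.from_cfg(cfg))`; the dumped traceback shows which frame each thread is in when the timeout expires.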

wondervictor commented 9 months ago

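Hello, sorry for the late reply. On the first question (runner = Runner.from_cfg(cfg) hangs): when you kill the process, can you see which step it is stuck on? On the second question: this may be an annotation-inconsistency problem. First confirm that your box annotations follow the COCO convention; COCO uses xywh by default (xy is the top-left corner).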

[image]
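
A quick sanity check for the xywh convention mentioned above could look like the following. This is a minimal sketch, not part of YOLO-World or mmdetection; the function names and the example numbers are hypothetical.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corners to COCO-style [x, y, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def check_coco_bbox(bbox, img_w, img_h):
    """Return a list of problems found when bbox is read as COCO [x, y, w, h],
    where (x, y) is the top-left corner. An empty list means it looks plausible."""
    x, y, w, h = bbox
    problems = []
    if w <= 0 or h <= 0:
        problems.append("non-positive width or height")
    if x < 0 or y < 0:
        problems.append("negative top-left corner")
    if x + w > img_w or y + h > img_h:
        problems.append("box extends past the image; it may be [x1, y1, x2, y2]")
    return problems


# Hypothetical box stored as corners on a 320x240 image.
corner_box = [100, 50, 300, 200]
print(check_coco_bbox(corner_box, 320, 240))                # flagged when read as xywh
print(check_coco_bbox(xyxy_to_xywh(corner_box), 320, 240))  # plausible after conversion
```

A box that passes this check is not guaranteed to be correct, since a small corner-format box can still fit inside the image, so it is worth drawing a few boxes on their images as well.
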
kkkkkkb commented 9 months ago

@wondervictor Hello, thank you very much for your reply! I found that the reason no objects are detected is most likely a data-loading problem. With the COCO JSON file, the bbox and other annotation fields are read correctly, but with my own JSON file they are not. The only difference between the two JSON files seems to be segmentation. I hope you can take a look when you have time. Thank you very much!

COCO JSON file:

"annotations": [{"segmentation": [[510.66,423.01,511.72,420.03,510.45,416.0,510.34,413.02,510.77,410.26,510.77,407.5,510.34,405.16,511.51,402.83,511.41,400.49,510.24,398.16,509.39,397.31,504.61,399.22,502.17,399.64,500.89,401.66,500.47,402.08,499.09,401.87,495.79,401.98,490.59,401.77,488.79,401.77,485.39,398.58,483.9,397.31,481.56,396.35,478.48,395.93,476.68,396.03,475.4,396.77,473.92,398.79,473.28,399.96,473.49,401.87,474.56,403.47,473.07,405.59,473.39,407.71,476.68,409.41,479.23,409.73,481.56,410.69,480.4,411.85,481.35,414.93,479.86,418.65,477.32,420.03,476.04,422.58,479.02,422.58,480.29,423.01,483.79,419.93,486.66,416.21,490.06,415.57,492.18,416.85,491.65,420.24,492.82,422.9,493.56,424.39,496.43,424.6,498.02,423.01,498.13,421.31,497.07,420.03,497.07,415.15,496.33,414.51,501.1,411.96,502.06,411.32,503.02,415.04,503.33,418.12,501.1,420.24,498.98,421.63,500.47,424.39,505.03,423.32,506.2,421.31,507.69,419.5,506.31,423.32,510.03,423.01,510.45,423.01]],"area": 702.1057499999998,"iscrowd": 0,"image_id": 289343,"bbox": [473.07,395.93,38.65,28.67],"category_id": 18,"id": 1768},

My JSON file:

"annotations":[{"id":0,"image_id":0,"category_id":1,"bbox":[328,176,158.5,403],"area":63875.5,"segmentation":[],"iscrowd":0},

Below I tried reading the image info from the COCO dataset and from my own dataset, and found that 'instances' is empty when reading my own dataset.

COCO dataset:

{'img_path': 'multi-modal/YOLO-World-master/data/coco/train2017/000000522418.jpg', 'img_id': 522418, 'seg_map_path': None, 'height': 480, 'width': 640, 'instances': [{'ignore_flag': 0, 'bbox': [382.48, 0.0, 639.28, 474.31], 'bbox_label': 0, 'mask': [[426.91, 58.24, 434.49, 77.74, 467.0, 80.99, 485.42, 86.41, 493.0, 129.75, 521.17, 128.67, 532.01, 144.92, 545.01, 164.42, 552.6, 170.93, 588.35, 178.51, 629.53, 165.51, 629.53, 177.43, 578.6, 214.27, 558.01, 241.35, 526.59, 329.12, 512.51, 370.29, 502.75, 415.8, 418.24, 409.3, 399.82, 
414.72, 388.98, 420.14, 382.48, 424.47, 391.15, 430.97, 414.99, 425.55, 447.49, 427.72, 449.66, 435.3, 431.24, 438.56, 421.49, 452.64, 422.57, 456.98, 432.33, 464.56, 439.91, 458.06, 481.08, 465.64, 502.75, 464.56, 507.09, 473.23, 639.28, 474.31, 639.28, 1.9, 431.24, 0.0]]}, {'ignore_flag': 0, 'bbox': [234.06, 406.61, 454.0, 449.28000000000003], 'bbox_label': 43, 'mask': [[416.41, 449.28, 253.36, 422.87, 234.06, 412.2, 277.23, 406.61, 343.77, 411.69, 379.84, 414.23, 384.41, 424.9, 397.11, 427.95, 410.31, 427.95, 445.36, 429.98, 454.0, 438.61, 431.65, 438.61, 423.01, 449.28]]}, {'ignore_flag': 0, 'bbox': [0.0, 316.04, 406.65, 473.53000000000003], 'bbox_label': 55, 'mask': [[71.19, 327.91, 5.39, 371.06, 0.0, 371.06, 0.0, 473.53, 365.66, 473.53, 379.69, 442.25, 354.88, 431.46, 247.01, 417.44, 232.99, 410.97, 277.21, 406.65, 326.83, 408.81, 379.69, 416.36, 386.16, 418.52, 393.71, 413.12, 406.65, 379.69, 406.65, 366.74, 399.1, 339.78, 286.92, 323.6, 179.06, 318.2, 98.16, 316.04]]}, {'ignore_flag': 0, 'bbox': [305.45, 172.05, 362.81, 249.35000000000002], 'bbox_label': 71, 'mask': [[347.84, 225.66, 311.69, 249.35, 305.45, 205.71, 361.56, 172.05, 362.81, 179.53]]}], 'sample_idx': 1, 'texts': [['person'], ['bicycle'], ['car'], ['motorcycle'], ['airplane'], ['bus'], ['train'], ['truck'], ['boat'], ['traffic light'], ['fire hydrant'], ['stop sign'], ['parking meter'], ['bench'], ['bird'], ['cat'], ['dog'], ['horse'], ['sheep'], ['cow'], ['elephant'], ['bear'], ['zebra'], ['giraffe'], ['backpack'], ['umbrella'], ['handbag'], ['tie'], ['suitcase'], ['frisbee'], ['skis'], ['snowboard'], ['sports ball'], ['kite'], ['baseball bat'], ['baseball glove'], ['skateboard'], ['surfboard'], ['tennis racket'], ['bottle'], ['wine glass'], ['cup'], ['fork'], ['knife'], ['spoon'], ['bowl'], ['banana'], ['apple'], ['sandwich'], ['orange'], ['broccoli'], ['carrot'], ['hot dog'], ['pizza'], ['donut'], ['cake'], ['chair'], ['couch'], ['potted plant'], ['bed'], ['dining table'], ['toilet'], 
['tv'], ['laptop'], ['mouse'], ['remote'], ['keyboard'], ['cell phone'], ['microwave'], ['oven'], ['toaster'], ['sink'], ['refrigerator'], ['book'], ['clock'], ['vase'], ['scissors'], ['teddy bear'], ['hair drier'], ['toothbrush']]}

My own dataset:

{'img_path': 'multi-modal/YOLO-World-master/data/family/train2017/data01_mp4-6_jpg.rf.872fbf3996d9dfb1bc4b0374ba7a42e8.jpg', 'img_id': 1, 'seg_map_path': None, 'height': 640, 'width': 640, 'instances': [], 'sample_idx': 1, 'texts': [['baby']]}
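
When a loader returns empty 'instances', it can help to summarize the annotation file itself before digging into the dataset class. The sketch below assumes the standard COCO top-level keys `images`, `annotations` and `categories`; the function name and the toy data are hypothetical (the thread does not show the `categories` section of the custom file, so the `baby` entry here is an assumption).

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Summarize a COCO-style dict that has already been loaded from JSON."""
    cat_names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    image_ids = {img["id"] for img in coco.get("images", [])}
    anns = coco.get("annotations", [])
    per_cat = Counter(a["category_id"] for a in anns)
    annotated_images = {a["image_id"] for a in anns}
    return {
        "categories": cat_names,
        "annotations_per_category": dict(per_cat),
        # Category ids used by annotations but missing from "categories".
        "unknown_category_ids": sorted(set(per_cat) - set(cat_names)),
        "images_without_annotations": sorted(image_ids - annotated_images),
        "annotations_with_empty_segmentation": sum(
            1 for a in anns if not a.get("segmentation")
        ),
    }


# Toy file content; in practice use json.load() on the real annotation file.
toy = {
    "images": [{"id": 0}, {"id": 1}],
    "categories": [{"id": 1, "name": "baby"}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1,
         "bbox": [328, 176, 158.5, 403], "segmentation": [], "iscrowd": 0},
    ],
}
print(json.dumps(summarize_coco(toy), indent=2))
```

If this summary looks healthy (every annotation has a known category and most images have annotations), the problem is more likely in how the dataset class filters or maps categories than in the file itself.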

kkkkkkb commented 9 months ago

One more note: I tried both of the config files mentioned here, and the result is the same: [image]

wondervictor commented 9 months ago

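Hello, you need to remove the mask-refine-related parameters as described in https://github.com/AILab-CVC/YOLO-World/issues/52. Without segmentation, the box labels may get filtered out. Since you have no segmentation annotations, just use the lower one, "without mask annotations". Because many downstream training sets have no masks, we will provide some w/o mask-refine config examples later.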

kkkkkkb commented 9 months ago

@wondervictor Hello, thank you very much for your patient reply! I searched for self.use_mask_refine and found that it is only used in the YOLOv5RandomAffine class in the two transform files, and there only in the class's __init__(); nothing uses it afterwards. I also set use_mask_refine to False as suggested in #52, but the problem is still not solved. [image]

`mosaic_affine_transform = [
    dict(type='MultiModalMosaic',
         img_scale=_base_.img_scale,
         pad_val=114.0,
         pre_transform=_base_.pre_transform),
    dict(type='YOLOv5CopyPaste', prob=_base_.copypaste_prob),
    dict(
        type='YOLOv5RandomAffine',
        max_rotate_degree=0.0,
        max_shear_degree=0.0,
        max_aspect_ratio=100.,
        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
        # img_scale is (width, height)
        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
        border_val=(114, 114, 114),
        min_area_ratio=_base_.min_area_ratio,
        use_mask_refine=False)`

Looking forward to your w/o mask-refine config examples!

wondervictor commented 8 months ago

@kkkkkkb Hello, the w/o mask-refine config has been uploaded. You can refer to yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py

wondervictor commented 8 months ago

@kkkkkkb, you could try setting filter_cfg=None in the dataset config.

kkkkkkb commented 8 months ago

@wondervictor Thanks for your help. I tried deleting the filter_cfg line from both datasets, since it defaults to None, but the labels still are not read and instances is still empty.

coco_train_dataset = dict( _delete_=True, type='MultiModalDataset', dataset=dict( type='YOLOv5CocoDataset', data_root='multi-modal/YOLO-World-master/data/family', ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/')), class_text_path='multi-modal/YOLO-World-master/data/texts/family_class_texts.json', pipeline=train_pipeline)

wondervictor commented 8 months ago

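@kkkkkkb Hello, this may be because you are now using your own data, while the default CocoDataset in mmyolo/mmdetection carries the COCO categories. The mismatch between the two sets of categories may cause the corresponding annotations to be skipped. For details, see the CocoDataset implementation: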

https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/datasets/coco.py#L70

Check whether self.cat_ids and self.cat2label here are what you expect. If they are not, I suggest building a dataset class of your own, modeled on CocoDataset.
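
To illustrate why a class-name mismatch can leave every image with empty instances, here is a simplified, self-contained imitation of the mapping step. It is not the mmdetection code itself: the function names are hypothetical, and it assumes the category in the custom file is named `baby`.

```python
def build_cat_mapping(categories, class_names):
    """Keep only categories whose name is in class_names, then map each
    kept category id to a contiguous label index (in file order)."""
    cat_ids = [c["id"] for c in categories if c["name"] in class_names]
    cat2label = {cat_id: i for i, cat_id in enumerate(cat_ids)}
    return cat_ids, cat2label


def keep_known_annotations(annotations, cat_ids):
    """Drop annotations whose category id is not in cat_ids."""
    return [a for a in annotations if a["category_id"] in cat_ids]


categories = [{"id": 1, "name": "baby"}]  # assumed content of the custom file
annotations = [{"id": 0, "image_id": 0, "category_id": 1,
                "bbox": [328, 176, 158.5, 403]}]

# With COCO class names (abbreviated here), nothing matches: cat_ids is empty.
coco_ids, _ = build_cat_mapping(categories, ("person", "bicycle", "car"))
print(coco_ids, keep_known_annotations(annotations, coco_ids))

# With the custom class name, the annotation survives.
own_ids, own_map = build_cat_mapping(categories, ("baby",))
print(own_ids, own_map, len(keep_known_annotations(annotations, own_ids)))
```

In mmdetection-style configs the class names are usually supplied through the dataset's `metainfo`, for example `metainfo=dict(classes=('baby',))` (note the trailing comma for a single-class tuple). Treat that as an assumption to verify against the installed mmdetection and mmyolo versions.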

kkkkkkb commented 8 months ago

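@wondervictor Thank you very much for your patient answers. I finally found the problem: it was indeed self.cat_ids. self.cat_ids was empty from the start, which is why the labels could not be read. Thank you!!!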

wondervictor commented 8 months ago

@kkkkkkb You're welcome. If you have new questions, feel free to open a new issue. Thank you for your interest in YOLO-World!