AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Reproduction issues with YOLO-World-v2-M pre-training #137

Closed: LuletterSoul closed this issue 8 months ago

LuletterSoul commented 8 months ago

Thanks to the authors for open-sourcing such an excellent project. When I reproduce YOLO-World-v2-M, the last lvis/bbox_AP in yolo_world_v2_m_o365_goldg_pretrain_part_2.log is 23.50, but AP_mini in the README is 30.0. Am I misunderstanding something?

2024/01/21 13:57:56 - mmengine - INFO - Epoch(train) [100][2150/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:07:54  time: 0.8242  data_time: 0.0038  memory: 8515  grad_norm: 1690.3185  loss: 1843.1724  loss_cls: 679.4650  loss_bbox: 557.3075  loss_dfl: 606.3999
2024/01/21 13:58:50 - mmengine - INFO - Epoch(train) [100][2200/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:07:10  time: 1.0748  data_time: 0.0038  memory: 8248  grad_norm: 1734.1633  loss: 1838.3821  loss_cls: 680.3904  loss_bbox: 552.5511  loss_dfl: 605.4406
2024/01/21 13:59:37 - mmengine - INFO - Epoch(train) [100][2250/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:06:27  time: 0.9241  data_time: 0.0040  memory: 8568  grad_norm: 1971.2125  loss: 1832.5990  loss_cls: 675.3612  loss_bbox: 552.9561  loss_dfl: 604.2817
2024/01/21 14:00:26 - mmengine - INFO - Epoch(train) [100][2300/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:05:43  time: 0.9976  data_time: 0.0038  memory: 9262  grad_norm: 1854.2161  loss: 1846.6758  loss_cls: 687.0179  loss_bbox: 549.9048  loss_dfl: 609.7531
2024/01/21 14:01:15 - mmengine - INFO - Epoch(train) [100][2350/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:04:59  time: 0.9684  data_time: 0.0038  memory: 8408  grad_norm: 1732.0796  loss: 1813.1792  loss_cls: 665.7357  loss_bbox: 547.8311  loss_dfl: 599.6124
2024/01/21 14:01:50 - mmengine - INFO - Exp name: yolow-v8_m_clipv2_frozen_te_noprompt_t2i_bn_2e-3adamw_scale_lr_wd_32xb16-100e_obj365v1_goldg_train_lviseval_20240120_141543
2024/01/21 14:02:00 - mmengine - INFO - Epoch(train) [100][2400/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:04:16  time: 0.8929  data_time: 0.0040  memory: 8382  grad_norm: 1730.4087  loss: 1862.8332  loss_cls: 684.5326  loss_bbox: 565.8721  loss_dfl: 612.4285
2024/01/21 14:02:53 - mmengine - INFO - Epoch(train) [100][2450/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:03:32  time: 1.0725  data_time: 0.0039  memory: 8528  grad_norm: 1696.9107  loss: 1853.4589  loss_cls: 681.9068  loss_bbox: 557.2568  loss_dfl: 614.2954
2024/01/21 14:03:37 - mmengine - INFO - Epoch(train) [100][2500/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:02:48  time: 0.8742  data_time: 0.0037  memory: 9102  grad_norm: 1684.3628  loss: 1842.8340  loss_cls: 677.3105  loss_bbox: 559.6523  loss_dfl: 605.8711
2024/01/21 14:04:28 - mmengine - INFO - Epoch(train) [100][2550/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:02:05  time: 1.0113  data_time: 0.0038  memory: 8795  grad_norm: 1820.4552  loss: 1840.0748  loss_cls: 675.2696  loss_bbox: 555.3292  loss_dfl: 609.4760
2024/01/21 14:05:19 - mmengine - INFO - Epoch(train) [100][2600/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:01:21  time: 1.0226  data_time: 0.0037  memory: 8462  grad_norm: 1700.2336  loss: 1849.2988  loss_cls: 677.4503  loss_bbox: 561.7094  loss_dfl: 610.1391
2024/01/21 14:06:02 - mmengine - INFO - Epoch(train) [100][2650/2693]  base_lr: 2.0000e-03 lr: 5.9600e-05  eta: 0:00:37  time: 0.8518  data_time: 0.0039  memory: 8542  grad_norm: 1724.9404  loss: 1846.6120  loss_cls: 682.7282  loss_bbox: 560.2276  loss_dfl: 603.6562
2024/01/21 14:06:50 - mmengine - INFO - Exp name: yolow-v8_m_clipv2_frozen_te_noprompt_t2i_bn_2e-3adamw_scale_lr_wd_32xb16-100e_obj365v1_goldg_train_lviseval_20240120_141543
2024/01/21 14:06:50 - mmengine - INFO - Saving checkpoint at 100 epochs
2024/01/21 14:07:18 - mmengine - INFO - Epoch(val) [100][ 50/620]    eta: 0:04:06  time: 0.4321  data_time: 0.0009  memory: 8648  
2024/01/21 14:07:41 - mmengine - INFO - Epoch(val) [100][100/620]    eta: 0:03:52  time: 0.4622  data_time: 0.0004  memory: 1391  
2024/01/21 14:08:04 - mmengine - INFO - Epoch(val) [100][150/620]    eta: 0:03:33  time: 0.4686  data_time: 0.0004  memory: 1391  
2024/01/21 14:08:27 - mmengine - INFO - Epoch(val) [100][200/620]    eta: 0:03:11  time: 0.4651  data_time: 0.0004  memory: 1391  
2024/01/21 14:08:51 - mmengine - INFO - Epoch(val) [100][250/620]    eta: 0:02:49  time: 0.4648  data_time: 0.0004  memory: 1391  
2024/01/21 14:09:13 - mmengine - INFO - Epoch(val) [100][300/620]    eta: 0:02:25  time: 0.4412  data_time: 0.0004  memory: 1391  
2024/01/21 14:09:35 - mmengine - INFO - Epoch(val) [100][350/620]    eta: 0:02:02  time: 0.4520  data_time: 0.0004  memory: 1391  
2024/01/21 14:09:59 - mmengine - INFO - Epoch(val) [100][400/620]    eta: 0:01:40  time: 0.4747  data_time: 0.0004  memory: 1391  
2024/01/21 14:10:22 - mmengine - INFO - Epoch(val) [100][450/620]    eta: 0:01:17  time: 0.4551  data_time: 0.0004  memory: 1391  
2024/01/21 14:10:44 - mmengine - INFO - Epoch(val) [100][500/620]    eta: 0:00:54  time: 0.4443  data_time: 0.0004  memory: 1391  
2024/01/21 14:11:08 - mmengine - INFO - Epoch(val) [100][550/620]    eta: 0:00:32  time: 0.4704  data_time: 0.0004  memory: 1391  
2024/01/21 14:11:29 - mmengine - INFO - Epoch(val) [100][600/620]    eta: 0:00:09  time: 0.4345  data_time: 0.0004  memory: 1391  
2024/01/21 14:13:28 - mmengine - INFO - Evaluating bbox...
2024/01/21 14:21:02 - mmengine - INFO - Epoch(val) [100][620/620]    lvis/bbox_AP: 0.2350  lvis/bbox_AP50: 0.3140  lvis/bbox_AP75: 0.2540  lvis/bbox_APs: 0.1480  lvis/bbox_APm: 0.3230  lvis/bbox_APl: 0.4390  lvis/bbox_APr: 0.1710  lvis/bbox_APc: 0.2000  lvis/bbox_APf: 0.3010  data_time: 0.0005  time: 0.4562
2024/01/21 14:21:02 - mmengine - INFO - The previous best checkpoint /group/30042/adriancheng/FastDet/outputs/pretrain_yolow-v8_m_clipv2_frozen_te_noprompt_t2i_bn_2e-3adamw_scale_lr_wd_32xb16-100e_obj365v1_goldg_train_lviseval/best_lvis_bbox_AP_epoch_99.pth is removed
2024/01/21 14:21:07 - mmengine - INFO - The best checkpoint with 0.2350 lvis/bbox_AP at 100 epoch is saved to best_lvis_bbox_AP_epoch_100.pth.

wondervictor commented 8 months ago

Hi @LuletterSoul, the LVIS AP in the pre-training log is the standard LVIS val v1.0 AP. The README reports both the LVIS minival AP (30.0) and the LVIS val v1.0 AP (23.5).
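
For readers comparing log values with README numbers: the log prints AP as a fraction (0.2350), while the README uses percentages (23.5). The following is a minimal sketch, not part of the project, that pulls the `lvis/bbox_*` metrics out of the final validation line and rescales them. The names `LOG_LINE`, `METRIC_RE`, and `parse_lvis_metrics` are hypothetical helpers introduced here for illustration.

```python
import re

# Final validation line copied from the log in this thread (timing fields omitted).
LOG_LINE = (
    "2024/01/21 14:21:02 - mmengine - INFO - Epoch(val) [100][620/620]    "
    "lvis/bbox_AP: 0.2350  lvis/bbox_AP50: 0.3140  lvis/bbox_AP75: 0.2540  "
    "lvis/bbox_APs: 0.1480  lvis/bbox_APm: 0.3230  lvis/bbox_APl: 0.4390  "
    "lvis/bbox_APr: 0.1710  lvis/bbox_APc: 0.2000  lvis/bbox_APf: 0.3010"
)

# Hypothetical pattern: captures the metric suffix (AP, AP50, APr, ...) and its value.
METRIC_RE = re.compile(r"lvis/bbox_(AP\w*):\s*([0-9.]+)")


def parse_lvis_metrics(line):
    """Return the LVIS bbox metrics found in one log line, as percentages."""
    return {
        name: round(float(value) * 100, 1)
        for name, value in METRIC_RE.findall(line)
    }


metrics = parse_lvis_metrics(LOG_LINE)
print(metrics["AP"])  # 23.5, the LVIS val v1.0 figure quoted above
```

The parsed value matches the 23.5 quoted for LVIS val v1.0, not the 30.0 minival figure, which is consistent with the explanation that the log and the released config use different validation sets.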

LuletterSoul commented 8 months ago

> Hi @LuletterSoul, the LVIS AP in the pre-training log is the standard LVIS val v1.0 AP. The README reports both the LVIS minival AP (30.0) and the LVIS val v1.0 AP (23.5).

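coco_val_dataset = dict(
    _delete_=True,  # drop the inherited dataset settings instead of merging them
    type='MultiModalDataset',
    dataset=dict(type='YOLOv5LVISV1Dataset',
                 data_root='data/coco/',
                 test_mode=True,
                 # annotation file for LVIS v1.0 minival, not the full val set
                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
                 data_prefix=dict(img=''),
                 batch_shapes_cfg=None),
    # ... (remaining keys not shown)
)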

But the val_dataloader in the current version of the project seems to use LVIS v1.0 minival, not LVIS v1.0 val. Is the bbox_AP printed in yolo_world_v2_m_o365_goldg_pretrain_part_2.log measured on v1.0 minival or on v1.0 val?

wondervictor commented 8 months ago

Hi @LuletterSoul, we pre-trained YOLO-World with our local configs (evaluated on LVIS val v1.0) and released the configs with LVIS minival. If you are unsure, you can run an evaluation on both validation sets.
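
One way to set up that comparison is to build the validation dataset settings twice, changing only the annotation file. The sketch below is an illustration under stated assumptions, not the authors' local config: `FULL_VAL_ANN` is an assumed file name for the full LVIS v1.0 val annotations, `make_lvis_val_dataset` is a hypothetical helper, and only the keys quoted in this thread are included. Any other keys from the released config (and the evaluator's annotation file, which presumably has to match) would need to be carried over as well.

```python
# Annotation file taken from the released config quoted in this thread.
MINIVAL_ANN = 'lvis/lvis_v1_minival_inserted_image_name.json'
# ASSUMPTION: file name of the full LVIS v1.0 val annotations; adjust to your layout.
FULL_VAL_ANN = 'lvis/lvis_v1_val.json'


def make_lvis_val_dataset(ann_file):
    """Build the dataset settings quoted in the thread for a given annotation file.

    Hypothetical helper: only keys that appear in the thread are included.
    """
    return dict(
        type='MultiModalDataset',
        dataset=dict(type='YOLOv5LVISV1Dataset',
                     data_root='data/coco/',
                     test_mode=True,
                     ann_file=ann_file,
                     data_prefix=dict(img=''),
                     batch_shapes_cfg=None),
    )


# Two variants that differ only in the annotation file being evaluated.
minival_dataset = make_lvis_val_dataset(MINIVAL_ANN)
full_val_dataset = make_lvis_val_dataset(FULL_VAL_ANN)

for name, cfg in [('minival', minival_dataset), ('val v1.0', full_val_dataset)]:
    print(name, '->', cfg['dataset']['ann_file'])
```

Keeping everything except `ann_file` identical makes it easier to attribute any AP difference to the validation split itself.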

LuletterSoul commented 8 months ago

> Hi @LuletterSoul, we pre-trained YOLO-World with our local configs (evaluated on LVIS val v1.0) and released the configs with LVIS minival. If you are unsure, you can run an evaluation on both validation sets.

Ok, I got it, thanks for your quick response.