AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Prompt tuning on COCO does not improve performance during training (performance at epoch 5 is better than zero-shot, but there is no further improvement by epoch 35) #172

Open shupinghu opened 6 months ago

shupinghu commented 6 months ago

I followed the steps in prompt_yolo_world.md to fine-tune YOLO-World-S on the COCO dataset, but the validation mAP does not improve during training. More specifically, the validation mAP at epoch 5 is 0.388 / 0.540, and the validation mAP at epoch 35 is 0.390 / 0.541.

I had already fine-tuned YOLO-World-S on COCO using the config file "finetune_coco/yolo_world_s_dual_vlpan_2e-4_80e_8gpus_mask-refine_finetune_coco". For a fair comparison, I downloaded the initial embeddings from Hugging Face and modified the official config file to create my own file "prompt_tuning_coco/prompt_tuning_coco/yolo_world_s_dual_vlpan_2e-4_80e_8gpus_mask-refine_prompt_tuning_coco.py" for prompt tuning. The modifications are as follows:

  1. train_batch_size_per_gpu = 32
  2. load_from="/home/shuping.hu/YOLO-World/yolo_world_s_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-18bea4d2.pth"
  3. neck type = "YOLOWorldDualPAFPN"
  4. in box head, use_bn_head=False
  5. in yolo_world/models/backbones/mm_backbone.py, I froze all backbone parameters, including switching the BN layers to "eval" mode
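
For readers who want to see modifications 1 to 4 in one place, here is a minimal sketch of how they might look as overrides in an MMEngine-style Python config. The nesting of `neck` and `bbox_head.head_module` inside `model` is an assumption based on the usual layout of YOLO-World configs, not a copy of the actual file. Modification 5 is a source change in mm_backbone.py and is not shown.

```python
# Hypothetical config fragment: only the four overridden settings are shown.
# The key nesting under `model` is assumed, not taken from the real config file.
train_batch_size_per_gpu = 32

# Pretrained checkpoint path exactly as given in the list above.
load_from = (
    "/home/shuping.hu/YOLO-World/"
    "yolo_world_s_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_"
    "o365_goldg_train_pretrained-18bea4d2.pth"
)

model = dict(
    # Modification 3: use the dual-path neck.
    neck=dict(type="YOLOWorldDualPAFPN"),
    # Modification 4: disable the BN-based head.
    bbox_head=dict(head_module=dict(use_bn_head=False)),
)
```

In a real config these keys would be merged into the full `model` dict rather than replacing it.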

wondervictor commented 6 months ago

According to my experiments on YOLO-World-v2-S, the prompt tuning improves from 37.5 AP to 39.9 AP. The zero-shot performance of the S model on COCO is 37.5~37.8, and it seems that the prompt tuning does improve the performance after 5 epochs or 35 epochs.

shupinghu commented 6 months ago

The original mm_backbone.py does not switch BN to "eval" mode during prompt tuning, so I changed it. Could this cause some of the difference?

shupinghu commented 6 months ago

Yes, compared with the zero-shot performance, prompt tuning does improve the performance from 37.6 AP to 38.8 AP, but there is no further improvement between epoch 5 and epoch 35.

wondervictor commented 6 months ago

mm_backbone has enabled the BN eval: https://github.com/AILab-CVC/YOLO-World/blob/3264b61a03b073852b1559fa896cb12c6ff1aa41/yolo_world/models/backbones/mm_backbone.py#L202
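
For context, the general PyTorch pattern for keeping BN layers in eval mode during training is to re-apply the freeze inside an overridden `train()`, because `model.train()` otherwise switches every submodule back to training mode. The sketch below illustrates that pattern only. `FrozenBackbone` and the toy `body` network are hypothetical names for illustration and are not the code in mm_backbone.py.

```python
import torch
from torch import nn
from torch.nn.modules.batchnorm import _BatchNorm


class FrozenBackbone(nn.Module):
    """Hypothetical wrapper: freezes all weights and pins BN layers to eval mode."""

    def __init__(self, body: nn.Module):
        super().__init__()
        self.body = body
        self._freeze()

    def _freeze(self):
        # Put every BatchNorm layer in eval mode so running stats are not updated.
        for module in self.body.modules():
            if isinstance(module, _BatchNorm):
                module.eval()
        # Stop gradients for all wrapped parameters.
        for param in self.body.parameters():
            param.requires_grad = False

    def train(self, mode: bool = True):
        # nn.Module.train() flips all children to training mode,
        # so the freeze has to be re-applied afterwards.
        super().train(mode)
        self._freeze()
        return self

    def forward(self, x):
        return self.body(x)


# Toy stand-in for a real backbone.
body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
backbone = FrozenBackbone(body)
backbone.train()  # what a training loop would call each epoch

bn = backbone.body[1]
mean_before = bn.running_mean.clone()
backbone(torch.randn(4, 3, 16, 16))
stats_unchanged = torch.equal(mean_before, bn.running_mean)
```

After the forward pass, `bn.training` stays `False` and the running statistics are unchanged, which is the behavior being discussed in this thread.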

I've evaluated prompt tuning on V2-S (batch size = 16x8, lr = 2e-3); the AP at the 5th, 35th, 70th, and 80th epochs is 38.7, 39.3, 39.4, and 39.8, respectively. The last 10 epochs bring AP gains.
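
To make the trend easier to read, the per-interval gains can be computed from the four AP values quoted above. This is a small arithmetic sketch. The variable names are illustrative, and the only data used are the numbers reported in this comment.

```python
# AP at selected epochs, as reported in the comment above.
ap_by_epoch = {5: 38.7, 35: 39.3, 70: 39.4, 80: 39.8}

epochs = sorted(ap_by_epoch)
gains = {}
for prev, curr in zip(epochs, epochs[1:]):
    # Round to one decimal to avoid floating-point noise in the difference.
    gains[(prev, curr)] = round(ap_by_epoch[curr] - ap_by_epoch[prev], 1)

for (prev, curr), gain in gains.items():
    print(f"epoch {prev} -> {curr}: {gain:+.1f} AP")
```

On these numbers the gain is 0.6 AP from epoch 5 to 35, 0.1 AP from epoch 35 to 70, and 0.4 AP over the final 10 epochs.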