AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

About fine-tuning the resulting model is unable to inference images #196

Open chenjiafu-George opened 7 months ago

chenjiafu-George commented 7 months ago

First, following the fine-tuning documentation, I fine-tuned the model myself on the COCO dataset to reproduce "yolo_world_v2_s_vlpan_bn_2e-4_80e_8gpus_mask-refine_finetune_coco.pth". I then modified the code in "inference.ipynb" to test the fine-tuned model and got the following result.

![QQ screenshot 20240328202200](https://github.com/AILab-CVC/YOLO-World/assets/59815166/c9bb197b-5d96-43c6-b9f7-ec77b7147139)

### Trying the YOLO-World v2 models downloaded directly from Hugging Face

- yolo_world_v2_s_obj365v1_goldg_pretrain-55b943ea.pth
- yolo_world_v2_s_vlpan_bn_2e-4_80e_8gpus_mask-refine_finetune_coco_ep80-492dc329.pth

Inference does not succeed with either of these two weight files, but it does succeed with yolo_world_s_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-18bea4d2.pth.

![QQ screenshot 20240328202657](https://github.com/AILab-CVC/YOLO-World/assets/59815166/76295b86-3b34-4a70-889f-0691992cf3ff)

wondervictor commented 7 months ago

Hi @chenjiafu-George, could you provide your fine-tuning details and the categories for inference?

chenjiafu-George commented 7 months ago

> Hi @chenjiafu-George, could you provide your fine-tuning details and the categories for inference?

Hi @wondervictor, thanks for your reply. I only changed the model file path; I did not change the categories.

[image: QQ screenshot 20240328204109]

2024/03/28 20:38:24 - mmengine - INFO - Epoch(train) [1][2550/3697] base_lr: 2.0000e-04 lr: 4.5965e-05 eta: 3 days, 2:59:06 time: 0.5590 data_time: 0.0453 memory: 10395 grad_norm: 429.6508 loss: 106.0996 loss_cls: 38.6639 loss_bbox: 32.0204 loss_dfl: 35.4153
2024/03/28 20:38:51 - mmengine - INFO - Epoch(train) [1][2600/3697] base_lr: 2.0000e-04 lr: 4.6867e-05 eta: 3 days, 2:23:48 time: 0.5531 data_time: 0.0148 memory: 10102 grad_norm: 466.4150 loss: 105.3860 loss_cls: 38.2321 loss_bbox: 31.7246 loss_dfl: 35.4292
2024/03/28 20:39:18 - mmengine - INFO - Epoch(train) [1][2650/3697] base_lr: 2.0000e-04 lr: 4.7768e-05 eta: 3 days, 1:49:08 time: 0.5457 data_time: 0.0215 memory: 10342 grad_norm: 448.3198 loss: 106.1435 loss_cls: 38.6466 loss_bbox: 31.9562 loss_dfl: 35.5407
2024/03/28 20:39:46 - mmengine - INFO - Epoch(train) [1][2700/3697] base_lr: 2.0000e-04 lr: 4.8670e-05 eta: 3 days, 1:15:49 time: 0.5467 data_time: 0.0496 memory: 10769 grad_norm: 470.3824 loss: 104.9570 loss_cls: 38.0376 loss_bbox: 31.6567 loss_dfl: 35.2627
2024/03/28 20:40:15 - mmengine - INFO - Epoch(train) [1][2750/3697] base_lr: 2.0000e-04 lr: 4.9572e-05 eta: 3 days, 0:47:34 time: 0.5903 data_time: 0.0696 memory: 10209 grad_norm: 460.9792 loss: 105.3659 loss_cls: 38.2831 loss_bbox: 31.7531 loss_dfl: 35.3297
2024/03/28 20:40:42 - mmengine - INFO - Epoch(train) [1][2800/3697] base_lr: 2.0000e-04 lr: 5.0473e-05 eta: 3 days, 0:15:45 time: 0.5380 data_time: 0.0213 memory: 10555 grad_norm: 483.2185 loss: 106.7539 loss_cls: 38.5636 loss_bbox: 32.2759 loss_dfl: 35.9144
2024/03/28 20:41:10 - mmengine - INFO - Epoch(train) [1][2850/3697] base_lr: 2.0000e-04 lr: 5.1375e-05 eta: 2 days, 23:45:55 time: 0.5482 data_time: 0.0049 memory: 10689 grad_norm: 447.7826 loss: 105.9166 loss_cls: 38.7019 loss_bbox: 31.5688 loss_dfl: 35.6458
2024/03/28 20:41:37 - mmengine - INFO - Epoch(train) [1][2900/3697] base_lr: 2.0000e-04 lr: 5.2277e-05 eta: 2 days, 23:17:20 time: 0.5513 data_time: 0.0360 memory: 9489 grad_norm: 421.9359 loss: 106.4364 loss_cls: 38.6151 loss_bbox: 32.1961 loss_dfl: 35.6252

Unfortunately, training is too slow because of my limited computing power, so I stopped the fine-tuning run for now and ran inference with your fine-tuned model, yolo_world_v2_s_vlpan_bn_2e-4_80e_8gpus_mask-refine_finetune_coco_ep80-492dc329.pth.

chenjiafu-George commented 7 months ago

image

With the model "yolo_world_v2_s_obj365v1_goldg_pretrain-55b943ea.pth", inference cannot be performed.

chenjiafu-George commented 7 months ago

> Hi @chenjiafu-George, could you provide your fine-tuning details and the categories for inference?

Hi @wondervictor, oh, sorry! Due to my negligence, I changed the model weight file but did not change the path of the cfg file, which caused the problems above. They are solved now; thank you for your reply.

I have recently been working on object detection for autonomous driving. I am trying to fine-tune the model and deploy the fine-tuned model on a Jetson Nano development board, and I will keep posting follow-up questions. The model I started fine-tuning yesterday has trained for 20 epochs so far, starting from the yolo_world_v2_s_vlpan_bn_2e-4_80e_8gpus_mask-refine_finetune_coco_ep80-492dc329.pth weight file and fine-tuning on the COCO dataset. The current model accuracy is: [image]

I changed some settings in the configuration file because my hardware is:

NVIDIA TITAN RTX

Driver version: 31.0.15.3734
Driver date: 2023/9/1
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 101, device 0, function 0

Utilization: 0%
Dedicated GPU memory: 14.0/24.0 GB
Shared GPU memory: 0.5/31.9 GB
GPU memory: 14.5/55.9 GB

I used a single GPU without distributed training, and I changed batch_size=16 to batch_size=32.

chenjiafu-George commented 7 months ago

Hi @wondervictor, I have now fine-tuned the model for 80 epochs, and the fine-tuning log looks like this: [image] Which parameters in the model section can be changed, or which network layers can be removed, to improve the model's efficiency? The trained model is 786MB. The inference results are OK.

wondervictor commented 7 months ago

Hi @chenjiafu-George, is there any update? Could you let me know if you are satisfied with the fine-tuned results?

For efficiency: (1) you can remove the language model by calling reparameterize in: https://github.com/AILab-CVC/YOLO-World/blob/24c7121cf83c5808efef91c91b76277980feb99e/yolo_world/models/detectors/yolo_world.py#L57

(2) try out fine-tuning with the efficient neck, you can find more details in: https://github.com/AILab-CVC/YOLO-World/blob/master/configs/finetune_coco/yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py
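The idea behind suggestion (1) can be illustrated with a toy sketch. This is not the YOLO-World API: every name below (`ToyOpenVocabDetector`, `toy_text_encoder`, `score`) is hypothetical, and the "embedding" is a stand-in. It only shows why a fixed vocabulary lets you drop the language model: the text embeddings are computed once, stored, and reused at inference.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


def toy_text_encoder(text: str) -> Vector:
    """Hypothetical stand-in for a heavy language model (illustration only)."""
    # Deterministic 2-d "embedding" built from the characters of the prompt.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


class ToyOpenVocabDetector:
    """Toy detector that scores region features against class text embeddings."""

    def __init__(self, text_encoder: Callable[[str], Vector]) -> None:
        self.text_encoder: Optional[Callable[[str], Vector]] = text_encoder
        self.text_feats: Optional[Dict[str, Vector]] = None

    def reparameterize(self, class_names: List[str]) -> None:
        """Encode the fixed vocabulary once and keep the result as plain data."""
        if self.text_encoder is None:
            raise RuntimeError("text encoder has already been removed")
        self.text_feats = {name: self.text_encoder(name) for name in class_names}
        # The encoder is no longer needed for this vocabulary, so drop it.
        self.text_encoder = None

    def score(self, region_feat: Vector, class_name: str) -> float:
        """Dot product between a region feature and a cached class embedding."""
        if self.text_feats is None:
            raise RuntimeError("call reparameterize() before scoring")
        emb = self.text_feats[class_name]
        return sum(a * b for a, b in zip(region_feat, emb))


detector = ToyOpenVocabDetector(toy_text_encoder)
detector.reparameterize(["car", "pedestrian", "cyclist"])
print(detector.text_encoder is None)  # True: the language model is gone
print(sorted(detector.text_feats))    # cached class names
```

In the real repository the method linked above plays the role of `reparameterize` here; check its signature in the source before calling it.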

chenjiafu-George commented 7 months ago

Hi @wondervictor, regarding your first suggestion, could you show me how to call it in code? I really need it. Thank you!

(1) The fine-tuning results above are similar to the results in your paper, but not high enough. I will try both of your methods and continue debugging. I will probably apply pruning and quantization to the model later, because the fine-tuned model is just too big for me: 756MB.

(2) Today I created the KITTI dataset in COCO format (training set: 5979 images, validation set: 771 images), and it is currently being used for training. At the current speed, I will reply tomorrow with the fine-tuning results on this dataset.

wondervictor commented 7 months ago

Hi @chenjiafu-George, from previous discussion logs, I'm clear that: (1) you fine-tune YOLO-World-S and obtain 45.8 AP on COCO with mask-refine; (2) the YOLO-World-S consumes about 756MB of memory, which is too large for you.

As for (1), the fine-tuned results are consistent with ours and we are going to further improve the fine-tuned performance of small-size models (compared to YOLOv8, it still has a 1.4 AP gain).

As for (2), maybe you can try out the quantization to convert the FP32 model to FP16 or INT8, which will save much memory cost and increase inference speed. See more about quantization.

Hope to see your update!
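As a rough back-of-the-envelope check on point (2), the sketch below scales the reported 756MB by the number of bytes per weight. It assumes the whole file is FP32 weights and that every weight is converted, which is an idealization: a training checkpoint may also hold other data such as optimizer state, and real quantized models keep some tensors at higher precision. The function name is hypothetical.

```python
# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_size_mb(fp32_size_mb: float, target: str) -> float:
    """Scale an FP32 size by the ratio of bytes per weight (idealized)."""
    return fp32_size_mb * BYTES_PER_WEIGHT[target] / BYTES_PER_WEIGHT["fp32"]


for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {estimated_size_mb(756, dtype):.0f} MB")
# fp32: 756 MB
# fp16: 378 MB
# int8: 189 MB
```

Under these assumptions, precision reduction alone gives at most a 4x reduction, so reaching a much smaller footprint would also require shrinking what is stored in the file.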

chenjiafu-George commented 7 months ago

Hi @wondervictor, thank you for your reply. As for (2), I will continue working on this. The link you recommended is very helpful to me, thank you.

(1) For my current dataset, however, the input image size is (1242,365), so I changed this part of the config from (640,640) to (1242,375), as shown below. ![image](https://github.com/AILab-CVC/YOLO-World/assets/59815166/a96ebe97-83be-43f5-bcee-d0c97d2d60d9) I am now trying to scale the img_scales you defined up or down to match the scale of my dataset. ![image](https://github.com/AILab-CVC/YOLO-World/assets/59815166/9ddf8c97-0f3d-42ea-9366-207ee7d169f5) Will this cause any problems, or do I need to rescale the input images from (1242,365) to (640,640)?

wondervictor commented 7 months ago

Hi @chenjiafu-George, maybe you can load the 1280x1280 pre-trained model and change the fine-tuning resolution to (1280, 384) (both multiples of 32), which is better for GPU computation.
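The "multiples of 32" constraint is easy to check numerically. A minimal sketch, using the (1242, 375) size mentioned earlier in the thread; the helper name is hypothetical:

```python
def round_up_to_multiple(value: int, base: int = 32) -> int:
    """Return the smallest multiple of `base` that is >= value."""
    return ((value + base - 1) // base) * base


kitti_w, kitti_h = 1242, 375
print(round_up_to_multiple(kitti_w), round_up_to_multiple(kitti_h))  # 1248 384

# The suggested fine-tuning resolution also satisfies the constraint.
print(1280 % 32 == 0 and 384 % 32 == 0)  # True
```

The smallest valid width would be 1248; the suggested 1280 is the next multiple up and matches the width of the 1280x1280 pre-trained models.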

chenjiafu-George commented 7 months ago

> Hi @chenjiafu-George, maybe you can load the 1280x1280 pre-trained model and change the fine-tuning resolution to (1280, 384) (both multiples of 32), which is better for GPU computation.

Hi @wondervictor, thanks for your reply! Let me try it. Thank you very much!

chenjiafu-George commented 7 months ago

Hi @wondervictor, I have now started training on my custom KITTI dataset (autonomous driving domain) in COCO format, but I ran into some issues during training. [image] During validation I got an error saying that this metric could not be found for my dataset. During training, only loss_cls had values and kept decreasing, while loss_bbox and loss_dfl had no values.

wondervictor commented 7 months ago

Hi @chenjiafu-George, could you share your config?

chenjiafu-George commented 7 months ago

> Hi @chenjiafu-George, could you share your config?

Hi @wondervictor, sorry for the late reply. Here is my config: [image] The configuration related to the validation metric that could not be found is as follows: [image]

chenjiafu-George commented 7 months ago

@wondervictor, this is my modified config file information (attached files):

- config.txt
- yolo_world_v2_s_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.txt

wondervictor commented 7 months ago

Hi @chenjiafu-George, I've checked the config file you provided. The AdamW optimizer is not suitable for fine-tuning on your custom datasets. I strongly suggest you use the latest fine-tuning settings based on the SGD optimizer. You can find more details in: finetune_coco/yolo_world_v2_l_vlpan_bn_sgd_1e-3_40e_8gpus_finetune_coco.py. Specifically, you need to: (1) change the optimizer, learning rate, and weight decay (2) adjust the training epochs (max_epochs) according to the accuracy, as well as the close_mosaic_epochs.
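A hedged sketch of what such an override might look like, written as an MMEngine-style Python config fragment. The learning rate (1e-3) and epoch count (40) are read off the referenced config's file name; the momentum, weight decay, and `close_mosaic_epochs` values are placeholder assumptions, not values confirmed in this thread, so compare against the referenced config before using them.

```python
# Hypothetical fine-tuning overrides in MMEngine config style.
base_lr = 1e-3            # from "sgd_1e-3" in the referenced config name
max_epochs = 40           # from "40e"; adjust according to validation accuracy
close_mosaic_epochs = 10  # assumption: placeholder, tune together with max_epochs
weight_decay = 5e-4       # assumption: placeholder value

optim_wrapper = dict(
    type="OptimWrapper",
    optimizer=dict(
        type="SGD",        # replaces AdamW, as suggested above
        lr=base_lr,
        momentum=0.9,      # assumption: placeholder value
        weight_decay=weight_decay,
    ),
)

train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=max_epochs)

# Mosaic augmentation is switched off for the final epochs only,
# so this value must stay below the total epoch count.
assert close_mosaic_epochs < max_epochs
```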

chenjiafu-George commented 7 months ago

> Hi @chenjiafu-George, I've checked the config file you provided. The AdamW optimizer is not suitable for fine-tuning on your custom datasets. I strongly suggest you use the latest fine-tuning settings based on the SGD optimizer. You can find more details in: finetune_coco/yolo_world_v2_l_vlpan_bn_sgd_1e-3_40e_8gpus_finetune_coco.py. Specifically, you need to: (1) change the optimizer, learning rate, and weight decay (2) adjust the training epochs (max_epochs) according to the accuracy, as well as the close_mosaic_epochs.

Hi @wondervictor, thank you for your reply. I will retrain with the method you proposed and get back to you when it is done.

wondervictor commented 6 months ago

Hi @chenjiafu-George, is there any update?

chenjiafu-George commented 6 months ago

> Hi @chenjiafu-George, is there any update?

Hi @wondervictor, yes. I have now changed the learning rate to 2e-7 and the optimizer to SGD, but the results are still the same: loss_dfl and loss_bbox are still 0.

image

The only change is that loss_cls drops more slowly.

chenjiafu-George commented 6 months ago

Hi @wondervictor, I looked into possible causes. Problems like this can be caused by overfitting, poor data quality, or a mismatch between the model configuration and the dataset labels. The dataset I am using is KITTI (autonomous driving). Others have successfully trained YOLOv3 through YOLOv5 on KITTI. I also trained YOLOv8 on this dataset, but the results were almost the same as with the current YOLO-World v2: loss_dfl and loss_bbox are always 0, and loss_cls keeps decreasing. With YOLOv8 at its default settings, loss_cls dropped to 0 by the fourth epoch!

wondervictor commented 6 months ago

Hi @chenjiafu-George, (1) have you ever evaluated the zero-shot accuracy? (2) how about the setting (SGD, lr=1e-3, batch size=16)? (3) how about the annotation format, does it follow the coco format, i.e., xywh (left,top)?
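Question (3) can be checked mechanically. Below is a minimal sketch, assuming a standard COCO-style layout (`images`, `annotations`, `categories`) and source labels stored as (left, top, right, bottom) corners, which is how KITTI label files store 2D boxes. The helper names and the toy data are hypothetical, and a format mismatch is only one possible cause of zero box losses, not a diagnosis confirmed in this thread.

```python
from typing import Dict, List, Tuple


def xyxy_to_xywh(box: Tuple[float, float, float, float]) -> List[float]:
    """Convert (left, top, right, bottom) to COCO-style [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def find_bad_annotations(coco: Dict) -> List[str]:
    """Return human-readable problems found in a COCO-style dict."""
    problems = []
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    cat_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        img_w, img_h = sizes[ann["image_id"]]
        if w <= 0 or h <= 0:
            problems.append(f"ann {ann['id']}: non-positive width or height")
        elif x + w > img_w or y + h > img_h:
            # Typical symptom of corner coordinates stored as width/height.
            problems.append(f"ann {ann['id']}: box exceeds image bounds")
        if ann["category_id"] not in cat_ids:
            problems.append(f"ann {ann['id']}: unknown category_id")
    return problems


toy = {
    "images": [{"id": 1, "width": 1242, "height": 375}],
    "categories": [{"id": 1, "name": "car"}],
    "annotations": [
        # Correct: corners converted to xywh.
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": xyxy_to_xywh((100, 50, 300, 200))},
        # Wrong: corners copied unchanged into the xywh field.
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [900, 100, 1200, 300]},
        # Wrong: category id not listed under "categories".
        {"id": 3, "image_id": 1, "category_id": 99, "bbox": [10, 10, 50, 50]},
    ],
}

for problem in find_bad_annotations(toy):
    print(problem)
```

To apply it to a real file, load the annotation JSON with `json.load` and pass the resulting dict to `find_bad_annotations`.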

chenjiafu-George commented 6 months ago

Hi @wondervictor, thanks for your reply.

For (1), I have not used zero-shot evaluation before. I saw it in your paper, but I have not evaluated it yet. Where should I start if I want to try it?

For (2), yes, I tried training with the configuration parameters you provided, and the results are as follows: [image]

For (3), this is my annotation JSON file: instances_train2017.json

zhangxiaoming0713 commented 6 months ago

Hello, I used your parameters (SGD, lr=1e-3, batch size=16) to train on the KITTI dataset. At present box_loss and dfl_loss are still 0, while cls_loss shows no abnormality. Is there any way to solve this?

chenjiafu-George commented 6 months ago

Hi @wondervictor, long time no see. I have solved the previous problem, and the error about coco_metric not being found no longer occurs. I am very satisfied with the results on my dataset. [image] The next step is model quantization and pruning. The model is now 700MB, and we need to reduce it to somewhere in the range of 1-100 MB. Do you have any suggestions?

wondervictor commented 5 months ago

Hi @chenjiafu-George, congratulations 🎉! I do have some ideas for both better model performance and deployment:

  1. We've recently released Re-parameterized YOLO-World, you can reparameterize YOLO-World with text embeddings as real model parameters, you can find it at: docs/reparameterize. Re-parameterization provides better fine-tuning performance, simpler architecture, and faster inference speed.
  2. We've released the INT8 Quantization with TFLite, you can refer to docs/tflite_deploy. We have experimented with the TFLite and INT8 quantization for YOLO-World and built real-world applications on mobile phones. INT8 quantization can reduce the model size while keeping the inference accuracy. We can provide more details if you have any questions.
chenjiafu-George commented 5 months ago

> Hi @chenjiafu-George, congratulations 🎉! I do have some ideas for both better model performance and deployment:
>
>   1. We've recently released Re-parameterized YOLO-World, you can reparameterize YOLO-World with text embeddings as real model parameters, you can find it at: docs/reparameterize. Re-parameterization provides better fine-tuning performance, simpler architecture, and faster inference speed.
>   2. We've released the INT8 Quantization with TFLite, you can refer to docs/tflite_deploy. We have experimented with the TFLite and INT8 quantization for YOLO-World and built real-world applications on mobile phones. INT8 quantization can reduce the model size while keeping the inference accuracy. We can provide more details if you have any questions.

Hi @wondervictor, thank you for your answers! I will continue to explore in this direction and keep in touch with you. Thank you for your support! Each of your answers is very helpful to me.

wondervictor commented 5 months ago

Hi @chenjiafu-George, you're welcome. We can keep in touch. I think this kind of communication helps both of us make progress together and also helps us improve YOLO-World🚀. Thank you very much, and you are very welcome to share your progress and any issues you encounter😊!

chenjiafu-George commented 5 months ago

> Hi @chenjiafu-George, you're welcome. We can keep in touch. I think this kind of communication helps both of us make progress together and also helps us improve YOLO-World🚀. Thank you very much, and you are very welcome to share your progress and any issues you encounter😊!

Hi @wondervictor, I agree with you very much, and I hope that one day yolo-world🚀 will have fewer bugs and be more convenient for users, and that its completeness and multi-modal capabilities can be fully realized. I have become a loyal fan of yolo-world🚀 and I will keep supporting it 😊. Let's fight together!