AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Checkpoint file for prompt tuning #145

Closed: taofuyu closed this issue 6 months ago

taofuyu commented 6 months ago

The config yolo_world_v2_l_vlpan_bn_2e-4_80e_8gpus_mask-refine_prompt_tuning_coco.py loads from yolo_world_l_clip_t2i_bn_2e-3adamw_32xb16-100e_obj365v1_goldg_cc3mlite_train-ca93cd1f.pth. Where can I download this file? Thanks. BTW, I'm looking forward to your deployment tools.

wondervictor commented 6 months ago

Hi @taofuyu, you can use this checkpoint from the released models in place of that file. The name in the config is an overlooked typo (the file is named like that in our local directory).
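
A minimal sketch of what this means in practice, assuming the repo's mmengine-style Python configs: a small derived config that inherits the prompt-tuning config and points `load_from` at a checkpoint downloaded from the model zoo. The derived file name and the checkpoint path are hypothetical placeholders, not names from the repo.

```python
# my_prompt_tuning_coco.py (hypothetical file placed next to the original config)
_base_ = './yolo_world_v2_l_vlpan_bn_2e-4_80e_8gpus_mask-refine_prompt_tuning_coco.py'

# Hypothetical local path: replace with the released checkpoint you downloaded.
# This top-level key overrides the load_from value inherited from the base config.
load_from = 'pretrained_models/<released_yolo_world_v2_l_checkpoint>.pth'
```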

taofuyu commented 6 months ago

Thanks. And how do I generate the text embedding files for the GQA/Flickr datasets if I want to train YOLO-World with Embeddings from scratch rather than fine-tune it?

wondervictor commented 6 months ago

I'll update the file for generating embeddings; it was missed in the last update. However, training on GQA or Flickr with text embeddings is not recommended.

taofuyu commented 6 months ago

So YOLO-World with Embeddings is only for fine-tuning?

wondervictor commented 6 months ago

It's used for (1) fine-tuning on custom datasets (prompt tuning) without losing the zero-shot ability, (2) image prompts, (3) CLIP adapters, and (4) easy deployment.
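
As a rough illustration of point (4), here is a conceptual sketch (not YOLO-World's actual detection head) of why precomputed embeddings simplify deployment: once the class texts have been encoded offline into a fixed matrix, classifying a region only needs a similarity against that matrix, so no text encoder has to run at inference time. The function name and array shapes are assumptions for illustration.

```python
import numpy as np


def classify_regions(region_feats: np.ndarray, class_embeds: np.ndarray) -> np.ndarray:
    """Return the best-matching class index for each region.

    region_feats: (num_regions, dim) visual embeddings (hypothetical input).
    class_embeds: (num_classes, dim) text embeddings precomputed offline.
    """
    # L2-normalize both sides so the dot product is a cosine similarity.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    c = class_embeds / np.linalg.norm(class_embeds, axis=1, keepdims=True)
    scores = r @ c.T  # (num_regions, num_classes)
    return scores.argmax(axis=1)


# Toy example: 3 classes in a 2-D embedding space.
class_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
regions = np.array([[2.0, 0.1], [-0.5, 0.0], [0.2, 5.0]])
print(classify_regions(regions, class_embeds).tolist())  # [0, 2, 1]
```

Swapping vocabularies then amounts to swapping the `class_embeds` matrix, which is the property that makes the fixed-embedding variant convenient to export.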

taofuyu commented 6 months ago

Looking forward to your update of the file for generating embeddings.

wondervictor commented 6 months ago

@taofuyu, I added a simple tool, tools/generate_text_prompts.py, that generates text embeddings from a text JSON file. I'll add support for image embeddings in the next commit.
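
The general idea behind such a tool can be sketched as follows. This is not the contents of tools/generate_text_prompts.py; it is a minimal sketch assuming the text JSON is a list of lists of class names (one inner list per class, as in data/texts/coco_class_texts.json) and that the output is one L2-normalized row per class saved as a .npy file. `build_text_embeddings` and `fake_encoder` are hypothetical names, and the stub encoder only stands in for a real CLIP text encoder.

```python
import json
import os
import tempfile
from typing import Callable, List

import numpy as np


def build_text_embeddings(
    json_path: str,
    out_path: str,
    encode_texts: Callable[[List[str]], np.ndarray],
) -> np.ndarray:
    """Encode one prompt per class and save a (num_classes, dim) .npy file."""
    with open(json_path) as f:
        class_texts = json.load(f)  # assumed layout: [["person"], ["bicycle"], ...]
    # Use the first prompt of each class (an assumption for this sketch).
    prompts = [texts[0] for texts in class_texts]
    feats = np.asarray(encode_texts(prompts), dtype=np.float32)
    # L2-normalize each row, as is usual for CLIP-style embeddings.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    np.save(out_path, feats)
    return feats


def fake_encoder(prompts: List[str]) -> List[List[float]]:
    # Stand-in for a CLIP text encoder: 2-D vectors derived from prompt length.
    return [[float(len(p)), 1.0] for p in prompts]


# Demo: write a two-class JSON in the assumed layout and encode it.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, 'custom_class_texts.json')
with open(json_path, 'w') as f:
    json.dump([['cat'], ['traffic light']], f)
demo_feats = build_text_embeddings(
    json_path, os.path.join(tmp_dir, 'custom_embeddings.npy'), fake_encoder)
print(demo_feats.shape)  # (2, 2)
```

In a real setup, `encode_texts` could wrap Hugging Face transformers' `CLIPTokenizer` and `CLIPTextModelWithProjection`, returning the model's `text_embeds`; which CLIP checkpoint to use should match the one your YOLO-World config was trained with.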

taofuyu commented 6 months ago

@wondervictor Thanks for the timely update. I'm confused about this script: by default it uses 'data/captions/coco_class_captions.json', but I can't find this file in the repo, and there is no 'captions' folder under 'data'. Maybe you could upload this file so we can use it as an example for our custom datasets.

wondervictor commented 6 months ago

Hi @taofuyu, it's the same as data/texts/coco_class_texts.json.
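
For a custom dataset, a file in the same style can be written with a few lines of Python. This sketch assumes the list-of-lists layout (one inner list of prompt strings per class, in class-index order); the class names are hypothetical, and a temporary directory is used only so the example runs anywhere.

```python
import json
import os
import tempfile

# Hypothetical custom classes, listed in the same order as the dataset's class indices.
custom_classes = ['forklift', 'pallet', 'safety helmet']

# One inner list per class, mirroring the assumed coco_class_texts.json layout.
class_texts = [[name] for name in custom_classes]

# A temporary directory keeps the demo self-contained; in a project you would
# write the file wherever your config expects it.
out_path = os.path.join(tempfile.mkdtemp(), 'custom_class_texts.json')
with open(out_path, 'w') as f:
    json.dump(class_texts, f, indent=2)

with open(out_path) as f:
    print(json.load(f))  # [['forklift'], ['pallet'], ['safety helmet']]
```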

lin-whale commented 5 months ago

Hello, I only found text_model_name at the location you mentioned, not a .pth file like yolo_world_l_clip_t2i_bn_2e-3adamw_32xb16-100e_obj365v1_goldg_cc3mlite_train-ca93cd1f.pth: https://github.com/AILab-CVC/YOLO-World/blob/da0fcb0ccf825e5fb9423651b11dfaac908f9249/configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py#L19 So should I just leave the "load_from" parameter in the config file empty and only specify the "text_model_name" parameter?

taofuyu commented 5 months ago

If you are fine-tuning, you should set "load_from" to the pre-trained weights, which are a .pth file. "text_model_name", on the other hand, is a Hugging Face repo: you can download that repo to your disk and specify its local path; otherwise the code will connect to Hugging Face during training.
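
A sketch of the two settings side by side, assuming mmengine-style Python configs. The local paths are hypothetical, and `openai/clip-vit-base-patch32` is assumed to be the repo your config references (check the `text_model_name` line in your own config). The repo can be fetched beforehand, for example with `huggingface-cli download openai/clip-vit-base-patch32 --local-dir pretrained_models/clip-vit-base-patch32`. One caveat, based on our understanding of mmengine's config inheritance: a variable like `text_model_name` that the config has already substituted into its `model` dict is not re-evaluated when a child config reassigns it, so editing the existing line in the config itself is the safer option.

```python
# Fragment of a fine-tuning config (mmengine-style Python config).

# A .pth checkpoint from the model zoo (hypothetical local path).
load_from = 'pretrained_models/<released_checkpoint>.pth'

# A Hugging Face repo: either the hub ID, which is fetched from Hugging Face
# at run time, or the path of a local copy of that repo (hypothetical path).
# text_model_name = 'openai/clip-vit-base-patch32'
text_model_name = 'pretrained_models/clip-vit-base-patch32'
```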

lin-whale commented 5 months ago

Thanks. So where can I get the yolo_world_l_clip_t2i_bn_2e-3adamw_32xb16-100e_obj365v1_goldg_cc3mlite_train-ca93cd1f.pth file?

taofuyu commented 5 months ago

They released all the weights in the model zoo.

lin-whale commented 5 months ago

Got it! I was on the wrong branch of the Hugging Face repo, so I didn't see all the checkpoints. Thanks for the quick reply!

wenqiuL commented 1 month ago

@taofuyu Hello, I have generated text embeddings for my custom classes as required, but the loss during testing is relatively large, and an error is raised right away during validation. Could you please share the contents of the config file "yolo_world_v2_l_vlpan_bn_2e-4_80e_8gpus_mask-refine_prompt_tuning_coco.py" for prompt tuning as you modified it?
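
Without the modified config at hand, one thing that may be worth checking (an assumption about a common cause, not a diagnosis of this particular error) is that the embedding file, the class-text JSON, and the class count set in the config all agree. A minimal sketch; `check_prompt_files` and the demo file names are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def check_prompt_files(embedding_path: str, class_text_path: str, num_classes: int) -> list:
    """Return human-readable problems (an empty list if everything agrees)."""
    problems = []
    embeds = np.load(embedding_path)
    with open(class_text_path) as f:
        class_texts = json.load(f)
    if embeds.ndim != 2:
        problems.append(f'embeddings should be 2-D (num_classes, dim), got shape {embeds.shape}')
    elif embeds.shape[0] != num_classes:
        problems.append(f'embeddings have {embeds.shape[0]} rows but num_classes is {num_classes}')
    if len(class_texts) != num_classes:
        problems.append(f'class text JSON has {len(class_texts)} entries but num_classes is {num_classes}')
    return problems


# Demo with temporary files: 3 embedding rows but only 2 class texts.
tmp_dir = tempfile.mkdtemp()
emb_path = os.path.join(tmp_dir, 'embeddings.npy')
np.save(emb_path, np.ones((3, 4), dtype=np.float32))
txt_path = os.path.join(tmp_dir, 'class_texts.json')
with open(txt_path, 'w') as f:
    json.dump([['cat'], ['dog']], f)
demo_problems = check_prompt_files(emb_path, txt_path, num_classes=3)
print(demo_problems)  # ['class text JSON has 2 entries but num_classes is 3']
```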