AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

about running image_demo.py with YOLO-Worldv2-XL #213

Open oliacode opened 5 months ago

oliacode commented 5 months ago

Thank you for sharing your results, and congratulations on the excellent work you are doing with YOLO-World. I'm trying to run this demo locally with the YOLO-Worldv2-XL config:

```bash
python3 image_demo.py configs/pretrain/yolo_world_v2_xl_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py pretrained_weights/yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain-8698fbfa.pth data/images 'person,bus' --topk 100 --threshold 0.005 --output-dir data/demo_outputs
```

I downloaded the corresponding pretrained weights, but I'm getting this error:

```
Traceback (most recent call last):
  File "/home/me/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/me/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/me/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '../pretrained_models/clip-vit-base-patch32-projection'. Use `repo_type` argument if needed.
```

The above exception was the direct cause of the following exception:

```
Traceback (most recent call last):
  File "/home/me/my_yolo/YOLO-World/image_demo.py", line 163, in <module>
    runner = Runner.from_cfg(cfg)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/me/my_yolo/YOLO-World/yolo_world/models/detectors/yolo_world.py", line 24, in __init__
    super().__init__(*args, **kwargs)
  File "/home/me/anaconda3/envs/my_yolo/lib/python3.10/site-packages/mmyolo/models/detectors/yolo_detector.py", line 41, in __init__
    super().__init__(
  File "/home/me/anaconda3/envs/my_yolo/lib/python3.10/site-packages/mmdet/models/detectors/single_stage.py", line 30, in __init__
    self.backbone = MODELS.build(backbone)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/me/my_yolo/YOLO-World/yolo_world/models/backbones/mm_backbone.py", line 190, in __init__
    self.text_model = MODELS.build(text_model)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/me/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/me/my_yolo/YOLO-World/yolo_world/models/backbones/mm_backbone.py", line 73, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/home/me/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 767, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/me/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 600, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/me/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 462, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: '../pretrained_models/clip-vit-base-patch32-projection'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
```
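For context: `from_pretrained` first checks whether the string it is given is an existing local directory; anything else is validated as a Hub repo id, and a relative path like this one (with `/` and `..` in it) fails that validation. A minimal sketch reproducing the error outside the runner, assuming only that `transformers` is installed:

```python
from transformers import AutoTokenizer

# The config's relative path does not exist in this setup, so transformers
# falls back to interpreting it as a Hub repo id, which fails validation
# and raises the HFValidationError / OSError shown above.
AutoTokenizer.from_pretrained("../pretrained_models/clip-vit-base-patch32-projection")
```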

wondervictor commented 5 months ago

Change `../pretrained_models/clip-vit-base-patch32-projection` to `openai/clip-vit-base-patch32`.
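For anyone unsure where this string lives: in the YOLO-World configs the text encoder is declared under the backbone's `text_model` entry. The field names below follow the classes in the traceback above; the exact layout of your config file may differ, so treat this as a sketch:

```python
# sketch of the relevant part of the config file
text_model=dict(
    type='HuggingCLIPLanguageBackbone',
    # was: model_name='../pretrained_models/clip-vit-base-patch32-projection'
    model_name='openai/clip-vit-base-patch32',
    frozen_modules=['all']),
```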

Xiaofei-Kevin-Yang commented 5 months ago

Thanks for your great work. I am new to deep learning. Could you please provide a more detailed revision?

Xiaofei-Kevin-Yang commented 5 months ago

> Change `../pretrained_models/clip-vit-base-patch32-projection` to `openai/clip-vit-base-patch32`.

When I make the change, I get another error:

```
OSError: class YOLOWorldDetector in yolo_world/models/detectors/yolo_world.py: class MultiModalYOLOBackbone in yolo_world/models/backbones/mm_backbone.py: class HuggingCLIPLanguageBackbone in yolo_world/models/backbones/mm_backbone.py: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like openai/clip-vit-base-patch32 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```
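This second error just means the machine could not reach huggingface.co to download the repo id. If the machine can get online at least once, one option is to warm the local Hugging Face cache and then run offline; a minimal sketch (assumes your setup can reach the Hub once; `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are standard huggingface_hub/transformers environment variables):

```python
# One-time warm-up while online: download the text encoder into the local
# Hugging Face cache so later runs can find it without network access.
from transformers import AutoTokenizer, CLIPTextModelWithProjection

AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32")
```

Afterwards, setting `HF_HUB_OFFLINE=1` / `TRANSFORMERS_OFFLINE=1` in the environment makes `from_pretrained` resolve from the cache instead of the network.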

ljyan93 commented 5 months ago

Hey @Xiaofei-Kevin-Yang,

after this change you also need the text encoder (the CLIP model) available locally:

1. First of all, install Git LFS:

   ```bash
   curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
   sudo apt-get install git-lfs
   ```

2. Under your local folder `pretrained`, run:

   ```bash
   git lfs install
   ```

3. Clone the model repo:

   ```bash
   git clone https://huggingface.co/openai/clip-vit-large-patch14-336
   ```

Now you should have the model and the related files you need; you can sanity-check the local copy as sketched below.

(By the way, if `git clone` fails to download the 1.x GB model file, you can manually click the download button on the Hugging Face page and then move the file into the folder.)
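To confirm the local clone is complete before wiring it into the config, here is a quick check. The directory name is just an example from step 3, and `CLIPTextModelWithProjection` is an assumption about which CLIP text tower the traceback's `HuggingCLIPLanguageBackbone` loads:

```python
from transformers import AutoTokenizer, CLIPTextModelWithProjection

# Example path: wherever you cloned the repo in step 3.
local_dir = "pretrained/clip-vit-large-patch14-336"

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = CLIPTextModelWithProjection.from_pretrained(local_dir)
print(tokenizer.__class__.__name__, model.config.projection_dim)
```

If both load, point `model_name` in the config at the same local directory.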

wondervictor commented 5 months ago

You can directly access the link to download it.

lijuntao0101 commented 4 months ago

Hello, I have encountered the same problem as you. Could you please explain in detail how you solved it?