AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

Perform inference on a video using custom weights #263

Open LLH-Harward opened 5 months ago

LLH-Harward commented 5 months ago

Hello, thank you for your outstanding work! I would like to run video inference directly with YOLO-World. I have used Roboflow Inference and Supervision, but they only provide a few benchmark models, such as l, x, v2-x, and v2-l. For my purposes, the models in the YOLO-World Hugging Face demo (https://huggingface.co/spaces/stevengrove/YOLO-World) perform better than the standard Inference ones, e.g. "yolo_world/v2-x". I would like to run inference with the Hugging Face weights, such as x-1280. Could you please provide the necessary support? Or is it possible to input videos directly? I would greatly appreciate it.

wondervictor commented 5 months ago

Could you provide more details/clues about why the HF version (L-640) of YOLO-World is better than the GitHub version (X-1280)? BTW, the HuggingFace demo only uses L-640.

LLH-Harward commented 5 months ago

Thank you for your response, and I apologize if I wasn't clear earlier. I meant that on Hugging Face, the models with 1280 input resolution seem more effective at detecting small objects. While Roboflow Inference and Supervision do support video processing, they currently only offer the basic models like v2-l and v2-x, without access to the other 1280 models. Could you tell me whether there is a way to use custom weights (for instance, weights obtained from training) directly for video inference?

wondervictor commented 5 months ago

Sure. I've seen many requests for video inference, so I'll raise its priority. I'll notify you once it's done; it shouldn't take long.

LLH-Harward commented 5 months ago

Thank you so much.

wondervictor commented 5 months ago

Hi @LLH-Harward, the latest update adds support for video inference. Please check demo/video_demo.py.

LLH-Harward commented 5 months ago

Thank you so much! I'll try it later.

LLH-Harward commented 5 months ago

Hello, when I used video_demo.py for inference, the error below occurred: it reports that "data/coco/lvis/lvis_v1_minival_inserted_image_name.json" does not exist. I found the corresponding entry in the model's configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py". How can I solve this problem? Could you give some guidance?

Environment: torch+cu118==2.1.1, torchvision+cu118==0.16.1, mmcv==2.0.0rc4, mmdet==3.0.0, mmengine==0.10.3, mmyolo==0.6.0

BUG: inputs:

```shell
python video_demo.py D:\YOLO-World-master\configs\pretrain\yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth D:\YOLO-World-master\result.mp4 "people,laptop,book,bottle,pen,phone" --out out111
```

```
bin C:\Users\714\AppData\Roaming\Python\Python39\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
Loads checkpoint by local backend from path: D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth
Traceback (most recent call last):
  File "D:\YOLO-World-master\demo\video_demo.py", line 109, in <module>
    main()
  File "D:\YOLO-World-master\demo\video_demo.py", line 56, in main
    model = init_detector(args.config, args.checkpoint, device=args.device)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\apis\inference.py", line 97, in init_detector
    metainfo = DATASETS.build(test_dataset_cfg).metainfo
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "D:\YOLO-World-master\yolo_world\datasets\mm_dataset.py", line 25, in __init__
    self.dataset = DATASETS.build(dataset)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 19, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\base_det_dataset.py", line 40, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\dataset\base_dataset.py", line 247, in __init__
    self.full_init()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 27, in full_init
    self.data_list = self.load_data_list()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\lvis.py", line 605, in load_data_list
    self.lvis = LVIS(local_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 27, in __init__
    self.dataset = self._load_json(annotation_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 35, in _load_json
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/lvis/lvis_v1_minival_inserted_image_name.json'
```
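One possible workaround, sketched below under assumptions (the stub layout is mine, not from the repo): `init_detector` only builds the test dataset to read its metainfo, so an empty LVIS-style annotation file at the relative path the config expects may be enough to let the config load without downloading the real LVIS minival annotations.

```python
# Hedged sketch: create a minimal, empty LVIS-style annotation stub at the
# relative path the config expects, so dataset construction can proceed.
# ASSUMPTION: an empty images/annotations/categories structure satisfies the
# JSON load in lvis.py; init_detector only needs the dataset's metainfo
# (class names/palette), not real annotations, for pure inference.
import json
from pathlib import Path

stub_path = Path("data/coco/lvis/lvis_v1_minival_inserted_image_name.json")
stub_path.parent.mkdir(parents=True, exist_ok=True)

stub = {"images": [], "annotations": [], "categories": []}
stub_path.write_text(json.dumps(stub))
```

Run this from the YOLO-World working directory before launching video_demo.py, since the config uses a path relative to the current directory.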

LLH-Harward commented 5 months ago

New development: after I created the JSON file at the path required by the configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py" and changed `for frame in track_iter_progress(video_reader):` in video_demo.py to `for frame in video_reader:`, the code now runs normally and produces results.

However, it runs quite slowly. Is that because the mmcv framework is slow to load?
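One way to check where the time goes is to time each stage of the loop separately. This is a generic sketch, not code from video_demo.py: the frame list and the two stage functions are placeholders for the real model call and visualizer call.

```python
# Hedged sketch: measure model inference and visualization separately to see
# which stage dominates per-frame time. run_inference/draw_results are
# placeholders (assumptions), to be replaced with the real calls.
import time
from collections import defaultdict

def run_inference(frame):       # placeholder for the model forward pass
    return [("person", 0.9)]

def draw_results(frame, dets):  # placeholder for the visualizer/drawing call
    return frame

frames = [object() for _ in range(100)]  # placeholder frames
totals = defaultdict(float)

for frame in frames:
    t0 = time.perf_counter()
    dets = run_inference(frame)
    totals["inference"] += time.perf_counter() - t0

    t1 = time.perf_counter()
    draw_results(frame, dets)
    totals["visualization"] += time.perf_counter() - t1

for stage, seconds in totals.items():
    print(f"{stage}: {seconds:.3f}s total, {seconds / len(frames) * 1000:.2f} ms/frame")
```

If "visualization" dominates, skipping the drawing step (or drawing only when an output video is requested) should speed things up.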

wondervictor commented 5 months ago

The visualization takes time to draw objects in frames.

LLH-Harward commented 5 months ago

OK, thank you for your help!