AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x768 and 512x128) #314

Open lijuntao0101 opened 6 months ago

lijuntao0101 commented 6 months ago

```
[ ] 0/1, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "E:/python_code/YOLO-World-master0510/YOLO-World-master/demo/image_demo.py", line 193, in <module>
    inference_detector(model,
  File "E:/python_code/YOLO-World-master0510/YOLO-World-master/demo/image_demo.py", line 86, in inference_detector
    output = model.test_step(data_batch)[0]
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\mmengine\model\base_model\base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\mmengine\model\base_model\base_model.py", line 361, in _run_forward
    results = self(data, mode=mode)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\mmdet\models\detectors\base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "E:\python_code\YOLO-World-master0510\YOLO-World-master\yolo_world\models\detectors\yolo_world.py", line 43, in predict
    img_feats, txt_feats = self.extract_feat(batch_inputs,
  File "E:\python_code\YOLO-World-master0510\YOLO-World-master\yolo_world\models\detectors\yolo_world.py", line 98, in extract_feat
    img_feats = self.neck(img_feats, txt_feats)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\python_code\YOLO-World-master0510\YOLO-World-master\yolo_world\models\necks\yolo_world_pafpn.py", line 215, in forward
    inner_out = self.top_down_layers[len(self.in_channels) - 1 - idx](
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\python_code\YOLO-World-master0510\YOLO-World-master\yolo_world\models\layers\yolo_bricks.py", line 227, in forward
    x_main.append(self.attn_block(x_main[-1], guide))
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\python_code\YOLO-World-master0510\YOLO-World-master\yolo_world\models\layers\yolo_bricks.py", line 72, in forward
    guide = self.guide_fc(guide)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\mmcv\cnn\bricks\wrappers.py", line 177, in forward
    return super().forward(x)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x768 and 512x128)
```

I hit the error above while running `image_demo.py`. What causes it, and how can I fix it?

wondervictor commented 5 months ago

Hi @lijuntao0101, please check which language model you are using: CLIP-Base produces 512-dim text embeddings, while CLIP-Large produces 768-dim ones. The error shows a 768-dim text feature being fed to a layer built for 512-dim input, so the text encoder and the detector config/checkpoint do not match.
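The mismatch can be reproduced in isolation. A minimal sketch, assuming only what the traceback shows: the neck's guide projection (called `guide_fc` in `yolo_bricks.py`) is an `nn.Linear` with in_features=512 (CLIP-Base width), but it receives a batch of 768-dim (CLIP-Large) text embeddings.

```python
import torch
import torch.nn as nn

# Projection built for CLIP-Base text features (512 -> 128),
# matching the (512x128) weight shape in the error message.
guide_fc = nn.Linear(512, 128)

# CLIP-Large text embeddings are 768-dim; a batch of 2 prompts
# reproduces the exact (2x768 and 512x128) failure.
guide_large = torch.randn(2, 768)
try:
    guide_fc(guide_large)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (2x768 and 512x128)

# Embeddings with the expected 512-dim width pass through fine.
guide_base = torch.randn(2, 512)
print(guide_fc(guide_base).shape)  # torch.Size([2, 128])
```

In other words, the fix is to make the two sides agree: either load a config/checkpoint whose text channels are 768, or switch the text encoder back to the 512-dim CLIP-Base variant the checkpoint was trained with.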