dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Apache License 2.0
1.75k stars · 126 forks

New version <LISA-13B-llama2-v1-explanatory> inference VERY SLOWLY #57

Open LWShowTime opened 11 months ago

LWShowTime commented 11 months ago

The new version of LISA is far too slow at inference. The same segmentation command takes a few seconds with the old version, but tens of minutes with the new one.

X-Lai commented 11 months ago

It works well on my side. Can you give me more details about this?

LWShowTime commented 11 months ago

[screenshot: bfloat16 warning] Does this warning about bfloat16 matter? In the previous version, I remember inference was fast: segmenting an image took only several seconds. However, the new 13B version infers slowly, with the GPU at 100% utilization.

My launch command: `python chat.py --version=/localpath/LISA-13B-llama2-v1-explanatory/ --vision_tower=/localpath/CLIP-vit-large-patch14/ --precision=bf16 --load_in_8bit --image_size=512`
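One quick experiment (a suggestion, not a confirmed fix): 8-bit quantization via `--load_in_8bit` can be much slower than native bf16 on some GPUs, so comparing against a run without that flag may isolate the bottleneck. The paths below simply mirror the ones in the command above:

```shell
# Same command, but without --load_in_8bit, to check whether
# 8-bit quantization is what slows inference down.
python chat.py \
  --version=/localpath/LISA-13B-llama2-v1-explanatory/ \
  --vision_tower=/localpath/CLIP-vit-large-patch14/ \
  --precision=bf16 \
  --image_size=512
```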

And I changed several lines because the vision tower in the default config can't be overridden by the launch parameter. Here are the changes in `model/llava/model/multimodal_encoder`, in the class `CLIPVisionTower`, from line 19 to around line 30: I set `self.cfg_only`, `self.image_processor`, and `self.vision_tower` using a local path string in the `from_pretrained` calls. [screenshot of the modified code]
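A minimal sketch of the override described above, assuming a transformers-style `CLIPVisionTower`; `resolve_vision_tower` and `local_override` are illustrative names, not LISA's actual API:

```python
from typing import Optional


def resolve_vision_tower(config_value: str, local_override: Optional[str]) -> str:
    """Return the path/ID to feed into from_pretrained().

    If a local override is given (e.g. via --vision_tower), it wins;
    otherwise fall back to whatever the model config specifies.
    """
    return local_override if local_override else config_value


# Inside CLIPVisionTower.__init__ one would then do something like:
#   path = resolve_vision_tower(self.vision_tower_name, args.vision_tower)
#   self.image_processor = CLIPImageProcessor.from_pretrained(path)
#   self.vision_tower = CLIPVisionModel.from_pretrained(path)
```

This avoids hardcoding the local path inside the class, so the same code works with both a Hugging Face model ID and a local checkout.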

LWShowTime commented 9 months ago

@X-Lai And when I run version 2 on multi-GPU devices, I get this error: `indices should be either on cpu or on the same device as the indexed tensor (cuda:0)`
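A common fix for that class of error, shown here as a general PyTorch pattern rather than the exact LISA code path, is to move the index tensor onto the indexed tensor's device before indexing:

```python
import torch


def safe_index(t: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Index `t` with `idx`, moving `idx` to t's device first.

    Avoids "indices should be either on cpu or on the same device
    as the indexed tensor" when a model is sharded across GPUs and
    an index tensor ends up on a different device than the data.
    """
    return t[idx.to(t.device)]
```

When a model is split across GPUs (e.g. with `device_map="auto"`), intermediate tensors can land on different devices, so any manually constructed index tensor needs an explicit `.to(...)` like this.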