dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Apache License 2.0

IndexError: The shape of the mask [1, 304] at index 1 does not match the shape of the indexed tensor [1, 624, 256] at index 1 #119

Open MM-Huang opened 3 months ago

MM-Huang commented 3 months ago

Hi, I ran the command `CUDA_VISIBLE_DEVICES=0 python app.py --version=LISA-13B-llama2-v1 --load_in_4bit`, but got the error below. How can I fix it?

```
Traceback (most recent call last):
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/gradio/queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/gradio/route_utils.py", line 258, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/gradio/blocks.py", line 1684, in process_api
    result = await self.call_function(
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/gradio/blocks.py", line 1250, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/xin/anaconda3/envs/lisa/lib/python3.8/site-packages/gradio/utils.py", line 750, in wrapper
    response = f(*args, **kwargs)
  File "app.py", line 274, in inference
    output_ids, pred_masks = model.evaluate(
  File "/home/xin/PycharmProjects/LISA/model/LISA.py", line 383, in evaluate
    pred_embeddings = last_hidden_state[seg_token_mask]
IndexError: The shape of the mask [1, 304] at index 1 does not match the shape of the indexed tensor [1, 624, 256] at index 1
```
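For reference, here is a minimal, standalone sketch (not LISA's code) of the indexing pattern that fails here: boolean-mask indexing in PyTorch requires the mask to match the leading dimensions of the indexed tensor, so a mask of sequence length 304 cannot index hidden states of sequence length 624.

```python
import torch

# shapes taken from the traceback above
last_hidden_state = torch.randn(1, 624, 256)      # (batch, seq_len, hidden_dim)
bad_mask = torch.zeros(1, 304, dtype=torch.bool)  # seq_len mismatch: 304 != 624

try:
    pred_embeddings = last_hidden_state[bad_mask]
except IndexError as e:
    print(e)  # "The shape of the mask [1, 304] at index 1 does not match ..."

# with a mask of the right length, the same line works
good_mask = torch.zeros(1, 624, dtype=torch.bool)
good_mask[0, 10] = True
pred_embeddings = last_hidden_state[good_mask]    # -> shape (1, 256)
```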

ShawnHuang497 commented 3 months ago

I met the same error when training on the ReasonSeg dataset.

dreamingaa commented 3 months ago

I met the same error. Has anybody fixed it?

pengfeiZhao1993 commented 1 month ago

Any suggestions for this problem?

min99830 commented 1 month ago

In my case, the `<image>` token was not in the prompt.

Make sure your prompt contains the `<image>` token.
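A minimal check along those lines (a sketch; the constant name and import path are assumed from the repo's chat.py):

```python
# Sketch: guard against a prompt that is missing the image placeholder.
# DEFAULT_IMAGE_TOKEN is "<image>"; the import path below is assumed from chat.py.
from model.llava.constants import DEFAULT_IMAGE_TOKEN

prompt = "Please segment Lisa in this figure."
if DEFAULT_IMAGE_TOKEN not in prompt:
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
```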

pengfeiZhao1993 commented 1 month ago

@min99830

```python
prompt = input("Please input your prompt: ")
prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
if args.use_mm_start_end:
    replace_token = (
        DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    )
    prompt = prompt.replace(DEFAULT_IMAGE_TOKEN, replace_token)

conv.append_message(conv.roles[0], prompt)
conv.append_message(conv.roles[1], "")
prompt = conv.get_prompt()
```
The token will be added here, right? And the final prompt looks like this: "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: `<im_start><image><im_end>`\nPlease segment Lisa in this figure. ASSISTANT:"

pengfeiZhao1993 commented 1 month ago

@min99830

```python
seg_token_mask = output_ids[:, 1:] == self.seg_token_idx
# hack for IMAGE_TOKEN_INDEX (we suppose that there is only one image, and it is in the front)
seg_token_mask = torch.cat(
    [
        torch.zeros((seg_token_mask.shape[0], 255)).bool().cuda(),
        seg_token_mask,
    ],
    dim=1,
)
```
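The hard-coded 255 assumes the vision tower contributes 256 image-patch embeddings (CLIP ViT-L/14 at 224 px: (224/14)² = 256), i.e. the single image placeholder expands into 256 hidden states, a net gain of 255 positions. With openai/clip-vit-large-patch14-336 the tower produces (336/14)² = 576 patches, so the padded mask ends up 320 positions shorter than the hidden states, which is consistent with the 304 vs. 624 shapes in the traceback above. A sketch (not the repo's code) of deriving the padding from the tower geometry instead of hard-coding it:

```python
import torch

def build_seg_token_mask(output_ids, seg_token_idx, image_size=336, patch_size=14):
    """Sketch: pad the seg-token mask by the number of hidden states the single
    image placeholder expands into, derived from the vision tower's geometry
    rather than the hard-coded 255 (which only holds for the 224 px tower)."""
    num_image_tokens = (image_size // patch_size) ** 2  # 576 for 336 px, 256 for 224 px
    seg_token_mask = output_ids[:, 1:] == seg_token_idx
    pad = torch.zeros(
        (seg_token_mask.shape[0], num_image_tokens - 1),
        dtype=torch.bool,
        device=seg_token_mask.device,
    )
    return torch.cat([pad, seg_token_mask], dim=1)
```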

pengfeiZhao1993 commented 2 weeks ago

I solved this error by replacing the vision_tower model openai/clip-vit-large-patch14-336 with openai/clip-vit-large-patch14.
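For context, the two towers differ in input resolution and therefore in how many image-patch tokens they feed into the language model, which is why the swap makes the hard-coded 255 offset line up again. A quick way to compare them (assuming the standard Hugging Face transformers CLIP configs):

```python
from transformers import CLIPVisionConfig

for name in ("openai/clip-vit-large-patch14", "openai/clip-vit-large-patch14-336"):
    cfg = CLIPVisionConfig.from_pretrained(name)
    num_patches = (cfg.image_size // cfg.patch_size) ** 2
    print(f"{name}: {num_patches} image patch tokens")  # 256 vs. 576
```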