dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Apache License 2.0
1.8k stars 128 forks

The number of pred_mask is inconsistent with the number of gt_mask #55

Open dengandong opened 1 year ago

dengandong commented 1 year ago

File "/home/LISA/model/LISA.py", line 336, in model_forward
    assert gt_mask.shape[0] == pred_mask.shape[0]
AssertionError: gt_mask.shape: torch.Size([3, 427, 640]), pred_mask.shape: torch.Size([6, 427, 640])

I found that the cause of this problem is in train_ds.py, line 129:

args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]

The token ids for [SEG] are 29871, 32000, but only the first one (29871) is used here.

However, the token_idx for is 29871, 32002. This causes the model to output twice as many masks as there are GT masks.

I get this error by running the following command:

deepspeed --master_port=24999 train_ds.py \
  --version=/home/LLaVA/checkpoints/llava-7b-llama-2-7b-chat \
  --dataset_dir='dataset/' \
  --vision_pretrained=./pretrain_weights/sam_vit_h_4b8939.pth \
  --dataset=refer_seg \
  --refer_seg_data=refcoco \
  --sample_rates=1 \
  --conv_type=llava_llama_2 \
  --exp_name=lisa-7b \
  --load_in_4bit

The llava model is tested and correct.

dengandong commented 1 year ago

It is solved by changing the original code to:

args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[1]

No idea why tokenizer("[SEG]") returns [29871, 32000]; only 32000 is the correct [SEG] idx.
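A likely explanation: in the LLaMA-2 vocabulary, id 29871 is the SentencePiece prefix space "▁", which the tokenizer emits before an added token such as [SEG], so input_ids[0] picks up the space rather than [SEG]. The sketch below uses a hypothetical stand-in class (not the real tokenizer) to illustrate why a direct vocabulary lookup is more robust than indexing into input_ids:

```python
# Hypothetical stand-in reproducing the reported behaviour; ids follow the
# real LLaMA-2 vocabulary (29871 = "▁", 32000 = the first added token), but
# this class is a sketch, not the actual transformers tokenizer.

class FakeLlamaTokenizer:
    PREFIX_SPACE_ID = 29871            # id of the SentencePiece prefix "▁"
    added_tokens = {"[SEG]": 32000}    # token added via add_tokens("[SEG]")

    def convert_tokens_to_ids(self, token):
        # Direct vocabulary lookup: independent of token position.
        return self.added_tokens[token]

    def __call__(self, text, add_special_tokens=True):
        class Encoding:
            pass
        enc = Encoding()
        # The prefix space is emitted before the added token, which is why
        # input_ids for "[SEG]" comes out as [29871, 32000].
        enc.input_ids = [self.PREFIX_SPACE_ID, self.added_tokens[text]]
        return enc

tokenizer = FakeLlamaTokenizer()
fragile = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]  # 29871: the bug
robust = tokenizer.convert_tokens_to_ids("[SEG]")                    # 32000
print(fragile, robust)
```

With a real transformers tokenizer, tokenizer.convert_tokens_to_ids("[SEG]") is the position-independent way to get the id, assuming [SEG] was actually added to the vocabulary.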

X-Lai commented 1 year ago

It seems that [SEG] is not added to the vocabulary of the tokenizer successfully. Can you check it by printing out args.seg_token_idx after this line (https://github.com/dvlab-research/LISA/blob/main/train_ds.py#L129)?

BTW, the issue you reported might be caused by a different version of transformers. Can you re-install the same version as ours?

dengandong commented 1 year ago

> It seems that [SEG] is not added to the vocabulary of the tokenizer successfully. Can you check it by printing out args.seg_token_idx after this line (https://github.com/dvlab-research/LISA/blob/main/train_ds.py#L129)?
>
> BTW, the issue you reported might be caused by a different version of transformers. Can you re-install the same version as ours?

Thanks for your reply!

The printing result is 29871. I also printed: tokenizer("[SEG]", add_special_tokens=False), and this is [29871, 32000].

I'm using transformers==4.33.0; I'll try 4.31.0.

dengandong commented 1 year ago

@X-Lai BTW, in utils/dataset.py line 269, why is it return *data[0], inference and not return *data[idx], inference?

tsunghan-wu commented 1 year ago

@X-Lai Also, just to confirm: for the experiments in Table 2, which released model did you use? Did you run all referring segmentation datasets with https://huggingface.co/xinlai/LISA-7B-v1 or another released model? I ask because I validated the LISA-7B-v1 model on the refCOCOg (UMD) validation set but only obtained 63.35 cIoU, which is still lower than the paper's results of over 66.4...
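For reference, cIoU (cumulative IoU) in referring segmentation is usually computed as the total intersection over the total union across the whole split, rather than as a per-image average; a minimal sketch, assuming that is the metric being compared here:

```python
# Minimal sketch of cIoU: accumulate intersection and union over every mask
# pair in the split, then divide once at the end. Masks are represented as
# flat 0/1 lists for simplicity; real evaluation would use tensors.

def ciou(pred_masks, gt_masks):
    inter = union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        for p, g in zip(pred, gt):
            inter += p and g   # 1 only where both masks are on
            union += p or g    # 1 where either mask is on
    return inter / union

preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
gts   = [[1, 0, 0, 0], [1, 1, 0, 0]]
print(ciou(preds, gts))  # 2 / 4 = 0.5
```

Because large masks dominate the cumulative sums, small per-image differences can move cIoU noticeably, which may explain part of a gap between a reproduced run and the reported number.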

Vladimir2506 commented 1 year ago

> @X-Lai BTW, in utils/dataset.py line 269, why is it return *data[0], inference and not return *data[idx], inference?

@dengandong I find that the index passed into __getitem__ is not used; a random sample drawn from the dataset is used for training instead. I think this data[0] is designed to work with a fixed number of steps per epoch and the random data-sampling strategy.
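The pattern described above can be sketched as follows; HybridDataset here is a simplified, hypothetical stand-in for the class in utils/dataset.py, not the actual implementation:

```python
import random

# Sketch of a dataset whose __getitem__ deliberately ignores the incoming
# index: the DataLoader's sampler then only controls how many samples are
# drawn per epoch, while the dataset itself decides which samples to return.

class HybridDataset:
    def __init__(self, datasets, samples_per_epoch):
        self.datasets = datasets            # list of sub-datasets
        self.samples_per_epoch = samples_per_epoch

    def __len__(self):
        # Fixed number of steps per epoch, independent of sub-dataset sizes.
        return self.samples_per_epoch

    def __getitem__(self, idx):
        # `idx` is unused: choose a sub-dataset, then sample randomly from
        # it. `data` holds a single random sample, so data[0] is always the
        # one to return, which is why the code never needs data[idx].
        ds = random.choice(self.datasets)
        data = [ds[random.randrange(len(ds))]]
        inference = False
        return (*data[0], inference)

ds = HybridDataset([[("img_a", "mask_a"), ("img_b", "mask_b")]],
                   samples_per_epoch=500)
img, mask, inference = ds[123]   # the index 123 is ignored
print(len(ds))  # 500
```

This decouples epoch length from dataset size, which is convenient when mixing several segmentation datasets with different sample rates.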