Open dengandong opened 1 year ago
It is solved by changing the original code to:
args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[1]
No idea why tokenizer("[SEG]") returns [29871, 32000]; only 32000 is the correct [SEG] idx.
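A minimal sketch of the indexing pitfall, using a hypothetical stub in place of the real transformers tokenizer (the ids 29871/32000 are taken from this issue; in LLaMA-style SentencePiece vocabularies, 29871 is the whitespace piece "▁" that gets prepended before an added token):

```python
# Hypothetical stub mimicking the behaviour reported in this issue; it is
# NOT the real transformers tokenizer.
class StubSegTokenizer:
    PREFIX_ID = 29871  # the "▁" piece that sneaks in at input_ids[0]

    def __init__(self):
        self.added_tokens = {"[SEG]": 32000}

    def __call__(self, text, add_special_tokens=True):
        class Encoding:
            pass
        enc = Encoding()
        # The prefix piece comes first, then the added token's id.
        enc.input_ids = [self.PREFIX_ID, self.added_tokens[text]]
        return enc

    def convert_tokens_to_ids(self, token):
        # Direct vocabulary lookup: no prefix piece involved.
        return self.added_tokens[token]


tok = StubSegTokenizer()
# input_ids[0] is the prefix piece, not [SEG]; input_ids[1] is [SEG]:
assert tok("[SEG]", add_special_tokens=False).input_ids == [29871, 32000]
# convert_tokens_to_ids sidesteps the indexing question entirely:
assert tok.convert_tokens_to_ids("[SEG]") == 32000
```

With a real tokenizer, `tokenizer.convert_tokens_to_ids("[SEG]")` avoids depending on whether a prefix piece is emitted at all, which can vary across transformers versions.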
It seems that [SEG] is not added to the vocabulary of the tokenizer successfully. Can you check it by printing out args.seg_token_idx after this line (https://github.com/dvlab-research/LISA/blob/main/train_ds.py#L129)?
BTW, the issue you reported might be caused by a different version of transformers. Can you re-install the same version as ours?
Thanks for your reply!
The printed result is 29871. I also printed tokenizer("[SEG]", add_special_tokens=False), and it is [29871, 32000].
I'm using transformers==4.33.0; I'll try 4.31.0.
@X-Lai BTW, in utils/dataset.py line 269, why is it return *data[0], inference and not return *data[idx], inference?
@X-Lai Also, just to confirm: for the experiments in Table 2, which released model did you use? Did you run all referring segmentation datasets with https://huggingface.co/xinlai/LISA-7B-v1, or another released model? I ask because I validated the LISA-7B-v1 model on the refCOCOg (UMD) validation set but only got 63.35 cIoU, which is still lower than the paper's result of above 66.4...
@dengandong I find that the index passed into __getitem__ is not used; a random sample drawn from the dataset is used for training instead. I think this data[0] is designed to work with a fixed number of steps per epoch and the random data-sampling strategy.
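A hypothetical sketch of that pattern (class and attribute names are illustrative, not LISA's actual code): __len__ fixes the steps per epoch while __getitem__ ignores idx and samples randomly, so indexing with [0] versus [idx] makes no difference.

```python
import random

# Illustrative sketch only; names are hypothetical, not LISA's implementation.
class RandomSampleDataset:
    def __init__(self, samples, steps_per_epoch):
        self.samples = samples
        self.steps_per_epoch = steps_per_epoch

    def __len__(self):
        # Epoch length is a fixed step count, not the number of samples.
        return self.steps_per_epoch

    def __getitem__(self, idx):
        # idx is ignored: a random sample is drawn on each call, so any
        # index into the drawn data would be equally arbitrary.
        return random.choice(self.samples)


ds = RandomSampleDataset(samples=["a", "b", "c"], steps_per_epoch=500)
assert len(ds) == 500
assert ds[123] in ["a", "b", "c"]
```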
File "/home/LISA/model/LISA.py", line 336, in model_forward
    gt_mask.shape[0] == pred_mask.shape[0]
AssertionError: gt_mask.shape: torch.Size([3, 427, 640]), pred_mask.shape: torch.Size([6, 427, 640])
I found the reason that causes this problem (in train_ds.py, line 129):
args.seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0]
The token ids for [SEG] are [29871, 32000], but this line only uses 29871. However, the token_idx for is 29871, 32002. This causes the model to output twice as many masks as there are GT masks.
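A small sketch of why the wrong index inflates the mask count (token ids below are illustrative, following the values in this issue: 32000 = [SEG], 29871 = the SentencePiece prefix piece, other ids stand in for ordinary text tokens). If the model emits one mask per position whose id equals seg_token_idx, matching on 29871 fires at many non-[SEG] positions:

```python
# Illustrative token ids only; not an actual LISA input sequence.
input_ids = [1, 29871, 450, 29871, 32000, 29871, 322, 29871, 32000, 29871, 450, 29871, 32000]

def count_masks(ids, seg_token_idx):
    # One predicted mask per position whose token id equals seg_token_idx.
    return sum(1 for t in ids if t == seg_token_idx)

gt_masks = count_masks(input_ids, 32000)    # 3 real [SEG] tokens
pred_masks = count_masks(input_ids, 29871)  # prefix piece appears 6 times

assert gt_masks == 3
assert pred_masks == 6  # the same 3-vs-6 mismatch as in the AssertionError
```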
I get this error by running the following command:
deepspeed --master_port=24999 train_ds.py \
  --version=/home/LLaVA/checkpoints/llava-7b-llama-2-7b-chat \
  --dataset_dir='dataset/' \
  --vision_pretrained=./pretrain_weights/sam_vit_h_4b8939.pth \
  --dataset=refer_seg \
  --refer_seg_data=refcoco \
  --sample_rates=1 \
  --conv_type=llava_llama_2 \
  --exp_name=lisa-7b \
  --load_in_4bit
The LLaVA model has been tested and works correctly.