DAMO-NLP-SG / VCD

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Apache License 2.0

probability tensor contains either `inf`, `nan` or element < 0 #1

Closed zli999 closed 10 months ago

zli999 commented 10 months ago

We get the error `probability tensor contains either `inf`, `nan` or element < 0` regardless of whether cd_sample=True is set.

We find that the output `probs` is `tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0', dtype=torch.float16)`.
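For reference, this failure mode is easy to detect right before sampling. Below is a generic PyTorch sketch (not the repo's code; `has_bad_probs` is a hypothetical helper): fp16 logits overflow to `inf`, and softmax over a tensor containing `inf` then produces `nan` probabilities, which is exactly what `torch.multinomial` rejects.

```python
import torch

def has_bad_probs(probs: torch.Tensor) -> bool:
    """True if probs would make torch.multinomial raise:
    any inf, nan, or negative entry."""
    return bool(torch.isinf(probs).any()
                or torch.isnan(probs).any()
                or (probs < 0).any())

# 70000 overflows to inf in fp16 (max ~65504); softmax over a
# tensor containing inf then yields nan probabilities.
logits = torch.tensor([[70000.0, -70000.0]]).to(torch.float16)
probs = torch.softmax(logits.float(), dim=-1)
```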

We use the model liuhaotian/llava-v1.5-7b.

Could you please help solve this issue?

hangzhang-nlp commented 10 months ago

Could you please provide more details?

hangzhang-nlp commented 10 months ago

I just gave it a try, and it didn't happen.

zli999 commented 10 months ago

I ran object_hallucination_vqa_llava.py on the COCO dataset without setting use_cd. With batch size = 1 that error does not happen, but the results are weird, e.g.: {"question_id": 1, "prompt": "Is there a snowboard in the image?", "text": "adratkilometer nederb\u00f6rdMult country \u043f\u043b\u043e\u0449\u0430ianodj Det modern Q", "model_id": "llava-v1.5-7b", "image": "COCO_val2014_000000310196.jpg", "metadata": {}} (I only generate 10 new tokens.)

But with batch size > 1, the first question is answered as above and the second question triggers the error: `next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)` RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

It seems unrelated to the inputs/data; something seems wrong in the code.

zli999 commented 10 months ago

When I set use_cd = True, the problem happens even in the first question.

`next_tokens = torch.multinomial(cd_probs, num_samples=1).squeeze(1)` RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
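While debugging, the sampling call can be guarded so generation does not crash on invalid probabilities. This is a hypothetical workaround (`safe_multinomial` is not part of the repo): it zeroes out `inf`/`nan`/negative entries and falls back to a greedy pick, which only papers over the symptom but lets you inspect the outputs.

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int = 1) -> torch.Tensor:
    """Sample like torch.multinomial, but fall back to argmax over the
    valid entries when probs contains inf/nan/negative values."""
    bad = torch.isinf(probs) | torch.isnan(probs) | (probs < 0)
    if bad.any():
        cleaned = probs.clone()
        cleaned[bad] = 0.0  # drop the invalid entries
        if cleaned.sum() == 0:
            # nothing valid left: pick index 0 deterministically
            return torch.zeros(probs.size(0), num_samples, dtype=torch.long)
        # greedy pick among the remaining finite, non-negative entries
        return cleaned.argmax(dim=-1, keepdim=True).expand(-1, num_samples)
    return torch.multinomial(probs, num_samples=num_samples)
```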

hangzhang-nlp commented 10 months ago

Firstly, the object_hallucination_vqa_llava.py script we provided only supports a batch size of 1, and we have not provided an interface for a batch size greater than 1.

Secondly, the weird results appear to be due to the LLaVA weights not being loaded correctly; you may want to investigate that.

On my end, the output is as expected: {"question_id": 1, "prompt": "Is there a snowboard in the image?", "text": "Yes", "model_id": "llava-v1.5-7b", "image": "COCO_val2014_000000310196.jpg", "metadata": {}}

zli999 commented 10 months ago

Thanks for the fast reply, and sorry for the unclear description earlier. I just ran the code over the questions in coco_pope_adversarial.json. With use_cd=False, it only answers the first question, and the answer looks like this: {"question_id": 1, "prompt": "Is there a snowboard in the image?", "text": "adratkilometer", "model_id": "llava-v1.5-7b", "image": "COCO_val2014_000000310196.jpg", "metadata": {}}

But the error occurs on the second question: `next_tokens = torch.multinomial(cd_probs, num_samples=1).squeeze(1)` RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.

With use_cd=True, the error occurs immediately on the first question.

I pulled your latest code and modified only the paths of the question file, answer file, and image folder to try again. The issue persists.

model-path = "liuhaotian/llava-v1.5-7b"

hangzhang-nlp commented 10 months ago

Brother, it must be an error in loading the model weights or in generation, because the output is already garbled even with use_cd=False.
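One quick way to rule out corrupted weights is to scan the loaded parameters for non-finite values. This is a plain PyTorch sketch (`find_bad_params` is a hypothetical helper, not the repo's loader), demonstrated here on a toy module:

```python
import torch
import torch.nn as nn

def find_bad_params(model: nn.Module) -> list:
    """Return names of parameters containing nan or inf, which usually
    indicates corrupted or mismatched weight files."""
    return [name for name, p in model.named_parameters()
            if not torch.isfinite(p).all()]

# toy usage: a fresh Linear layer, then a simulated corrupted weight
m = nn.Linear(4, 2)
with torch.no_grad():
    m.weight[0, 0] = float('nan')
```

Running this right after loading the checkpoint distinguishes "bad weights on disk" from "numerical issue during decoding".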

zli999 commented 10 months ago

You are right. The issue is caused by the model. I tried the larger model "liuhaotian/llava-v1.5-13b", and the issue is gone.

I re-downloaded "liuhaotian/llava-v1.5-7b", but the issue still exists with it.

Anyway, thanks for your efforts. Awesome work!