DAMO-NLP-SG / VCD

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Apache License 2.0

model_kwargs_cd and model_kwargs problem #7

Closed by Mr-xiu 3 months ago

Mr-xiu commented 6 months ago

Great work! I found that, in the sample method, model_kwargs_cd is copied from model_kwargs every time. Is this a problem? Thanks!

Stevetich commented 5 months ago

I have the same question. I noticed that the original and distorted image inputs share the same past_key_values. Should past_key_values be set separately for each input?
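
To make the concern concrete, here is a toy sketch. It is not the actual VCD or LLaVA code: `toy_forward` and both decode functions are hypothetical, the cache is just a list of tagged tuples, and it assumes "every time" means every decoding step. It only contrasts what the second branch's cache looks like when the kwargs are re-copied each step versus copied once before the loop.

```python
import copy


def toy_forward(token, past_key_values, image_tag):
    """Hypothetical stand-in for a forward pass.

    Returns a new cache with one extra entry recording which image
    branch ("original" or "distorted") produced it.
    """
    past = list(past_key_values or [])
    past.append((image_tag, token))
    return past


def decode_copy_every_step(tokens):
    """Pattern raised in this issue: re-copy the kwargs at each step."""
    model_kwargs = {"past_key_values": None}
    model_kwargs_cd = {}
    for tok in tokens:
        # Overwrites whatever cache the distorted branch built so far.
        model_kwargs_cd = model_kwargs.copy()
        new_past = toy_forward(tok, model_kwargs["past_key_values"], "original")
        new_past_cd = toy_forward(tok, model_kwargs_cd["past_key_values"], "distorted")
        model_kwargs["past_key_values"] = new_past
        model_kwargs_cd["past_key_values"] = new_past_cd
    return model_kwargs, model_kwargs_cd


def decode_separate_kwargs(tokens):
    """One possible alternative (an assumption, not a confirmed fix):
    copy once, then let each branch keep its own cache."""
    model_kwargs = {"past_key_values": None}
    model_kwargs_cd = copy.deepcopy(model_kwargs)
    for tok in tokens:
        model_kwargs["past_key_values"] = toy_forward(
            tok, model_kwargs["past_key_values"], "original")
        model_kwargs_cd["past_key_values"] = toy_forward(
            tok, model_kwargs_cd["past_key_values"], "distorted")
    return model_kwargs, model_kwargs_cd


if __name__ == "__main__":
    print(decode_copy_every_step([1, 2, 3])[1]["past_key_values"])
    print(decode_separate_kwargs([1, 2, 3])[1]["past_key_values"])
```

In the first variant the distorted branch's cache is mostly made of entries from the original branch; in the second it contains only its own entries. Whether that difference matters in the real model depends on how the image features enter the cache, which this toy does not model.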

frankRenlf commented 4 months ago

I think it does not matter. You can follow the process in llava_arch.py -> prepare_inputs_labels_for_multimodal.

LengSicong commented 3 months ago

Hi all, thanks for your interest and valuable discussions.

We apologize for this problem, which we introduced while cleaning up the code for the open-source release.

For the solution and further discussion, please refer to issue #13.