DAMO-NLP-SG / VCD

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Apache License 2.0

Multinomial Sampling May Not Ensure the Reliability of the Answer #10

Closed Eason-Qin closed 2 months ago

Eason-Qin commented 3 months ago

Dear Author,

Thank you for your inspiring work. I noticed a problem that might be worth a recheck or your clarification.

In your implementation, you use the prompt "Please answer this question with one word.", which means the model will very likely output only a single token, "yes" or "no".

However, your sampling method uses multinomial sampling instead of greedy decoding (https://github.com/DAMO-NLP-SG/VCD/blob/65a8fd771e9fbb9e26be5633c9d51db99222fbe7/experiments/eval/object_hallucination_vqa_llava.py#L75). This strategy samples the next token from a multinomial distribution rather than taking the argmax, so the predicted token (simply "yes" or "no") is not deterministic for the same question even when the random seed is fixed: the draw depends on the RNG state at the moment the question is answered.
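
For reference, here is a minimal sketch of the two decoding rules (PyTorch; the toy two-token vocabulary and numbers are illustrative, not taken from the repository):

```python
import torch

# Toy next-token distribution over a two-token vocabulary: ["yes", "no"]
logits = torch.tensor([1.2, 1.0])
probs = torch.softmax(logits, dim=-1)               # roughly [0.55, 0.45]

greedy_token = torch.argmax(probs).item()           # always index 0 ("yes")
sampled_token = torch.multinomial(probs, 1).item()  # index 0 ~55% of the time, index 1 ~45%
```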

For example, if you run question 7 of the POPE random set on its own, versus running the same question in POPE's original order, the model might give different answers.
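
To make the ordering effect concrete, a hypothetical sketch (not the repository's code): with the same seed, the draw a question receives depends on how many random numbers were consumed before it.

```python
import torch

probs = torch.tensor([0.55, 0.45])    # hypothetical "yes"/"no" distribution for question 7

# Question 7 evaluated on its own
torch.manual_seed(0)
alone = torch.multinomial(probs, 1).item()

# Question 7 evaluated after six earlier questions that also drew samples
torch.manual_seed(0)
for _ in range(6):
    torch.multinomial(probs, 1)       # questions 1-6 advance the RNG state
in_order = torch.multinomial(probs, 1).item()

print(alone, in_order)                # the two answers can differ despite the fixed seed
```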

Thank you.

LengSicong commented 2 months ago

Hi Eason, thanks for your interest in our work. Indeed, sampling can introduce randomness into the outputs.

That's why we ran each experiment several times and reported the average scores and standard deviations, to make the comparison as robust and fair as possible.
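
For instance (placeholder numbers only, not the paper's results), the reported figures are just the mean and standard deviation across repeated runs:

```python
import statistics

# Hypothetical accuracies from repeating one POPE evaluation three times
accuracies = [0.871, 0.868, 0.874]
print(f"{statistics.mean(accuracies):.3f} ± {statistics.stdev(accuracies):.3f}")
```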

You may also refer to our experiments that report results under greedy decoding, where the outputs and scores are always fixed.
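
For anyone who wants a fully deterministic run, one option is to switch the linked evaluation script to greedy decoding by turning sampling off in the `generate` call. A sketch assuming the Hugging Face `transformers` generation API that the LLaVA code builds on; `model` and `input_ids` stand in for the script's own variables:

```python
# Greedy decoding: each step takes the argmax token, so repeated runs
# (and different question orderings) give identical outputs.
output_ids = model.generate(
    input_ids,
    do_sample=False,     # disable multinomial sampling
    num_beams=1,         # plain greedy search, no beam search
    max_new_tokens=16,   # short budget; the answer is expected to be a single word
)
```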