GT-Vision-Lab / VQA

Post-processing in VQA 2.0 Evaluation #14

Open igorsterner opened 1 year ago

igorsterner commented 1 year ago

Hello, I am building a VQA system and am seeking clarification on a condition in the vqaEval.py script.

if len(set(gtAnswers)) > 1:
    for ansDic in gts[quesId]['answers']:
        ansDic['answer'] = self.processPunctuation(ansDic['answer'])
        ansDic['answer'] = self.processDigitArticle(ansDic['answer'])
    resAns = self.processPunctuation(resAns)
    resAns = self.processDigitArticle(resAns)

The above condition guards the standard post-processing of predictions on line 98. My understanding is that this translates to: 'if all the human annotators agree, don't do any post-processing'. However, my system produces some variation in its outputs in these cases, such as 'yes!' rather than 'yes'. Of course I can do my own post-processing, but I was wondering if you might offer some insight into the rationale behind the above condition?
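To make the effect of this condition concrete, here is a minimal sketch. It is not the code from vqaEval.py: `simple_normalize` is a hypothetical stand-in for `processPunctuation` and `processDigitArticle`, and `gated_prediction` and `count_matches` are illustrative names. It only shows that a prediction such as 'yes!' is left untouched when every ground-truth answer is identical.

```python
import string
from typing import Sequence


def simple_normalize(answer: str) -> str:
    """Hypothetical stand-in for the script's normalization:
    lowercase and strip leading/trailing punctuation and whitespace."""
    return answer.lower().strip().strip(string.punctuation).strip()


def gated_prediction(res_ans: str, gt_answers: Sequence[str]) -> str:
    """Mirror the condition from the issue: normalize the prediction
    only when the annotators gave more than one distinct answer."""
    if len(set(gt_answers)) > 1:
        return simple_normalize(res_ans)
    return res_ans


def count_matches(res_ans: str, gt_answers: Sequence[str]) -> int:
    """Count ground-truth answers that exactly equal the prediction."""
    return sum(1 for gt in gt_answers if gt == res_ans)


unanimous = ["yes"] * 10          # all annotators agree
mixed = ["yes"] * 9 + ["yeah"]    # one annotator differs

pred_unanimous = gated_prediction("yes!", unanimous)  # stays 'yes!'
pred_mixed = gated_prediction("yes!", mixed)          # becomes 'yes'

print(pred_unanimous, count_matches(pred_unanimous, unanimous))  # yes! 0
print(pred_mixed, count_matches(pred_mixed, mixed))              # yes 9
```

Under these assumptions, the unanimous case yields no exact matches for 'yes!', which is the behaviour described above.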

Many thanks!

AishwaryaAgrawal commented 1 year ago

Hi Igor,

> My understanding is that this translates to: 'if all the human annotators agree, don't do any post-processing'

Yes, this is correct! If I remember correctly, the rationale was to not preprocess answers to questions requiring OCR, that is, questions asking about text written in some part of the image.
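One way to see why unconditional normalization could hurt text-reading questions is the sketch below. It assumes, for illustration only, a simplified article-dropping step (`drop_articles`, a hypothetical helper loosely modeled on the article handling in `processDigitArticle`) and invented example answers; it is not the evaluation script itself.

```python
ARTICLES = {"a", "an", "the"}


def drop_articles(answer: str) -> str:
    """Hypothetical simplification: lowercase and remove the
    articles 'a', 'an', and 'the' from the answer."""
    return " ".join(w for w in answer.lower().split() if w not in ARTICLES)


# Invented OCR-style example: every annotator transcribed the same text.
gt_answers = ["the end"] * 10
prediction = "the end"

# Left as-is, the prediction matches every verbatim transcription.
raw_matches = sum(gt == prediction for gt in gt_answers)

# Normalizing only the prediction changes the transcribed text,
# so it no longer matches the untouched ground truth.
normalized_prediction = drop_articles(prediction)
normalized_matches = sum(gt == normalized_prediction for gt in gt_answers)

print(raw_matches)            # 10
print(normalized_prediction)  # end
print(normalized_matches)     # 0
```

This suggests that any custom post-processing of predictions may need the same care: cleaning up 'yes!' is harmless, but stripping words from a transcription can turn a correct answer into a mismatch.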

Best, Aishwarya

On Wed, Nov 16, 2022 at 7:27 PM Igor Sterner wrote:
