cvlab-columbia / viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

"process_guesses" function in Listing 4. OK-VQA example #28

Closed: lifan-yuan closed this issue 6 months ago

lifan-yuan commented 1 year ago

Hi, thanks for sharing the code.

When reproducing your results on OK-VQA, I found that the in-context example you used for OK-VQA contains a function named process_guesses, which does not exist in the repo. Could you please provide its implementation?

Thanks a lot!

astanic commented 11 months ago

My assumption is that the process_guesses function works much like multiple-choice question answering:

final_answer = LLM(f"{question} Which of the following choices is the most likely answer? {', '.join(guesses)}")  # guesses: [guess1, guess2, ...]

Though it would be nice if the authors could shed some light on this :)
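
For anyone who wants to experiment in the meantime, the assumption above could be sketched roughly as follows. This is not the authors' implementation: `llm` is a hypothetical callable (prompt string in, reply string out), and the function name, prompt layout, and fallback rule are all illustrative choices.

```python
from typing import Callable, Sequence


def process_guesses_llm_sketch(
    question: str,
    guesses: Sequence[str],
    llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to a reply
) -> str:
    """Assumed behaviour: ask an LLM which of the guesses is most likely."""
    # Drop empty guesses and duplicates, keeping the original order.
    unique = list(dict.fromkeys(g.strip() for g in guesses if g.strip()))
    if not unique:
        raise ValueError("need at least one non-empty guess")

    prompt = (
        f"{question}\n"
        "Which of the following choices is the most likely answer?\n"
        + "\n".join(f"- {g}" for g in unique)
    )
    reply = llm(prompt).strip().lower()

    # Return the first listed guess that the reply mentions.
    for g in unique:
        if g.lower() in reply:
            return g
    # Illustrative fallback: if the reply names none of the guesses, keep the first one.
    return unique[0]
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client, so it can be tried with a stub such as `lambda prompt: "Paris"` before wiring in a real model.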

astanic commented 11 months ago

Actually, I now see that there is another function, VideoSegment.select_answers, that is similar to what I described and is also missing from the API listings (and from the rest of this repository). Either these two functions implement the same functionality, or my intuition was wrong.

surisdi commented 6 months ago

Hi, apologies for the late reply. Probably not useful anymore, but just in case: yes, they are similar, but used for different benchmarks. select_answers is used in NextQA and it selects the correct answer among different options provided by the dataset. process_guesses takes as input different guesses made by the model, and chooses the best one given the available information.

We have updated the repository with the benchmark code, which includes the implementations of these functions.
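
To make the distinction described in this thread concrete, here is a hedged sketch of the two call shapes. It is only an illustration of the difference (dataset-provided options versus model-generated guesses), not the code in the repository: the signatures, the return types, and the `score` callable are all assumptions.

```python
from typing import Callable, Sequence

# Hypothetical scorer: (question, candidate, info) -> score, where a higher
# score means the candidate fits the question and the available information better.
Scorer = Callable[[str, str, str], float]


def select_answers_sketch(
    question: str, options: Sequence[str], info: str, score: Scorer
) -> int:
    """Choose among options supplied by the dataset; return the chosen index (assumed)."""
    if not options:
        raise ValueError("options must not be empty")
    return max(range(len(options)), key=lambda i: score(question, options[i], info))


def process_guesses_scored_sketch(
    question: str, guesses: Sequence[str], info: str, score: Scorer
) -> str:
    """Choose among the model's own guesses; return the best guess as text (assumed)."""
    if not guesses:
        raise ValueError("guesses must not be empty")
    return max(guesses, key=lambda g: score(question, g, info))
```

In this sketch the only structural difference is where the candidates come from and what is returned: an index into a fixed option list for the multiple-choice case, and a free-form string for the open-ended case.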