cvlab-columbia / viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Appendix B, listings 2,3,4, and 5? #2

Closed rodrigob closed 1 year ago

rodrigob commented 1 year ago

I am looking for the API examples provided per-dataset as hinted in listings 2, 3, 4, and 5 of the appendix. From a quick search I could not find them in the codebase. Are they available somewhere?

surisdi commented 1 year ago

Hi, thanks for your interest! We will release benchmark-specific details upon paper acceptance.

rodrigob commented 1 year ago

In the meantime: a) Can you confirm that the prompt includes the API (with docstring documentation) + dataset-specific examples of how to use the API + the image & text question? b) Could you indicate (roughly) how many dataset-specific examples are provided in the prompt?

Thanks for your reply!

surisdi commented 1 year ago

Hi Rodrigo,

a) The prompt for the benchmarks includes the API + dataset-specific query-code example pairs, but it does not contain the image or video (Codex does not take visual input). It also does not contain the ground truth answer.

b) We use around 8 examples per dataset.
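To make the structure concrete, here is a minimal sketch of how such a prompt could be assembled. The API stub, the example pair, and the `build_prompt` helper are all hypothetical illustrations, not the repository's actual code; the key points it reflects are that the prompt is pure text (no image or video) and ends at the point where the model is expected to continue with generated code:

```python
# Hypothetical sketch of a ViperGPT-style prompt for a code LLM.
# The API docstring and the example pair below are illustrative only.

API_DOCSTRING = '''\
class ImagePatch:
    """A crop of an image.

    Methods
    -------
    find(object_name: str) -> list["ImagePatch"]
        Returns patches of the image containing the named object.
    """
'''

# Dataset-specific in-context (query, code) pairs; the paper uses ~8 per dataset.
EXAMPLES = [
    (
        "Is there a dog in the image?",
        'def execute_command(image):\n'
        '    patches = ImagePatch(image).find("dog")\n'
        '    return len(patches) > 0',
    ),
]


def build_prompt(api: str, examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate the API docs, the example pairs, and the new query.

    The image itself is never included: the code model only sees text,
    and the ground-truth answer is likewise absent.
    """
    parts = [api]
    for example_query, example_code in examples:
        parts.append(f"# Query: {example_query}\n{example_code}")
    # End with the new query and an open function signature for the model
    # to complete with generated code.
    parts.append(f"# Query: {query}\ndef execute_command(image):")
    return "\n\n".join(parts)


prompt = build_prompt(API_DOCSTRING, EXAMPLES, "How many dogs are there?")
```

The generated completion of `execute_command` would then be executed against the actual image, which is how the visual input enters the pipeline despite never appearing in the prompt.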

rodrigob commented 1 year ago

Got it, thanks for the information. I think sections 3.1 and 3.2 of the paper would benefit from stating more explicitly that per-dataset query-code pairs are used beyond the examples included in the docstring. We were quite surprised when we noticed the mention at the very end of the appendix.

surisdi commented 1 year ago

Thanks for the suggestion! The use of examples is mentioned in Section 3 (not only at the end of the appendix), but we agree it could be clearer that optional dataset-specific examples can be included; we follow Flamingo (Appendix A2) [1] in primarily discussing this in the appendix. We also want to point out that in the wild these examples are not necessary: results like those in Figure 1 do not use any.