Thanks for your interesting work and for sharing the code.
In the README, you only provide examples of how to generate captions for one image at a time (batch size = 1). Could you (@Yushi-Hu) explain how to generate captions in batches (multiple questions and corresponding images) in one go, instead of iteratively calling the model to improve time efficiency?
For our paper we only ran inference one example at a time. The codebase currently needs changes before it can support batch inference. We will update you as soon as batch inference is supported!
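Until batch inference lands in the codebase, one interim workaround is to batch at the application level: chunk the (question, image) pairs and hand each chunk to a single batched model call. The sketch below is a minimal, hypothetical illustration of that pattern; `caption_fn` stands in for the actual model call (e.g., a tokenizer/processor call with `padding=True` followed by `generate` in a HuggingFace-style pipeline) and is not part of this repo's API.

```python
from typing import Callable, List, Sequence, Tuple


def batched(items: Sequence, batch_size: int):
    """Yield consecutive chunks of `items` of at most `batch_size`."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def caption_in_batches(
    pairs: Sequence[Tuple[str, str]],
    caption_fn: Callable[[List[str], List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run a batched captioner over (question, image_path) pairs.

    `caption_fn` is a placeholder: it should accept a list of questions
    and a list of image paths, and return one caption per pair.
    """
    captions: List[str] = []
    for batch in batched(pairs, batch_size):
        questions, images = zip(*batch)
        captions.extend(caption_fn(list(questions), list(images)))
    return captions
```

This keeps the per-example loop out of the model call, so once the codebase exposes a true batched `generate`, only `caption_fn` needs to change.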