Closed youngwanLEE closed 6 months ago
Hi, thank you for your interest in our work.
Currently, the few-shot task functionality hasn't been added or tested properly in our project, so many issues are likely to appear if you try to use it.
Enabling this feature may require us to reconsider some of the designs in our current code and refactor parts of it. Since most of our core developers are busy with other projects, we may not be able to release a new version that supports few-shot testing soon. We definitely intend to include this feature in a future release, but it may take some time for us to get the work done. You are also welcome to open a PR and contribute to this project if you have an idea for how to solve this issue.
For the current version, if you want to use few-shot testing, the only way to do so is to implement it yourself in the textvqa utils, either by hardcoding some few-shot contexts or by randomly sampling the few-shot context.
The idea is like this:
Since you are using a llava model, each `<image>` token will be automatically filled by one of the images you pass in. So you first need to revise the `doc_to_visual` function to return some extra `<PIL Image>` objects as few-shot image context. Then revise your `doc_to_text` function so that it orders the prompt in a way that performs few-shot context testing.
e.g.

```
<image>
Q: xxx
A: xxx
... repeat n times ...
<image>
Q: xxx
```
As long as you make sure the number of `<image>` tokens matches the number of `<PIL Image>` objects you pass in from `doc_to_visual`, this should work properly for llava. You may also need to prepare the data from the TextVQA training set yourself and add it inside the function.
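To make the two steps above concrete, here is a minimal sketch of how the pair of revised functions could look. This is not the actual lmms-eval API: the `FEWSHOT_EXAMPLES` list, the function names, and the `"image"`/`"question"`/`"answer"` doc fields are all assumptions for illustration; you would adapt them to the real textvqa utils schema. The key invariant is that both functions sample the same few-shot examples (same seed) so the `<image>` tokens line up with the images.

```python
# Hypothetical sketch of few-shot support in a textvqa-style utils module.
# FEWSHOT_EXAMPLES, the function names, and the doc field names are
# assumptions, not the real lmms-eval schema.
import random

# Populate this yourself from the TextVQA training split, e.g. a list of
# {"image": <PIL Image>, "question": str, "answer": str} dicts.
FEWSHOT_EXAMPLES = []


def textvqa_doc_to_visual(doc, n_shots=2, seed=0):
    """Return the few-shot context images followed by the query image."""
    rng = random.Random(seed)  # same seed as doc_to_text so shots match
    shots = rng.sample(FEWSHOT_EXAMPLES, n_shots) if FEWSHOT_EXAMPLES else []
    images = [ex["image"] for ex in shots]  # few-shot context images
    images.append(doc["image"])             # the actual query image last
    return images


def textvqa_doc_to_text(doc, n_shots=2, seed=0):
    """Build a prompt with exactly one <image> token per image above."""
    rng = random.Random(seed)  # must mirror doc_to_visual's sampling
    shots = rng.sample(FEWSHOT_EXAMPLES, n_shots) if FEWSHOT_EXAMPLES else []
    parts = [f"<image>\nQ: {ex['question']}\nA: {ex['answer']}" for ex in shots]
    parts.append(f"<image>\nQ: {doc['question']}\nA:")
    return "\n".join(parts)
```

Because both functions draw from the same seeded RNG, the i-th `<image>` token in the prompt corresponds to the i-th image returned by `doc_to_visual`, which is the alignment llava relies on.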
@kcz358 Thanks for your quick and kind response.
Hi, this project has been a very big help for us.
When I tried to evaluate the llava-1.6-13b model on textVQA with few shots, I encountered this error.
I wonder if the lmms-eval codebase does not yet properly support few-shot testing.
command:
got this error: