Closed youngwanLEE closed 6 months ago
Hi, thank you for your interest in our work.
Currently, the few-shot task functionality hasn't been added or tested properly in our project, so many issues are likely to appear if you try to use it.
Enabling this feature may require us to reconsider some of the designs in our current code and refactor parts of it. Since most of our core developers are busy with other projects, we may not be able to release a new version that supports few-shot testing soon. We definitely intend to include this feature in a future release, but it may take some time for us to get the work done. You are also welcome to open a PR and contribute to this project if you have an idea for how to solve this issue.
For the current version, if you want to use few-shot testing, the only way to do so is to implement it yourself in the textvqa utils, either by hardcoding some few-shot contexts or by randomly sampling the few-shot context.
The idea is like this:
Since you are using a llava model, each `<image>` token will be automatically filled by one of the images you pass in. So you first need to revise the `doc_to_visual` function to return some extra `<PIL Image>` objects as few-shot image context. Then revise your `doc_to_text` function so that it orders the prompt in a way that performs few-shot context testing.
e.g.

```
<image>
Q: xxx
A: xxx
... repeat n times ...
<image>
Q: xxx
```
As long as you make sure the number of `<image>` tokens matches the number of `<PIL Image>` objects you pass in from `doc_to_visual`, this should work properly for llava. You may also need to prepare the data from the TextVQA training set yourself and add it inside the function.
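To make the two steps above concrete, here is a minimal sketch of how the pair of revised functions could look. This is not the actual lmms-eval API: the `FEWSHOT_EXAMPLES` list, the function names, and the `"image"`/`"question"`/`"answer"` doc fields are all assumptions for illustration; you would adapt them to the real textvqa utils schema. The key invariant is that both functions sample the same few-shot examples (same seed) so the `<image>` tokens line up with the images.

```python
# Hypothetical sketch of few-shot support in a textvqa-style utils module.
# FEWSHOT_EXAMPLES, the function names, and the doc field names are
# assumptions, not the real lmms-eval schema.
import random

# Populate this yourself from the TextVQA training split, e.g. a list of
# {"image": <PIL Image>, "question": str, "answer": str} dicts.
FEWSHOT_EXAMPLES = []


def textvqa_doc_to_visual(doc, n_shots=2, seed=0):
    """Return the few-shot context images followed by the query image."""
    rng = random.Random(seed)  # same seed as doc_to_text so shots match
    shots = rng.sample(FEWSHOT_EXAMPLES, n_shots) if FEWSHOT_EXAMPLES else []
    images = [ex["image"] for ex in shots]  # few-shot context images
    images.append(doc["image"])             # the actual query image last
    return images


def textvqa_doc_to_text(doc, n_shots=2, seed=0):
    """Build a prompt with exactly one <image> token per image above."""
    rng = random.Random(seed)  # must mirror doc_to_visual's sampling
    shots = rng.sample(FEWSHOT_EXAMPLES, n_shots) if FEWSHOT_EXAMPLES else []
    parts = [f"<image>\nQ: {ex['question']}\nA: {ex['answer']}" for ex in shots]
    parts.append(f"<image>\nQ: {doc['question']}\nA:")
    return "\n".join(parts)
```

Because both functions draw from the same seeded RNG, the i-th `<image>` token in the prompt corresponds to the i-th image returned by `doc_to_visual`, which is the alignment llava relies on.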
@kcz358 Thanks for your quick and kind response.
Hi, this project has been a very big help for us.
When I tried to evaluate the llava-1.6-13b model on textVQA with few shots, I encountered this error.
I wonder if the lmms-eval codebase does not yet properly support few-shot testing.
command:
got this error: