Iven2132 opened this issue 2 weeks ago
You can refer to our documentation for model evaluation: https://github.com/OpenGVLab/InternVL/blob/main/README.md#documents. Also, if possible, could you provide some examples of errors?
Also getting poor performance here. As an example, when prompted,
> Analyze the voting results table image and return a JSON object with this structure: {"candidates": [{"name": "Candidate Name", "votes": [{"polling_station": number, "votes": number}, ...]}, ...]} Extract votes for each polling station for all candidates. The table may be rotated; use the polling station numbers to determine the correct order of votes.
on the following image,
it returns made-up numbers and candidate names.
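For reference, here is a minimal sketch of what a well-formed reply to that prompt should look like, together with a structural check. All names here (`sample_reply`, `matches_schema`) are hypothetical helpers for illustration, not part of InternVL:

```python
import json

# Hypothetical example of a reply matching the requested schema.
sample_reply = json.dumps({
    "candidates": [
        {"name": "Candidate A",
         "votes": [{"polling_station": 1, "votes": 120},
                   {"polling_station": 2, "votes": 98}]},
        {"name": "Candidate B",
         "votes": [{"polling_station": 1, "votes": 87},
                   {"polling_station": 2, "votes": 143}]},
    ]
})

def matches_schema(reply: str) -> bool:
    """Return True if `reply` parses as the JSON structure asked for in the prompt."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    candidates = data.get("candidates")
    if not isinstance(candidates, list):
        return False
    for c in candidates:
        if not isinstance(c.get("name"), str):
            return False
        votes = c.get("votes")
        if not isinstance(votes, list):
            return False
        for v in votes:
            if not isinstance(v.get("polling_station"), int) \
                    or not isinstance(v.get("votes"), int):
                return False
    return True

print(matches_schema(sample_reply))  # → True
```

A check like this only catches malformed structure; it cannot detect hallucinated names or vote counts, which is the failure mode reported here.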
I'm running with
```python
import torch

# load_image is the preprocessing helper from the InternVL quick-start example
pixel_values = load_image('test.png').to(torch.bfloat16).cuda()
generation_config = dict(
    num_beams=1,          # no beam search
    max_new_tokens=8096,
    do_sample=False,      # deterministic (greedy) decoding
)
```
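One other thing worth ruling out before blaming the model: VLMs often wrap JSON answers in a markdown code fence or surround them with prose, which breaks naive `json.loads` on the raw reply. Below is a minimal sketch of a tolerant extractor; `extract_json` is a hypothetical helper written for this issue, not an InternVL API:

```python
import json
import re

def extract_json(reply: str):
    """Pull the first JSON object out of a model reply, tolerating
    markdown code fences and surrounding prose (hypothetical helper)."""
    # Strip a ```json ... ``` fence if one is present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", reply, re.DOTALL)
    text = fenced.group(1) if fenced else reply
    # Fall back to the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(text[start:end + 1])

reply = 'Sure! ```json\n{"candidates": [{"name": "A", "votes": []}]}\n``` Hope that helps.'
print(extract_json(reply)["candidates"][0]["name"])  # → A
```

If the extracted JSON parses cleanly but the values are still wrong, the problem is the model's reading of the table, not the output format.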
Hi, I'm confused. I did some visual question answering with the InternVL2-26B model, and it performs very badly at this. The only models that pass this question are Gemini 1.5 Pro/Flash, GPT-4o, and Claude.
Then how was InternVL2-26B evaluated, such that it outperforms GPT-4V and Gemini 1.5?