when to leverage COT for answering

whitesockcat commented 8 months ago

Thank you for the excellent work! After a thorough review of the paper, I have some inquiries:

It's clear that the COT command line is utilized for generating responses to numerical questions involving charts. I noticed that COT is applied within the MathQA dataset as mentioned in your paper. However, for other datasets, was COT employed consistently? How do you determine when to leverage COT for answering, and when to provide direct responses? Specifically, in the evaluation of the ChartQA dataset, was COT used?

I eagerly await your response.

FanqingM commented 8 months ago

The ChartQA evaluation in the current version of the paper does not use the MathQA instruction template. We have since tested that using the MathQA instruction template for the mathematical questions in ChartQA further improves accuracy over the numbers in the current version of the paper (we will update the paper later). For evaluation, you can refer to the evaluation code in the repo. I have provided five instruction templates there, and all evaluations use the templates in the code; ChartQA is placed under the openqa template.

As for when to use the CoT form, we think it should be up to the user (consistent with adding "think step by step" when using GPT). Take a question such as "What is the difference of the x1 and the x2?" The user can of course ask it with the normal QA template, but it is obviously a mathematical question, so users should be more inclined to use the mathematical template, which gives more accurate results than normal QA (this conclusion is also verified in the paper).
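
To illustrate the idea of letting the user choose the template, here is a minimal sketch. The template wording, the `TEMPLATES` dictionary, and the `build_prompt` helper are all hypothetical and are not taken from the repo; the actual templates are the ones in the repo's evaluation code.

```python
# Hypothetical template strings for illustration only.
# The real instruction templates are defined in the repo's evaluation code.
TEMPLATES = {
    "qa": "Please answer my question based on the chart: {question}",
    "math": (
        "Please use step-by-step reasoning to answer my question "
        "based on the chart: {question}"
    ),
}


def build_prompt(question: str, use_cot: bool = False) -> str:
    """Wrap a question in the math (CoT) template only when the user asks for it."""
    key = "math" if use_cot else "qa"
    return TEMPLATES[key].format(question=question.strip())


question = "What is the difference of the x1 and the x2?"
print(build_prompt(question))                # normal QA prompt
print(build_prompt(question, use_cot=True))  # step-by-step (CoT) prompt
```

The choice is an explicit flag here because the reply above leaves the decision to the user rather than to the model.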

whitesockcat commented 8 months ago

Thank you for your response.

Regarding the application of the instruction template, could you please clarify whether it was utilized across all questions within the ChartQA dataset to enhance accuracy, or was it specifically employed only for mathematical questions?

FanqingM commented 8 months ago

As I said in my previous answer, the ChartQA accuracy in the current version of the paper was measured entirely with the ordinary QA instruction, so the situation you describe does not arise.

Later, we switched the mathematical questions in ChartQA to the mathematical template (most ChartQA questions are either element extraction or mathematical questions). This further improves ChartQA accuracy over the current version of the paper.

We will update the paper later and make the instructions.json used for testing ChartQA public. For the current version of the paper it is not needed, because the normal QA template is used for all ChartQA questions.

whitesockcat commented 8 months ago

Looking forward to your update!!!

whitesockcat commented 8 months ago

Could you please make the instructions.json file available at your earliest convenience?

FanqingM commented 8 months ago

For this version, we just use the normal QA template, which is in accessory/single_turn_eval.py; you can refer to issue #6. To generate in batches, you can use accessory/single_turn_eval_multitask.py, again referring to issue #6.

zhangliang-04 commented 7 months ago

Hi, I noticed that in the updated version the performance on MathQA is unchanged, but there is a performance increase on ChartQA. It seems you just changed the evaluation process. Could you provide any details about the evaluation of ChartQA in the current version? Thanks a lot!

FanqingM commented 7 months ago

> Hi, I noticed that in the updated version the performance on MathQA is unchanged, but there is a performance increase on ChartQA. It seems you just changed the evaluation process. Could you provide any details about the evaluation of ChartQA in the current version? Thanks a lot!

We changed the test instruction for ChartQA: we now use the MathQA instruction template for the mathematical questions in ChartQA, whereas earlier we used only the normal QA template for ChartQA.
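
For readers who want to try a similar split themselves, the sketch below tags each question with a template name before evaluation. The thread does not say how mathematical questions were identified, so the keyword list `MATH_CUES`, the helper names, and the record layout are assumptions for illustration only, not the authors' procedure or the format of their instructions.json.

```python
import re

# Assumed cue words for "mathematical" questions. This is a rough heuristic
# chosen for illustration, not the criterion used by the authors.
MATH_CUES = re.compile(
    r"\b(difference|sum|total|average|mean|median|ratio|how many times)\b",
    re.IGNORECASE,
)


def pick_template(question: str) -> str:
    """Return 'math' for questions that look numerical, otherwise 'qa'."""
    return "math" if MATH_CUES.search(question) else "qa"


def tag_questions(questions):
    """Attach a template name to each question (hypothetical record layout)."""
    return [{"question": q, "template": pick_template(q)} for q in questions]


samples = [
    "What is the difference of the x1 and the x2?",
    "Which country has the highest bar?",
]
for record in tag_questions(samples):
    print(record["template"], "|", record["question"])
```

A keyword heuristic like this will misclassify some questions, so it is worth checking the tags by hand before comparing accuracy numbers.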

OpenGVLab / ChartAst

when to leverage COT for answering #5