A question about the evaluation metrics for image captioning

Hi, I'm interested in the 3D-QA task, and thanks for your paper and code.

After reading the paper, I have a question. In Table 13 of the paper, ROUGE is very high across the various tasks, while CIDEr is relatively low. In Table 8, however, ROUGE is much lower than in Table 13, and CIDEr is much higher. I am not very familiar with these metrics, so could you explain the reason for this difference? Thank you!

ATR-DBI / ScanQA
