UCSC-VLAA / MedTrinity-25M

This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine".

Inconsistent Evaluation Results on Slake1.0 and PathVQA Datasets #9

Open taindp98 opened 2 months ago

taindp98 commented 2 months ago

Hello,

I attempted to replicate the evaluation results presented in the paper on two datasets, Slake1.0 and PathVQA, using the data released in this comment: https://github.com/UCSC-VLAA/MedTrinity-25M/issues/6#issuecomment-2345623211. However, my results do not match those reported in the paper. The details are below:

  1. Slake1.0 Dataset: the provided checkpoint appears to have been fine-tuned without pretraining on MedTrinity-25M, since my results are very close to those of LLaVA-Med++ (Ours, w/o) in Table 3 of the paper.
  2. PathVQA Dataset: on the Closed set I was able to replicate the reported accuracy, but on the Open set the recall was significantly lower than the published result (see the recall sketch after this list).
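For reference, this is a minimal sketch of the token-level recall I assumed for the Open set, i.e. the metric typically used in LLaVA-Med-style VQA evaluation (the fraction of ground-truth answer tokens that appear in the generated answer). The function names and the answers-file format are only illustrative and are not taken from this repository's evaluation script.

```python
# Minimal sketch of token-level recall for open-set VQA answers,
# in the style of LLaVA-Med evaluation. Names and file format are illustrative.
import json

def token_recall(prediction: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens that appear in the prediction."""
    gt_tokens = ground_truth.lower().split()
    pred_tokens = set(prediction.lower().split())
    if not gt_tokens:
        return 0.0
    hits = sum(1 for tok in gt_tokens if tok in pred_tokens)
    return hits / len(gt_tokens)

def average_open_recall(path: str) -> float:
    """Average recall over a JSONL file with 'text' (prediction) and 'answer' (gold) fields."""
    scores = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            scores.append(token_recall(rec["text"], rec["answer"]))
    return sum(scores) / max(len(scores), 1)

# e.g. print(average_open_recall("pathvqa_open_answers.jsonl"))  # hypothetical file
```

If the official script uses a different normalization (e.g. exact-match per answer or different tokenization), that alone could explain part of the gap, so please let me know if my assumption above is wrong.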

To help diagnose these issues, I have attached two images for reference (slake1.0_results and pathvqa_results), each showing the evaluation output for one of the two datasets above.

Could you kindly verify whether the provided fine-tuning checkpoint for Slake1.0 is correct? Additionally, it would be helpful to understand any specific steps necessary to replicate the reported recall values for the PathVQA Open set.
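For what it's worth, one way to check which base model a fine-tuning checkpoint started from is to inspect its saved config and trainer metadata. The directory path and field names below are assumptions about a typical Hugging Face-style checkpoint layout, not specifics of the released files.

```python
# Rough sketch: inspect checkpoint metadata to see which base model the
# fine-tuning started from. Path and field names are assumptions about a
# standard Hugging Face-style checkpoint directory, not the released layout.
import json
import os

ckpt_dir = "checkpoints/llava-med-pp-slake"  # hypothetical path

for name in ("config.json", "trainer_state.json"):
    path = os.path.join(ckpt_dir, name)
    if not os.path.exists(path):
        continue
    with open(path) as f:
        meta = json.load(f)
    # Fields like "_name_or_path" often record the model the fine-tuning
    # resumed from, which would indicate whether MedTrinity-25M pretraining
    # preceded the Slake1.0 fine-tuning.
    for key in ("_name_or_path", "model_type", "mm_projector_type"):
        if key in meta:
            print(f"{name}: {key} = {meta[key]}")
```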

Thank you in advance for your assistance!

yunfeixie233 commented 1 month ago

Hi @taindp98,

I apologize for any inconvenience. I will review the issues shortly.