Doubts about the evaluation results

chaoyi-wu / RadFM

The official code for "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data".


Doubts about the evaluation results #33

Open Yanllan opened 1 month ago

Yanllan commented 1 month ago

First of all, congratulations on completing this work. I am confused about the results in the technical report, though. The reported results are described as "zero-shot evaluation of RadFM", but the VQA-RAD dataset is part of your training data. Isn't that somewhat paradoxical? Did you process the dataset in some way, or did I misinterpret the paper?

chaoyi-wu commented 1 month ago

Thanks for your question. Here, "zero-shot" refers to the prompting method, as distinguished from few-shot, CoT, and so on, following FLAN https://arxiv.org/pdf/2301.13688. In other words, "zero-shot" here does not imply any domain shift between training and evaluation data.

We discuss transfer to unseen tasks in the section "Generalization to Unseen Classes in PadChest", which may be closer to your understanding of "zero-shot".
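To make the distinction concrete, here is a minimal sketch of zero-shot versus few-shot prompting in the sense used above. It is illustrative only and is not taken from the RadFM code: the `build_prompt` helper, the "Question:/Answer:" template, and the sample questions are all hypothetical.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a text prompt (hypothetical template, not RadFM's own).

    With no examples the prompt is zero-shot: it contains only the query.
    With one or more solved (question, answer) pairs it is few-shot.
    Neither case says anything about whether the model saw the dataset
    during training.
    """
    # In-context demonstrations, if any, come first.
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The query itself is left unanswered for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


query = "Is there a pleural effusion?"

# Zero-shot prompting: no demonstrations in the prompt.
zero_shot = build_prompt(query)

# Few-shot prompting: one demonstration precedes the query.
few_shot = build_prompt(query, [("Is the heart enlarged?", "no")])
```

Under this reading, "zero-shot" describes only what is placed in the prompt at inference time, so it can apply even when the evaluation dataset's training split was used for training.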