UCSC-VLAA / MedTrinity-25M

This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"

Clarify settings of LLAVA-Med++ #5

Closed duyhominhnguyen closed 3 days ago

duyhominhnguyen commented 1 week ago

Dear Authors,

Thanks for your inspiring work! I have a question about the settings of LLaVA-Med++ (Ours, w/o) and LLaVA-Med++ (Ours, w/).

I understand that LLaVA-Med++ (Ours, w/o) means the model was not pre-trained on MedTrinity-25M, but what does this sentence mean: "we further pretrained our model on the corresponding MedTrinity-25M subset to achieve multigranular alignment"?

Also, in the model zoo on GitHub, e.g. for VQA-RAD, you mention "Pretrained on LLaVA-Med Data and MedTrinity-25M (specifically the VQA-RAD training set subset), finetuning on VQA-RAD training set". What do you mean by "specifically the VQA-RAD training set subset"?

Thank you very much, and I look forward to hearing your feedback soon.

yunfeixie233 commented 1 week ago

Dear duynhm,

Thank you for your interest in our work and your questions about LLaVA-Med++ settings.

To clarify, MedTrinity-25M is a composite dataset that includes various medical datasets with generated captions and ROI annotations. The "VQA-RAD training set subset" refers to a portion of MedTrinity-25M originating from VQA-RAD, but with newly generated captions and added ROIs.
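Conceptually, selecting such a subset amounts to filtering the composite dataset by the source of each sample. The sketch below is only illustrative: the field names (`source`, `caption`, `roi`) and the `vqa_rad` tag are assumptions for the example, not the actual MedTrinity-25M schema.

```python
# Hypothetical sketch: picking the VQA-RAD-derived subset out of a
# composite dataset. Field names and values are illustrative, not the
# real MedTrinity-25M schema.
records = [
    {"id": 1, "source": "vqa_rad", "caption": "generated caption A", "roi": [10, 20, 30, 40]},
    {"id": 2, "source": "slake",   "caption": "generated caption B", "roi": None},
    {"id": 3, "source": "vqa_rad", "caption": "generated caption C", "roi": [5, 5, 50, 50]},
]

# Keep only samples originating from VQA-RAD; in MedTrinity-25M these
# carry newly generated captions and ROI annotations rather than the
# original VQA question-answer pairs.
vqa_rad_subset = [r for r in records if r["source"] == "vqa_rad"]
print(len(vqa_rad_subset))  # → 2
```

Pretraining "on the corresponding subset" then means training on exactly this filtered portion before finetuning on the original VQA-RAD training set.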

We appreciate your engagement and will strive to provide more detailed explanations in future versions. If you have any more questions, please don't hesitate to ask.

Thank you for your attention to our research.

duyhominhnguyen commented 4 days ago

@yunfeixie233 Great, I got it now. Thank you.