batmanlab / Mammo-CLIP

Official Pytorch implementation of MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
https://shantanu-ai.github.io/projects/MICCAI-2024-Mammo-CLIP/
Creative Commons Attribution 4.0 International

How many images/views are used for training/inference #19

Closed javidsss closed 3 weeks ago

javidsss commented 3 weeks ago

Hello! Congrats on the great work! In the paper it is mentioned that at least one view (CC/MLO) is needed. I was wondering whether the model takes multiple views at the same time, or whether it is trained on single-view images?

shantanu-ai commented 3 weeks ago

@javidsss It processes each view separately. For example, if a patient has CC and MLO views, Mammo-CLIP produces two separate embeddings by passing the CC and MLO images through its vision encoder independently. It then applies the contrastive loss (see the loss in the paper) to learn good representations. There are classification models (e.g., MIRAI) that process both images (CC+MLO) at the same time and produce a single embedding; Mammo-CLIP does not do that. This is illustrated in the schematic in our paper.
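A minimal sketch of the per-view encoding described above (illustrative only; the stand-in encoder below should be replaced with the actual pretrained Mammo-CLIP image encoder from this repo):

```python
import torch
import torch.nn as nn

# Stand-in for the Mammo-CLIP vision encoder -- substitute the real
# pretrained image encoder loaded from a checkpoint in this repository.
vision_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))

cc_image = torch.randn(1, 3, 224, 224)   # preprocessed CC view
mlo_image = torch.randn(1, 3, 224, 224)  # preprocessed MLO view

with torch.no_grad():
    cc_emb = vision_encoder(cc_image)    # embedding for the CC view
    mlo_emb = vision_encoder(mlo_image)  # separate embedding for the MLO view

# The two embeddings are used independently in the contrastive loss;
# they are never fused into a single per-patient representation.
```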

javidsss commented 3 weeks ago

I see, thanks for the explanation. When evaluating your model's performance (say, classification on the RSNA dataset), how do you use all the images of one subject? Do you get one prediction per image and then do some sort of ensembling/averaging?

shantanu-ai commented 3 weeks ago

Yes, based on laterality (left/right) and patient_id. We produce one classification per breast for each patient. We followed the code of one of the Kaggle leaderboard solutions for the RSNA dataset (either the 6th or 7th place, I forget which).
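A hedged sketch of the per-breast aggregation described above (column names follow the RSNA Kaggle CSVs; the exact pipeline in this repo may differ). Per-image predictions are simply averaged over all images of the same breast:

```python
import pandas as pd

# Toy per-image predictions for one patient (values are illustrative).
preds = pd.DataFrame({
    "patient_id":  [10006, 10006, 10006, 10006],
    "laterality":  ["L", "L", "R", "R"],
    "view":        ["CC", "MLO", "CC", "MLO"],
    "probability": [0.12, 0.20, 0.71, 0.65],   # per-image model outputs
})

# One score per (patient, breast): mean over that breast's images.
per_breast = (
    preds.groupby(["patient_id", "laterality"])["probability"]
         .mean()
         .reset_index()
)
print(per_breast)
```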