@javidsss It processes each view separately. For example, if a patient has CC and MLO views, Mammo-CLIP produces two separate embeddings by passing the CC and MLO images through its vision encoder independently. It then applies the contrastive loss (see the loss in the paper) to learn good representations. Some classification models (e.g., MIRAI) process both views (CC+MLO) at the same time and produce a single embedding; Mammo-CLIP does not do that. This is shown explicitly in the schematic in our paper.
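As a rough illustration of "separate embeddings per view" (a minimal sketch, not the actual Mammo-CLIP API; `vision_encoder` here is a placeholder stand-in for the model's image tower):

```python
import torch

# Placeholder for Mammo-CLIP's vision encoder (assumption, not the real model)
vision_encoder = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.LazyLinear(512),
)

cc_image = torch.randn(1, 3, 224, 224)   # CC view of one breast
mlo_image = torch.randn(1, 3, 224, 224)  # MLO view of the same breast

# Each view is encoded independently: one embedding per image,
# the two views are never fused into a single joint embedding.
cc_embedding = vision_encoder(cc_image)
mlo_embedding = vision_encoder(mlo_image)
```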
I see, thanks for the explanation. When evaluating your model's performance (say, for classification on the RSNA dataset), how do you use all the images of one subject? Do you get one prediction per image and then do some sort of ensembling/averaging?
Yes, we group predictions by laterality (left/right) and patient_id, so we produce one classification per breast per patient. We followed this from one of the Kaggle leaderboard solutions for the RSNA dataset (either the 6th- or 7th-place entry, I forget which).
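A hedged sketch of that per-breast aggregation in pandas; the column names (`patient_id`, `laterality`, `score`) are illustrative assumptions, not the repo's actual schema:

```python
import pandas as pd

# Per-image model outputs (hypothetical example data)
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 1],
    "laterality": ["L", "L", "R", "R"],
    "score":      [0.82, 0.76, 0.10, 0.14],
})

# Average the per-image scores so each (patient, breast) pair
# gets a single prediction.
breast_preds = (
    df.groupby(["patient_id", "laterality"])["score"]
      .mean()
      .reset_index()
)
print(breast_preds)
```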
Hello! Congrats on the great work! The paper mentions that at least one CC/MLO view is required. I was wondering whether the model takes multiple views at the same time, or whether it is trained on single-view images?