batmanlab / Mammo-CLIP

Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%): Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Creative Commons Attribution 4.0 International

Detection and classification #2

Open emrekeles-arch opened 3 weeks ago

emrekeles-arch commented 3 weeks ago

Hello, there are three questions I want to ask you.

1) What is the purpose of the Breastclip folder? Is it a part of the project? If so, at what stage is it used?

2) I saw a .py file in Breastclip that generates reports from the findings. Are these reports obtained from the VinDr dataset? Are the generated reports used in the detection and classification tasks, and if so, through which files?

3) Which method should I follow if I want to perform detection and BI-RADS classification with the image-text data structure?

Thank you for your work

shantanu-ai commented 3 weeks ago

Hi, Thanks for taking an interest in our repo.

  1. The breastclip folder contains all the files needed to train Mammo-CLIP: the CLIP model, the contrastive loss, the data loaders, and so on. We named the project breastclip when we first created it, and the name stuck. It is a core part of the project and is used from the very beginning: if you train Mammo-CLIP from scratch, train.py internally calls the code in breastclip, and if you use our checkpoints for any downstream task, both the classification and detection scripts call breastclip to set up the model.
  2. The file is ./src/codebase/augment_text.py. No, the reports are not obtained from the VinDr dataset; we obtain them from our in-house UPMC dataset. From VinDr, we use the finding labels to generate templated texts dynamically in the dataloader during pre-training of Mammo-CLIP. These texts play no role in classification, detection, or zero-shot evaluation; they are only used for pre-training. For classification and detection with our checkpoints, just follow the corresponding steps for classification and detection.
  3. For detection, go to this place, then follow either "Linear probe vision encoder Mammo-CLIP on target detection task" for linear probing or "Finetune vision encoder Mammo-CLIP on target detection task" for fine-tuning. Linear probing means the vision encoder of Mammo-CLIP is kept fixed; for fine-tuning, we fine-tune the vision encoder as well. Please place the checkpoints in the proper folder. We also include all the scripts and their utilities here; follow the detector files.

We did not perform BI-RADS classification; we classify density, mass, and calcification for VinDr and cancer for RSNA. However, you can do BI-RADS very easily: follow the classification steps here. If you use the VinDr dataset, make sure your csv file contains the BiRADS column, then pass BiRADS to the --label argument and run ./src/codebase/train_classifier.py in either linear-probe or finetune mode. If you need to change the dataset class, we use the MammoDataset class in this file.

emrekeles-arch commented 3 weeks ago


Thank you so much :)

shantanu-ai commented 3 weeks ago

If you have any questions, do let me know. Happy coding.

emrekeles-arch commented 3 weeks ago

What is the main dataset you use to train Breastclip?

shantanu-ai commented 3 weeks ago

As mentioned in the paper, we use two configurations:

  1. We use our in-house UPMC dataset, which contains image+report pairs. We preprocess it to create a csv, and finally run ./src/codebase/augment_text.py to create the final csv used to train Mammo-CLIP. The instructions are mentioned here.

  2. We also pretrain using UPMC (image+text) together with VinDr (image+label). The labels are converted to templated texts, as mentioned in the preprocessing steps.

By the way, for classification and detection you don't need to pretrain Mammo-CLIP. Just use our checkpoints to train the classifier or detector in linear-probe or finetuning mode.

emrekeles-arch commented 2 weeks ago

src/codebase/breastclip/prompts/prompts.py

What is the function of the Python file at the path above?

shantanu-ai commented 2 weeks ago

@emrekeles-arch, this file generates reports from the finding labels of the VinDr dataset. In Mammo-CLIP you can use any image-label dataset during the pre-training stage. For an image-label dataset, you first need to generate report sentences from the labels (Mass, Calcification, distortion, etc.) so they can be integrated into pre-training. Based on the label and laterality, you get the templated text here; prompts.py uses these templates and the labels to generate sentences for you.
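For intuition, here is a minimal sketch of this kind of templated-text generation; the template strings and the function below are illustrative, not the actual contents of prompts.py:

```python
# Illustrative sketch only, not the actual prompts.py: turning finding labels
# plus laterality into templated report sentences.
import random

# Hypothetical templates; the real prompt texts live in the repo's prompt JSON.
TEMPLATES = {
    "Mass": [
        "There is a mass in the {side} breast.",
        "A mass is seen in the {side} breast.",
    ],
    "Suspicious Calcification": [
        "Suspicious calcifications are noted in the {side} breast.",
    ],
    "No Finding": [
        "No significant abnormality in the {side} breast.",
    ],
}

def report_from_labels(labels, side, deterministic=False):
    """Build a templated report from finding labels and laterality."""
    sentences = []
    for label in labels:
        options = TEMPLATES.get(label, [])
        if not options:
            continue
        # First template for eval splits, a random one for training.
        template = options[0] if deterministic else random.choice(options)
        sentences.append(template.format(side=side))
    return " ".join(sentences)

print(report_from_labels(["Mass", "Suspicious Calcification"], side="left"))
```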

emrekeles-arch commented 2 weeks ago

Did you aim to create a multimodal data structure here?

shantanu-ai commented 2 weeks ago

@emrekeles-arch, I create the texts from labels in src/codebase/breastclip/prompts/prompts.py. That function is called from ./src/codebase/breastclip/data/datasets/imagetext.py (around line 200):

```python
elif hasattr(self.df, "CC_FINDING"):
    cc, mlo = view_list
    cc_findings = ast.literal_eval(self.df[f"{cc}_FINDING"][index])
    mlo_findings = ast.literal_eval(self.df[f"{mlo}_FINDING"][index])
    text = generate_report_from_labels(cc_findings, self.prompt_json, deterministic=(self.split != "train"))
    text2 = generate_report_from_labels(mlo_findings, self.prompt_json, deterministic=(self.split != "train"))
```

If you want to pretrain Mammo-CLIP with image-label data, this function is invoked when the dataset class is created.

shantanu-ai commented 2 weeks ago

Hi @emrekeles-arch, if your doubts have been clarified, please close this issue. If you have more queries, feel free to ask me.

emrekeles-arch commented 2 weeks ago

Hello again. I want to run a new training using your model's checkpoints. The dataset I have consists only of images, but I want to do multimodal training. Would it be useful, or unnecessary, to create sentences from the information in the csv file using your prompts.py, match each sentence with an image, and feed them into training? Since you are an expert, I wanted to get your opinion.

Thank you for your patience and nice replies.

shantanu-ai commented 2 weeks ago

Hi @emrekeles-arch, if you want to further pretrain with your dataset (after initializing Mammo-CLIP with our checkpoints), you can create sentences and add them during pretraining. However, there is a risk that Mammo-CLIP may forget the knowledge it gained from pre-training with image+text data (which we have from UPMC), because you only have image data and your texts would be templated rather than real reports. We have not run an experiment where we train with image+text and then further pretrain with image + templated text data.

Just to note how we pretrain using UPMC (image+text) and VinDr (image only), here are the steps for every epoch:

  1. We first mix both datasets. Suppose UPMC has 100 samples and VinDr has 50 samples; now we have 150 samples in total.
  2. For each minibatch, we randomly sample images. If an image comes from UPMC (image+text), we use its text as is. If it comes from VinDr (image only), we generate a text from prompts.py based on its VinDr finding labels. Suppose that in a minibatch of 16, 10 samples come from UPMC and 6 from VinDr; we generate texts for the 6 VinDr samples. Once we have texts for all 16, we compute embeddings and apply the contrastive loss as described in the paper. A minimal sketch of this mixed-batch text handling is shown below.
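Illustrative sketch only; the sample keys and the prompt_fn helper are hypothetical, not the repo's actual dataloader code:

```python
# Illustrative sketch only, not the repo's dataloader/collate code:
# building one text per sample in a mixed UPMC + VinDr minibatch.
def texts_for_minibatch(batch, prompt_fn):
    """Use the real report when a sample has one (image+text, UPMC-style);
    otherwise generate a templated sentence from its finding labels
    (image+label, VinDr-style)."""
    texts = []
    for sample in batch:
        if sample.get("report"):
            texts.append(sample["report"])
        else:
            texts.append(prompt_fn(sample["finding_labels"], sample["laterality"]))
    return texts

# Once every sample has a text, image and text embeddings are computed and the
# contrastive loss is applied, as described in the paper.
```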

Hope it clarifies your question.

emrekeles-arch commented 2 weeks ago

Do you have the opportunity to share the entire inhouse dataset you have?

kayhan-batmanghelich commented 2 weeks ago

The in-house dataset cannot be shared for legal reasons.


emrekeles-arch commented 2 weeks ago


I had guessed as much, but I still wanted to take my chances. Thank you for your reply.

shantanu-ai commented 2 weeks ago

@emrekeles-arch, I have attached a dummy version of the in-house dataset here. This is the image+text dataset used in pretraining.

The texts are dummy/templated texts and the patient IDs are random numbers, but the structure of the csv file you need to pre-train Mammo-CLIP is the same as in the dummy version. This is the format we use to pre-train Mammo-CLIP.

emrekeles-arch commented 1 week ago

@shantanu-ai, How did you get the resized coordinates?

shantanu-ai commented 1 week ago

@emrekeles-arch, I resized them during preprocessing. The resized coordinates can be found here.

emrekeles-arch commented 6 days ago

@shantanu-ai, I know I have asked a lot of you and you have helped me a great deal; I am grateful for all of it. I would like to ask one more thing: I have an external dataset and I want to resize its coordinates. Can you share which technique you used to resize your own dataset?

shantanu-ai commented 5 days ago

@emrekeles-arch, please use this file.

emrekeles-arch commented 4 days ago

@shantanu-ai, While training on my dataset, the mAP values are constantly 0.000 in every epoch. The coordinates in the csv file look like this: "668.3909,114.3665;668.3909,274.6058;884.4713,274.6058;884.4713,114.3665", i.e., the coordinates of four different points. From these I extracted the xmin, ymin, xmax, and ymax values and resized them according to the cropping ratio of the image. During labeling, the origin (0,0) was defined as the center of the image. Can you give me an idea of why the mAP values are always 0?
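For reference, a minimal sketch of that box extraction from the quoted string:

```python
# Minimal sketch: get an axis-aligned box (xmin, ymin, xmax, ymax) from the
# "x1,y1;x2,y2;x3,y3;x4,y4" coordinate string quoted above.
coords = "668.3909,114.3665;668.3909,274.6058;884.4713,274.6058;884.4713,114.3665"
points = [tuple(map(float, p.split(","))) for p in coords.split(";")]
xs, ys = zip(*points)
xmin, ymin, xmax, ymax = min(xs), min(ys), max(xs), max(ys)
print(xmin, ymin, xmax, ymax)  # 668.3909 114.3665 884.4713 274.6058
```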

Additionally, when I run the preprocessing code you provided on my own csv file, values are assigned to the resized_xmin, resized_ymin, ... columns even for rows with the 'No finding' class. Normally, these values should be 0,0,0,0. I couldn't understand why.

(Screenshot attached: 2024-06-26 171238)

shantanu-ai commented 4 days ago

@emrekeles-arch, a couple of points:

  1. Did you use the last file I uploaded to adjust the bbox coordinates? During preprocessing, I extract the breast region from the mammograms, so the bounding boxes have to be adjusted accordingly; simply rescaling and shifting them won't work here.

  2. I did not include the No finding labels of VinDr; in the paper I only use the samples with mass and calcification. So the code here only uses samples without the No finding label; you can see that I select the first 2000+ rows, which all have at least one finding. If you want to include them, just set the coordinates to 0; that is why they are empty for No finding rows in this file. I also trained with them included, and the results do not differ much.
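For intuition, a minimal sketch of that adjustment (a crop offset followed by a resize); every variable name here is hypothetical, and the actual logic lives in the preprocessing file linked above:

```python
# Illustrative sketch only; not the repo's preprocessing script.
def adjust_bbox(xmin, ymin, xmax, ymax, crop_x0, crop_y0, scale_x, scale_y):
    """Shift the box by the breast-crop offset, then scale it to the
    resized image."""
    return (
        (xmin - crop_x0) * scale_x,
        (ymin - crop_y0) * scale_y,
        (xmax - crop_x0) * scale_x,
        (ymax - crop_y0) * scale_y,
    )

# Example: the breast crop starts at (200, 50) in the original image and the
# cropped region is then resized by a factor of 0.5 in each dimension.
print(adjust_bbox(668.39, 114.37, 884.47, 274.61, 200, 50, 0.5, 0.5))
```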

emrekeles-arch commented 2 days ago

@shantanu-ai, How do I use a ResNet backbone with RetinaNet for the object detection task?

shantanu-ai commented 2 days ago

You will find plenty of examples; here is one that we used in many of our papers: https://github.com/yhenon/pytorch-retinanet

emrekeles-arch commented 1 day ago

@shantanu-ai, When I try to train with Faster R-CNN, I get the following error and couldn't figure it out. Do you have any idea how I can solve it?

targets should not be none when in training mode

shantanu-ai commented 1 day ago

You are probably passing the targets from the csv as None for No finding rows; you need to set them to 0, 0, 0, 0. Also, did you try the EfficientNet backbone and follow the instructions? It is advisable to use our model.
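For illustration, a minimal sketch of that csv fix (the column names follow this thread; the csv path is hypothetical):

```python
# Illustrative sketch only: make sure "No finding" rows get 0,0,0,0 boxes
# instead of empty/NaN values, so the detector never receives None targets.
import pandas as pd

df = pd.read_csv("vindr_detection.csv")  # hypothetical path to your csv
box_cols = ["resized_xmin", "resized_ymin", "resized_xmax", "resized_ymax"]
df[box_cols] = df[box_cols].fillna(0.0)
df.to_csv("vindr_detection.csv", index=False)
```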

emrekeles-arch commented 1 day ago

@shantanu-ai, Yes, I have tried EfficientNet, but I also want to train and compare with different models, like a Swin Transformer or a Vision Transformer.

I don't get the NaN-target warning when training with RetinaNet, but it becomes a problem when I switch to Faster R-CNN.

I will try your suggestion and check again. Thank you very much.

shantanu-ai commented 14 hours ago

You don't get the NaN for RetinaNet because I handled it in the code. I did not try Faster R-CNN, though, because RetinaNet is a go-to detector for medical images.

emrekeles-arch commented 13 hours ago

Are there any models other than ResNet and EfficientNet that I can use as the backbone for RetinaNet? Can you suggest one?

shantanu-ai commented 13 hours ago

I think any CNN model can be used, e.g., DenseNet121. For ViTs, you will need to search.