IMSY-DKFZ / self-distilled-swin


Is this code base reproducible? #2

Closed: shinpaul14 closed this 7 months ago

shinpaul14 commented 10 months ago

Hello, after reading the code and the MICCAI 2023 paper, I was wondering whether this code base is reproducible; it seems to be missing several parts of the code.

Amiiney commented 10 months ago

Hi @shinpaul14, thanks for your feedback. We have incorporated the remaining code into the repository and included training instructions in the README. If you have further questions, feel free to let us know.

Access to the annotations CSV (or the preprocessing code to generate it) will be released shortly. We will provide further updates on this matter within this issue. Best,

UPDATE: Instructions for generating the annotations CSV have been added to the README.

shinpaul14 commented 10 months ago

Hello @Amiiney

Thank you for your help. I have another question: with the currently provided code base, are the reproducible results SwinT and SwinT+SelfD?

[screenshot attached]
shinpaul14 commented 10 months ago

Hello @Amiiney

I have a question about the last linear layer of the model.

```python
self.model = timm.create_model(model_name, pretrained=pretrained)

# Get the number of features in the final embedding
n_features = self.model.head.in_features

# Replace the classification head with our custom target size
self.model.head = nn.Linear(n_features, CFG.target_size)
```

When I ran this code, I received an error: the model output shape was [64, 7, 7, 131].

Amiiney commented 10 months ago

Hi @shinpaul14,

1- You can reproduce all the experiments except the ensemble, because it requires the phase model. You can reproduce them via the target_size parameter (=100 uses only the triplet information, =131 uses the 100 triplets + 31 individual instrument, verb, and target classes; see the sketch after the commands below):

- SwinT: `python main.py target_size=100 epochs=20 distill=false exp=teacher`
- SwinT + MultiT: `python main.py target_size=131 epochs=20 distill=false exp=teacher_multi`
- SwinT + SelfD: `python main.py target_size=100 epochs=40 distill=true exp=student` (you need to generate the soft-labels first, see the README)
- SwinT + MultiT + SelfD: `python main.py target_size=131 epochs=40 distill=true exp=student_multi`
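
As a rough illustration of how a target_size=131 output could decompose: the 100/31 split comes from the reply above, but the ordering and variable names here are assumptions, not the repository's exact layout.

```python
import torch

# Hypothetical decomposition of a target_size=131 output; the ordering
# (100 triplet logits followed by 31 component logits) is an assumption.
logits = torch.randn(64, 131)       # stand-in for model(images)
triplet_logits = logits[:, :100]    # 100 triplet classes
component_logits = logits[:, 100:]  # 31 instrument, verb, and target classes
print(triplet_logits.shape, component_logits.shape)  # [64, 100] and [64, 31]
```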

2- We import the Swin transformer from the timm library, which has modified the model in newer versions. Make sure to downgrade timm to the version pinned in requirements.txt (timm==0.6.5); this should solve the problem.
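
As a quick way to check the version issue, here is a minimal sketch assuming timm==0.6.5 and a standard Swin checkpoint name (the repository passes its own model_name, so the name below is an assumption):

```python
import timm
import torch
import torch.nn as nn

print(timm.__version__)  # should print 0.6.5, per requirements.txt

# Assumed model name, for illustration only
model = timm.create_model("swin_base_patch4_window7_224", pretrained=False)
model.head = nn.Linear(model.head.in_features, 131)

x = torch.randn(2, 3, 224, 224)
print(model(x).shape)
# timm 0.6.5 pools inside forward_features, so this prints torch.Size([2, 131]);
# newer timm versions return unpooled NHWC feature maps, which is where a
# [batch, 7, 7, 131] shape comes from when the head is a plain Linear layer.
```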

shinpaul14 commented 10 months ago

Thank you for your help @Amiiney .

Then, how long did it take to train the teacher and student models?

Amiiney commented 10 months ago

Training the teacher model takes around 15 hours and the student model around 30 hours on an RTX 3090 GPU.

shinpaul14 commented 9 months ago

Thank you for your reply.

With the current updated code, I wasn't able to reproduce the teacher's performance. I ran `python main.py target_size=100 epochs=20 distill=false exp=SwinT`.

[screenshot of results attached]

The changes I made were Python 3.7 -> 3.9 and a different torch version.

I also have a question about the mAP and the CholecT45 mAP: why are their scores different?

Amiiney commented 9 months ago

I suspect there was some change in the CholecT45.csv dataset file that shifted the columns. Did you modify the CSV file? Can you pull the newest version and parse CholecT45.csv again?

mAP is the overall score per fold without aggregation; cmAP (challenge mAP) aggregates per video, as used in the CholecTriplet2022 challenge. Thanks for your feedback; we added this information to the printed output.
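
For intuition, a hedged sketch of the two aggregation schemes described above (array shapes and function names are assumed; the actual ivtmetrics/repository implementation may differ in details such as NaN handling for classes absent from a video):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def fold_map(y_true, y_prob):
    # mAP: average precision over all frames of the fold at once,
    # macro-averaged across classes
    return average_precision_score(y_true, y_prob, average="macro")

def challenge_map(y_true, y_prob, video_ids):
    # cmAP: compute AP per video first, then average across videos
    scores = [
        average_precision_score(y_true[video_ids == v],
                                y_prob[video_ids == v],
                                average="macro")
        for v in np.unique(video_ids)
    ]
    return float(np.mean(scores))
```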

shinpaul14 commented 9 months ago

When you upload the pre-trained weights, can you also upload the training logs?

With the currently uploaded code, my model overfits when I try to reproduce the results: the training loss decreases but the validation loss increases.

What could be causing this overfitting when I try to reproduce the results?

[screenshot of loss curves attached]

Amiiney commented 9 months ago

You are correctly reproducing the code! Indeed, the teacher model is overfitting; however, this is not an issue as the weights are saved at the best epoch (in your case: epoch 2) and the main purpose of the teacher model is to generate soft-labels.

After generating the soft labels and training the student model, you should observe a more stable validation loss and higher mAP scores. The behavior of the validation loss is related to the characteristics of the dataset, which includes 100 classes with significant class imbalance. In the CholecTriplet2022 challenge, we optimized for the mean average precision metric, not the validation loss. The key difference is that the loss is dominated by the majority classes, while the mAP metric weights all classes equally.
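
For intuition on why the student behaves more stably, here is a minimal self-distillation sketch; the mixing weight alpha and all names are assumptions for illustration, not the repository's exact loss.

```python
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
alpha = 0.5  # assumed mixing weight, not the repository's value

def distill_loss(student_logits, hard_targets, teacher_probs):
    # Mix hard labels with the teacher's soft-labels; the soft targets
    # smooth the supervision for rare classes, which helps stabilize
    # the student's validation loss under heavy class imbalance.
    mixed_targets = alpha * hard_targets + (1 - alpha) * teacher_probs
    return criterion(student_logits, mixed_targets)
```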