CSCYQJ / MICCAI23-ProtoContra-SFDA

This is the official code of the MICCAI 2023 paper "Source-Free Domain Adaptation for Medical Image Segmentation via Prototype-Anchored Feature Alignment and Contrastive Learning"

Source domain training and w/o ada performance #11

Open YYinn opened 8 months ago

YYinn commented 8 months ago

Hi! Thank you for sharing your work!

I ran into some problems when training the source-domain model. The w/o-adaptation results I get for ct2mr and mr2ct are 0.68 and 0.49, which differ slightly from the paper's results. Could there be any other possible cause? I followed the preprocessing instructions in the paper.

Thank you!

YYinn commented 8 months ago

I got 0.49 as the w/o-adaptation result for mr2ct, and the PFA result was 0.7153. When I moved on to the CL stage, it dropped to 0.1472. For ct2mr, I got 0.8262, and the result after the CL stage was 0.2548. Could there be any other possible cause? Thanks!!
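For anyone comparing these numbers: they are mean foreground Dice scores. A minimal, generic sketch of the metric (not the repo's evaluation code, which may differ in averaging details):

```python
import numpy as np

def dice_score(pred, gt, num_classes, eps=1e-7):
    """Mean foreground Dice over classes 1..num_classes-1.

    Generic sketch of the metric discussed in this thread; `pred` and
    `gt` are integer label maps of the same shape.
    """
    scores = []
    for c in range(1, num_classes):
        p, g = (pred == c), (gt == c)
        inter = np.logical_and(p, g).sum()
        denom = p.sum() + g.sum()
        scores.append((2 * inter + eps) / (denom + eps))
    return float(np.mean(scores))
```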

BarY7 commented 3 months ago

Hey @YYinn , did you figure it out?

In my case the no-adaptation results also seem to fluctuate. The drop during CL didn't happen in my case, but the model's total accuracy is about 82.6 at best.

YYinn commented 3 months ago

> Hey @YYinn , did you figure it out?
>
> In my case the no-adaptation results seem to also fluctuate. The drop in the CL didn't happen in my case though, but the model's total accuracy is about 82.6 at best.

Hey @BarY7 , I only achieved a result of around 0.83 in the CT-to-MR direction on the abdominal multi-organ dataset. I think it is related to the hyperparameters and to the performance of the source-domain model; carefully tuning the hyperparameters may improve performance.

BarY7 commented 3 months ago

> Hey @YYinn , did you figure it out? In my case the no-adaptation results seem to also fluctuate. The drop in the CL didn't happen in my case though, but the model's total accuracy is about 82.6 at best.
>
> hey, @BarY7 , I only achieved result around 0.83 in the direction of CT to MR on the abdominal multi-organ dataset. I think it is related to the hyper-parameters and the performance of the source domain model. If the hyperparameters are carefully adjusted, there may be an improvement in performance.

Thanks for the response. The LR in the uploaded config seems to differ from the paper (10^-3 vs. 10^-4), but changing it didn't help either. I don't think it's hyperparameter-related: it would be strange if the authors uploaded a config that doesn't roughly reproduce the paper's results. Maybe it has something to do with the preprocessing.

YYinn commented 3 months ago

> Hey @YYinn , did you figure it out? In my case the no-adaptation results seem to also fluctuate. The drop in the CL didn't happen in my case though, but the model's total accuracy is about 82.6 at best.
>
> hey, @BarY7 , I only achieved result around 0.83 in the direction of CT to MR on the abdominal multi-organ dataset. I think it is related to the hyper-parameters and the performance of the source domain model. If the hyperparameters are carefully adjusted, there may be an improvement in performance.
>
> Thanks for the response It seems like the LR in the uploaded config is different than the paper (10^-3 vs. 10^-4), but changing it didn't help either. I don't think its hyperparams related, as It is weird if the authors uploaded config that doesn't provide similar results to those in the paper. Maybe has something to do with the preprocessing

Indeed, if you are using parameters from the paper or code, you may need to keep the same preprocessing pipeline as the author. Similar problems have been mentioned in other issues; take a look at those to resolve the preprocessing problem.

qwerasdzxcvb commented 3 months ago

> I've got 0.49 as w/o ada for mr2ct and the result of PFA was 0.7153. As i moved on to the CL stage, it went down to 0.1472. As for ct2mr, the result i've got was 0.8262, and the result after CL stage was 0.2548. Could there be any other possible reason causing the results? Thanks!!

Did you manage to solve this problem? Did you adjust any hyperparameters? Currently my PFA-stage result in the MR-to-CT direction is only 0.76, while in the CT-to-MR direction it is 0.83. My data preprocessing also used the author's code.

YYinn commented 3 months ago

> I've got 0.49 as w/o ada for mr2ct and the result of PFA was 0.7153. As i moved on to the CL stage, it went down to 0.1472. As for ct2mr, the result i've got was 0.8262, and the result after CL stage was 0.2548. Could there be any other possible reason causing the results? Thanks!! Did you manage to solve this problem? Did you adjust any hyperparameters? Currently my PFA-stage result in the MR-to-CT direction is only 0.76, while in the CT-to-MR direction it is 0.83. My data preprocessing also used the author's code.

@qwerasdzxcvb @BarY7 Hello! Apart from the influence of preprocessing, I previously adjusted the nav_t parameter, which might have helped improve the results. But my final results were also similar, 0.8382 and 0.7544, respectively. One influencing factor I considered is the "crop out the non-body region" operation mentioned in the paper; there seems to be no code for this operation on GitHub. If larger areas are manually cropped (e.g. 20 pixels on each side), the results might improve.

BarY7 commented 3 months ago

@YYinn Did you run PFA on the best model from the source training phase? In my case, the PFA results are very unstable; sometimes they crash the Dice to 60%.

Regarding the preprocessing: there is indeed a notebook the authors uploaded with the preprocessing code, although it is a bit messy (some things had to be uncommented). It can be found in the commit history (it was later deleted).

"crop out the non-body region": For MRI, voxels are thresholded at 100 and everything below that is considered "non-body" and cropped out. For CT images this is done by the function delete_bed_torch, which is uploaded alongside the notebook.
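The MRI part of that step can be sketched as an intensity-threshold bounding-box crop. This is a rough reconstruction from the description above, not the authors' code; the threshold of 100 is the value quoted in this thread:

```python
import numpy as np

def crop_nonbody_mri(volume, threshold=100, margin=0):
    """Crop an array to the bounding box of voxels above `threshold`.

    Sketch of the 'crop out the non-body region' step for MRI as described
    in this thread: everything at or below the intensity threshold is
    treated as background. Works for 2D slices or 3D volumes.
    """
    mask = volume > threshold
    if not mask.any():  # nothing above threshold: return unchanged
        return volume
    coords = np.argwhere(mask)
    # Bounding box over all axes, optionally padded by `margin`
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
    return volume[tuple(slice(l, h) for l, h in zip(lo, hi))]
```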

BarY7 commented 3 months ago

In the CT -> MR direction I did manage to get one model to 87.5% after the CL stage, but when trying a slightly different base model (from a few epochs earlier in source training, with about the same results) I get 60% after PFA. I don't understand why the process is so unstable.

YYinn commented 3 months ago

@BarY7 Thank you for your explanation! I indeed hadn't seen this preprocessing code before; I just followed the paper and the author's instructions, and didn't find any related code or functions.

In the CT-to-MR experiment, I also got a result of 0.87 before, but at that time I cropped 10 pixels from the top, bottom, left, and right of the image (to perform the "crop out the non-body region" operation). Without this operation and without modifying the hyperparameters, the result was only 0.83. (Has the delete_bed_torch function been deleted?)
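The fixed-margin workaround described above is just a slice (a sketch of the manual crop, not the paper's actual non-body cropping step; it assumes the slice is larger than twice the margin):

```python
import numpy as np

def crop_margin(img, margin=10):
    """Crop `margin` pixels from each side of a 2D slice.

    Sketch of the manual workaround above (fixed 10-px crop instead of a
    proper non-body crop); assumes H, W > 2 * margin.
    """
    return img[margin:-margin, margin:-margin]
```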

I used the best model for PFA-stage training because I saw that this was also done in the config file of the code. I haven't tried using other checkpoints; I mainly adjusted nav_t. During the experiments, I also found that the model's results fluctuated greatly depending on other parameters.

I asked in another issue why my results differ from those in the paper, and the author said the PFA-stage results are greatly influenced by the source-domain model's performance. However, if you are using a source-domain model with results similar to the best one, I can't offer an answer; the model might be quite sensitive to hyperparameters. It might also be due to differences in the preprocessing stage, but without a definitive version of the preprocessing code, it's hard to verify that conjecture.

qwerasdzxcvb commented 3 months ago

@BarY7 @YYinn Thanks for the answers! I don't think my pipeline includes "crop out the non-body region"; I'll try again!

BarY7 commented 3 months ago

@YYinn I don't think cropping 10 px from each side is enough; the centering of the CT images is sometimes very odd, see the example.

[Screenshot 2024-05-23 155119: example of an off-center CT image]

delete_bed_torch is in another file, not in the notebook. Before I found that function, I did something similar using skimage's label function.
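That skimage-based approach can be sketched as keeping the largest connected foreground component (the body rather than the bed). This is not the repo's delete_bed_torch, and the HU threshold of -500 here is an illustrative assumption:

```python
import numpy as np
from skimage.measure import label

def largest_body_mask(ct_slice, hu_threshold=-500):
    """Keep only the largest connected foreground component of a CT slice.

    Sketch of the skimage-based alternative mentioned above: threshold the
    slice, label connected components, and keep the biggest one (assumed
    to be the body, not the bed). `hu_threshold` is an assumption, not a
    value from the repo.
    """
    fg = ct_slice > hu_threshold
    labels = label(fg)
    if labels.max() == 0:       # no foreground at all
        return fg
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                # ignore the background label
    return labels == sizes.argmax()
```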

Why did you choose nav_t for adjusting? What is the range you tried with?

Regarding the chosen source model: the source models have similar scores on the source domain, but their results on the target domain vary. I wonder if the authors chose a model that happened to perform well on the target, or simply the best source-domain one.

YYinn commented 3 months ago

@BarY7 Previously, since there were no specific cropping parameters, I mainly removed the blank background area (non-body region), so I set the crop value to 10.

nav_t is a major hyperparameter in ProtoLoss, and I noticed some scale differences between the t2p and p2t loss terms, so I tried adjusting nav_t, mainly in the range 0.1 to 1.5.

Regarding the choice of the source domain model in the PFA stage, I think your point makes sense.

BarY7 commented 3 months ago

@YYinn I see, I will look into updating nav_t

@YYinn @qwerasdzxcvb I do not have access to baidu, but it might be a good idea to download the uploaded model from here: https://github.com/CSCYQJ/MICCAI23-ProtoContra-SFDA/issues/15

And check its performance on the preprocessed data

YYinn commented 3 months ago

@BarY7 Thanks! I'll try it. My 'w/o adaptation' results are indeed lower than those in the paper.

YYinn commented 3 months ago

@BarY7

> @YYinn I see, I will look into updating nav_t
>
> @YYinn @qwerasdzxcvb I do not have access to baidu, but it might be a good idea to download the uploaded model from here: #15
>
> And check its performance on the preprocessed data

If you need the checkpoint, I can send it to your email.

BarY7 commented 3 months ago

@YYinn Yeah, that would be great!

Please send it to renaiseth at gmail.com, thanks!

YYinn commented 3 months ago

@BarY7 I have already sent the email, please check whether you received it!

BarY7 commented 3 months ago

@YYinn Got it, thank you 🥇