cwmok / DIRAC

This is the official Pytorch implementation of "Unsupervised Deformable Image Registration with Absent Correspondences in Pre-operative and Post-Recurrence Brain Tumor MRI Scans" (MICCAI 2022), written by Tony C. W. Mok and Albert C. S. Chung.
MIT License

The training problem #2

Closed haowang020110 closed 1 year ago

haowang020110 commented 1 year ago

Dear author, hello. I have some questions about the training stage. Unlike your previous work, cLapIRN, this project's code has no separate training stage for the Lv1 and Lv2 models. Also, did you use the mask function while training the Lv1 and Lv2 models?

cwmok commented 1 year ago

Hi @wanghao-cv,

For cLapIRN and DIRAC, we adopted the same end-to-end training scheme (i.e., training without lvl1 and lvl2) in our experiments.

haowang020110 commented 1 year ago

Sorry, I am still unclear about your response. I saw that cLapIRN's train.py includes the training of Lv1 and Lv2, and the Lv3 model's training builds on the pretrained Lv1 and Lv2 models. Why do you say cLapIRN is trained end-to-end?

cwmok commented 1 year ago

Hi @wanghao-cv,

The progressive training scheme in the original LapIRN and cLapIRN is used to stabilize the training. In many cases, we can directly train the model in an end-to-end manner (without the progressive training scheme). Of course, you can also train DIRAC with the progressive training scheme; you will observe a much faster training time, but there is no significant improvement from the progressive scheme in this task.

We didn't encounter any instability when training our model for this task. Therefore, for brevity, we bypassed the progressive training scheme in our experiments for this paper ("DIRAC").
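For readers unfamiliar with the two schemes being contrasted, here is a minimal, hypothetical sketch of the difference; `LevelNet` and the stage loop are toy placeholders, not the actual LapIRN/cLapIRN/DIRAC code.

```python
import torch
import torch.nn as nn

# Toy stand-in for one pyramid level; NOT the repository's model.
class LevelNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 2 input channels (moving + fixed), 3 output channels
        # (one per spatial displacement axis).
        self.conv = nn.Conv3d(2, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

levels = nn.ModuleList([LevelNet() for _ in range(3)])

# End-to-end scheme (used for DIRAC): one optimizer over every level's
# parameters from the very first iteration.
end_to_end_opt = torch.optim.Adam(levels.parameters(), lr=1e-4)

# Progressive scheme (original LapIRN/cLapIRN): train level 1 alone,
# then add level 2 on top of the pretrained level 1, then level 3,
# to stabilize optimization when end-to-end training is unstable.
for stage in range(1, 4):
    active = nn.ModuleList(levels[:stage])
    stage_opt = torch.optim.Adam(active.parameters(), lr=1e-4)
    # ... run this stage's training iterations with `stage_opt` here ...
```

The point of the exchange above is that the progressive staging is an optional stabilizer, not a requirement of the architecture.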

haowang020110 commented 1 year ago

Thanks for the response! I tried your code on the BraTS data yesterday, but the TRE results are not good: the TRE rose to 130 after 38,000 iterations (158 epochs). Could you share more details about the data grouping or the initial setup?

cwmok commented 1 year ago

Hi @wanghao-cv,

The train/valid/test splits we used are as follows:

Group 1: Cases 029-130 / Cases 131-140 / Cases 001-028
Group 2: Cases 001-028 + 057-130 / Cases 131-140 / Cases 029-056
Group 3: Cases 001-056 + 085-130 / Cases 131-140 / Cases 057-084
Group 4: Cases 001-084 + 113-130 / Cases 131-140 / Cases 085-112
Group 5: Cases 001-102 / Cases 103-112 / Cases 113-140

For each training split, we also include the pre-defined validation dataset (Cases 141-160) in the "train" split. Note that the labels of the pre-defined validation dataset are not publicly available, so we cannot use it in the "valid" split.
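The five splits above can be written out programmatically; the sketch below is an editorial illustration (the `cases` helper and ID formatting are assumptions, not the repository's data-loading code).

```python
# Sketch of the five cross-validation splits described above.
# Labelled cases are 001-140; the pre-defined validation cases 141-160
# are unlabelled and are appended to every training split.

def cases(lo, hi):
    """Inclusive range of zero-padded case IDs, e.g. '029'."""
    return [f"{i:03d}" for i in range(lo, hi + 1)]

splits = {
    1: {"train": cases(29, 130),                "valid": cases(131, 140), "test": cases(1, 28)},
    2: {"train": cases(1, 28) + cases(57, 130), "valid": cases(131, 140), "test": cases(29, 56)},
    3: {"train": cases(1, 56) + cases(85, 130), "valid": cases(131, 140), "test": cases(57, 84)},
    4: {"train": cases(1, 84) + cases(113, 130),"valid": cases(131, 140), "test": cases(85, 112)},
    5: {"train": cases(1, 102),                 "valid": cases(103, 112), "test": cases(113, 140)},
}

# Add the unlabelled pre-defined validation cases to every training split,
# as described above; their labels are unavailable for evaluation.
for group in splits.values():
    group["train"] += cases(141, 160)
```

Note that every group partitions the 140 labelled cases into 102 train / 10 valid / 28 test, so each training split ends up with 122 cases once 141-160 are added.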

cwmok commented 1 year ago

I tested the code with PyTorch 1.9.0+cu111 and Python 3.7. Also, I updated the bratsreg_model_stage.py. Could you try it again with the updated files?

haowang020110 commented 1 year ago

Okay, I will try it again with the updated files and the Group 1 data. Thanks!

haowang020110 commented 1 year ago

Hello, I ran BRATS_train_DIRAC.py again with the updated files. My training data is the Group 1 split and the other parameters keep the code's default values, but the TRE loss is still not good. The log follows:

0:12.499654325317982
2000:11.554369833200962
4000:11.7596705412202
6000:8.858232847211195
8000:12.326368763491153
10000:47.827801819789634
12000:15.043822223587409
14000:13.135572823672486
16000:16.81639033939812
18000:12.806663615119465
20000:88.44322264865482
22000:97.35114592621713
24000:95.22318954008665
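As an aside, the validation logs quoted in this thread are plain "iteration:value" lines; a small helper (not part of the repository) makes them easy to parse for plotting or comparison:

```python
# Parse "iteration:TRE" validation-log lines into a dict.
# Header lines ("Validation TRE log for ...:") are skipped automatically
# because their value part does not parse as a number.

def parse_tre_log(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        iteration, value = line.split(":", 1)
        try:
            entries[int(iteration)] = float(value)
        except ValueError:
            continue  # not an "iteration:value" line
    return entries
```

With the log above, `parse_tre_log` would return a dict mapping 0, 2000, 4000, ... to the corresponding TRE values, which makes a divergence like the jump past iteration 20000 easy to spot programmatically.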

cwmok commented 1 year ago

Hi @wanghao-cv,

Here are the results I trained earlier today using the code in the repository:

Validation TRE log for Brats_NCC_disp_fea6b5_AdaIn64_t1ce_fbcon_occ01_inv1_a0015_aug_mean_github:
0:7.8542726022293845
2000:6.73274522975314
4000:4.421295289877301
6000:3.397277698520257
8000:3.169211049384084
10000:2.7044105335642334
12000:2.7780652696356465
14000:2.4305414183263414
16000:2.25415834316635
18000:2.3330128356665516
20000:2.033382620780193
22000:1.9894852492095876
24000:2.1334460152709105

Which PyTorch version are you using? Could you try PyTorch 1.9.0+cu111 and Python 3.7?
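Since the thread later confirms that training converges under PyTorch 1.9/1.10 but diverges under 1.12, one could add a hypothetical early guard to the training script; this helper and its version whitelist are editorial assumptions based only on the reports in this thread.

```python
import re
import warnings

# Versions this thread reports as training stably: 1.9.0+cu111, 1.10.0+cu113.
STABLE_MINORS = {(1, 9), (1, 10)}

def check_torch_version(version):
    """Return True if `version` matches a PyTorch release reported stable here.

    `version` is a string like '1.10.0+cu113' (the format of
    torch.__version__). Unknown versions trigger a warning, not an error.
    """
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        return False
    major_minor = (int(m.group(1)), int(m.group(2)))
    if major_minor not in STABLE_MINORS:
        warnings.warn(
            f"PyTorch {version} is untested for this training script; "
            "1.9.x/1.10.x reportedly converge while 1.12.x diverged."
        )
        return False
    return True
```

At the top of a training script one might call `check_torch_version(torch.__version__)` before building the model.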

haowang020110 commented 1 year ago

I ran the code with PyTorch 1.12.1+cu113 and Python 3.7. Hmm, did your training today use BRATS_train_DIRAC.py? I found that at line 414 of BRATS_train_DIRAC.py the code is log.write("Validation Dice log for " + model_name[0:-1] + ":\n"), which differs from the title in your results, "Validation TRE log".

cwmok commented 1 year ago

Yes. That's a typo. I just fixed it. The other parts are the same as in the repository. I am running both BRATS_train_DIRAC.py and BRATS_train_DIRAC_D.py. I could send you an update once I finish the training.

I am running it using the command "CUDA_VISIBLE_DEVICES=0 python BRATS_train_DIRAC.py". Are you training on a multi-GPU system? If so, maybe you should try something like this.

I also tested it on a Windows 10 machine with PyTorch 1.10.0+cu113 (single-GPU system). It works well too.

Example losses: [screenshot of training loss curves attached]

haowang020110 commented 1 year ago

I ran BRATS_train_DIRAC.py on a single GPU under Linux. I have changed the PyTorch version to 1.10.0+cu113 and restarted the training now.

haowang020110 commented 1 year ago

Hello, the problem seems to be solved now that I have changed the PyTorch version from 1.12 to 1.10. The difference between 1.12 and 1.10 is incredible.

The 1.12 results:
Validation Dice log for Brats_NCC_disp_fea6b5_AdaIn64_t1ce_fbcon_occ01_inv1_a0015_aug_mean_fffixed_github:
0:12.030334159966776
2000:115.91007410458137
4000:33.72951164249721
6000:25.214690238684472
8000:12.730539371843165
10000:22.849013556216683

The 1.10 results:
Validation Dice log for Brats_NCC_disp_fea6b5_AdaIn64_t1ce_fbcon_occ01_inv1_a0015_aug_mean_fffixed_github:
0:13.05530537424255
2000:8.235118291610078
4000:6.045682190762699
6000:5.131279478047594
8000:3.9203483575676943
10000:4.109792298823343
12000:3.7403639585530626
14000:3.5788628137712992
16000:3.283032934854161
18000:3.013545133070159
20000:3.2962511379413564

Thanks a lot for your prompt reply! And could I have your WeChat number if you have one? My email address is u202013055@hust.edu.cn.

cwmok commented 1 year ago

Hi @wanghao-cv,

Glad to hear that you have solved your problem. Regarding the WeChat number, I have dropped you an email.