Milkomeda98 / CKD-TransBTS


Batch size and other training settings #7

Open ZhuXiyue opened 1 year ago

ZhuXiyue commented 1 year ago

Hi, thank you for publishing your code.

I am having trouble reproducing the results reported in the paper when training my own model with 5-fold cross-validation. With batch size 1, the validation loss converges relatively early (around epoch 100 rather than 500), and the Dice score is ~10% worse than reported in the paper.

I wonder what batch size was actually used. Are there any other tricks I need to apply to reproduce the training myself?

ZhuXiyue commented 1 year ago

A follow-up: batch size 4 works significantly better than batch size 1, at least for the first 400 epochs. If you have to use a small batch size because of CUDA memory limits, consider gradient accumulation.
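For reference, gradient accumulation sums the gradients of several small batches and applies one optimizer update afterwards, so the update behaves like a larger batch without the extra memory. A minimal, runnable PyTorch sketch with toy placeholders (not code from this repo; swap in your own model, loss, and data loader):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: replace `model`, `criterion`, `optimizer`, and `train_loader`
# with the real objects from your training script.
model = nn.Linear(16, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=1
)

accum_steps = 4  # effective batch size = per-iteration batch size * accum_steps

model.train()
optimizer.zero_grad()
for step, (x, y) in enumerate(train_loader):
    loss = criterion(model(x), y) / accum_steps  # scale so the summed gradient matches one batch-4 step
    loss.backward()                              # gradients accumulate in .grad across iterations
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one weight update every accum_steps mini-batches
        optimizer.zero_grad()  # clear the accumulated gradients before the next group
```

Note that this only matches a larger batch at the gradient level; if the model uses batch normalization, its statistics still see the small per-iteration batches.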

shanwq commented 8 months ago

> A follow-up: batch size 4 works significantly better than batch size 1, at least for the first 400 epochs. If you have to use a small batch size because of CUDA memory limits, consider gradient accumulation.

Hi, I am also having trouble reproducing the results (Dice ~10 points worse). May I ask: after using gradient accumulation, did you get results close to those in the paper? Also, what do you mean by gradient accumulation? Thanks a lot!

faizan1234567 commented 3 months ago

@ZhuXiyue I have the same issue. I want to reproduce the paper's results on the BraTS23 dataset, but I get worse results. I am using a batch size of 1, a learning rate of 1e-4, and an NVIDIA RTX 4070 with 12 GB of memory. I can't increase the batch size because it causes CUDA out-of-memory errors.

I am also not sure whether my dataset split is okay: I randomly split the original dataset into 834 examples for training, 208 for validation, and 209 for testing. Finally, what do you mean by gradient accumulation?
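In case it helps, a reproducible random split with exactly those counts can be made with `torch.utils.data.random_split`; a minimal sketch (the dataset and seed below are placeholders, not from this repo):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Illustrative only: `full_dataset` stands in for the 1251-case BraTS23 training set.
full_dataset = TensorDataset(torch.arange(1251))

# Fix the seed so the 834/208/209 split is reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(full_dataset, [834, 208, 209], generator=generator)
print(len(train_set), len(val_set), len(test_set))  # 834 208 209
```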