ellisdg / 3DUnetCNN

PyTorch 3D U-Net Convolutional Neural Network (CNN) designed for medical image segmentation
MIT License

Expected size of training data set and metrics for accuracy #6

Closed build2create closed 7 years ago

build2create commented 7 years ago

According to this paper, they used IoU as the accuracy metric (the paper says, and I quote: "The IoU is defined as true positives/(true positives + false negatives + false positives)"). I guess dice_coeff is used here instead, but the per-epoch score is coming out quite low, around 0.09 (the loss shows the same value, but negated). I kept the number of epochs at 2; the score does increase as I increase the number of epochs. Should I ideally keep it at 50, as you used in config.py, or at 70000 iterations, as the paper says? ("We ran 70000 training iterations on an NVIDIA TitanX GPU, which took approximately 3 days.") Should I then increase the initial learning rate, i.e. config["initial_learning_rate"]?

I am using 20 patients (each with 4 modalities and ground truth) for training. Is that too many or too few? I am asking because each epoch takes quite long to execute; I am using an i7 processor without an Nvidia GPU. The paper says: "In many biomedical applications, only very few images are required to train a network that generalizes reasonably well." But the doubt remains: what should the training data size be?

build2create commented 7 years ago

Also, can you elaborate on the ids generated in testing_ids.pkl and training_ids.pkl? I am guessing they are split on the basis of config["validation_split"] = 0.8 and that the split is random. But then what are the first and last lines, like (lp0 and a., and the prefixes that come with every id, like I or aI?

ellisdg commented 7 years ago

I'm still trying to figure out the ideal number of epochs and learning rate. I'm currently using all 274 sets of scans from the BRATS 2015 data. My training set size is 219. I am on epoch 14/50 and getting a dice coefficient for the validation set of 0.66. I've been running it for four days, but I am also using a CPU, so it is quite slow.

Usually, the more training data you use, the more generalizable your resulting classifier will be. Using more training data reduces the risk of overfitting the model to the training data.

If you run the training once and then want to run more training iterations, the code should read in the weights from the previously trained model, so you wouldn't be starting from scratch.

Correct, the training_ids.pkl and testing_ids.pkl are from the random validation split. I save them so that when the model is reloaded for more training, the training and validation data are consistent between training runs. They are saved as pickle files. You can load and save these files using these functions.
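For illustration, here is a minimal sketch of loading and saving those id lists with the standard pickle module (it may differ from the repo's actual helper functions). As an aside, the (lp0, I, and a characters you saw are pickle protocol-0 opcodes (list, integer, append); they are not part of the ids themselves.

```python
import pickle

def load_ids(filename):
    # read back a pickled list of subject indices, e.g. training_ids.pkl
    with open(filename, "rb") as f:
        return pickle.load(f)

def save_ids(ids, filename):
    # protocol 0 (the Python 2 default) writes the text opcodes such as
    # "(lp0" and "aI" that are visible when the file is opened as text
    with open(filename, "wb") as f:
        pickle.dump(ids, f, 0)

training_ids = load_ids("training_ids.pkl")
```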

As far as metrics go, Intersection over Union (IoU) is also known as the Jaccard coefficient. Page 2 of this paper has a table that shows the difference between the Jaccard and Dice coefficients.
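As a concrete illustration (not code from this repo), here is a minimal numpy sketch computing both coefficients on binary masks; the two are related by Dice = 2 * Jaccard / (1 + Jaccard):

```python
import numpy as np

def jaccard(y_true, y_pred):
    # IoU / Jaccard: |A intersect B| / |A union B|
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return intersection / float(union)

def dice(y_true, y_pred):
    # Dice: 2|A intersect B| / (|A| + |B|), equivalently 2J / (1 + J)
    intersection = np.logical_and(y_true, y_pred).sum()
    return 2.0 * intersection / float(y_true.sum() + y_pred.sum())
```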

Based on the results from previous BRATS participants, it looks like the top methods have an average cross-validated dice coefficient around 0.85 to 0.90 for whole tumor segmentation.

build2create commented 7 years ago

Yes, that's right, and I hope that by the time you reach the 50th epoch you get a dice score of 0.80+. Meanwhile I will try here. Thanks for the help.

build2create commented 7 years ago

Are we also doing data augmentation on the BRATS training data? The paper says: "Data augmentation is done on-the-fly, which results in as many different images as training iterations."

ellisdg commented 7 years ago

@build2create Nope, I don't have data augmentation set up. I think it would be interesting to see whether augmentation would result in better validation scores. Though I'm wondering if the BRATS data set is big enough that augmentation wouldn't help as much. My understanding is that augmentation helps more on smaller data sets, like the data set of 3 volumes that the authors of the paper were using.

build2create commented 7 years ago

What dice_coeff did you get after 50 epochs?

ellisdg commented 7 years ago

I'm on epoch 32 and my validation score is hovering around 0.71. This is with 3 modalities (T1, T1c, and FLAIR). I assume that adding T2, which is significantly different from the other 3 modalities, would give better results, but I haven't tested that out yet.

I also started training the network on an HPC system with massive amounts of memory. This allowed me to use bigger batch sizes (20 per batch). On this system I'm getting validation scores above 0.74.

build2create commented 7 years ago

I'm using the same code. I am on epoch 29 and getting a dice_coeff around 0.37. I am not using an HPC system but a core i7 machine with limited memory, and I am using data from just 20 patients. Is this why I am getting the low score? (It doesn't seem so from the formula for dice_coef.) I know, as you rightly pointed out in the comments above, that a small training set will lead to over-fitting, but the idea is that I am trying to augment the training data on the fly (I am currently writing the code for that), so I am starting with just 20 patients. But I guess that should not affect the dice_score, right?

ellisdg commented 7 years ago

As I understand it, the point of data augmentation is to make the data set bigger/more diverse so as to avoid overfitting. So yes, data augmentation has the potential to give a better dice score, but I don't see the point, since you are only using a fraction of the total dataset. I suspect that if you used the whole dataset you would get much better results than you would with just data augmentation on 20 patients. If you used data augmentation on top of the whole dataset, you might get even better results, but then training the classifier would take even longer. And yes, the number of patients you are using for training absolutely affects the dice_score that is used for classifier validation. This is because the classifier cannot accurately segment a tumor that is substantially different from the tumors in the training data set.

ellisdg commented 7 years ago

@build2create is the 0.37 the training dice_coef or the validation val_dice_coef, or both?

build2create commented 7 years ago

This is the status in training.log (the columns are epoch, dice_coef, loss, val_dice_coef, val_loss):


0,0.072843759000534192,-0.072843759000534192,0.068401305703446269,-0.068401305703446269
1,0.08896051999181509,-0.08896051999181509,0.084824420977383852,-0.084824420977383852
2,0.13554965611547232,-0.13554965611547232,0.13141066767275333,-0.13141066767275333
3,0.1912183741806075,-0.1912183741806075,0.15339560247957706,-0.15339560247957706
4,0.22206817503320053,-0.22206817503320053,0.1854435782879591,-0.1854435782879591
5,0.24033747357316315,-0.24033747357316315,0.19235427211970091,-0.19235427211970091
6,0.26939574000425637,-0.26939574000425637,0.19200655445456505,-0.19200655445456505
7,0.26723482995294034,-0.26723482995294034,0.17815075721591711,-0.17815075721591711
8,0.25726900203153491,-0.25726900203153491,0.19581024069339037,-0.19581024069339037
9,0.28743100876454264,-0.28743100876454264,0.20547199435532093,-0.20547199435532093
10,0.28345018276013434,-0.28345018276013434,0.20867956522852182,-0.20867956522852182
11,0.30249188677407801,-0.30249188677407801,0.19985756371170282,-0.19985756371170282
12,0.27847031434066594,-0.27847031434066594,0.212277976796031,-0.212277976796031
13,0.30272544175386429,-0.30272544175386429,0.19877163041383028,-0.19877163041383028
14,0.30582901276648045,-0.30582901276648045,0.20303131826221943,-0.20303131826221943
15,0.31700906250625849,-0.31700906250625849,0.20621623937040567,-0.20621623937040567
16,0.30473658861592412,-0.30473658861592412,0.21544630825519562,-0.21544630825519562
17,0.32712066732347012,-0.32712066732347012,0.21503214165568352,-0.21503214165568352
18,0.33279128838330507,-0.33279128838330507,0.21530395187437534,-0.21530395187437534
19,0.33517137146554887,-0.33517137146554887,0.21856578812003136,-0.21856578812003136
20,0.33235382533166558,-0.33235382533166558,0.21894586645066738,-0.21894586645066738
21,0.33862463827244937,-0.33862463827244937,0.22050339821726084,-0.22050339821726084
22,0.34313072427175939,-0.34313072427175939,0.22117975819855928,-0.22117975819855928
23,0.34421288408339024,-0.34421288408339024,0.22354894783347845,-0.22354894783347845
24,0.34633920760825276,-0.34633920760825276,0.22364498116075993,-0.22364498116075993
25,0.34828811348415911,-0.34828811348415911,0.22447984479367733,-0.22447984479367733
26,0.34690546896308661,-0.34690546896308661,0.22429444827139378,-0.22429444827139378
27,0.34773528017103672,-0.34773528017103672,0.22610866371542215,-0.22610866371542215
28,0.35065294709056616,-0.35065294709056616,0.22593812551349401,-0.22593812551349401
29,0.35192273929715157,-0.35192273929715157,0.22642500698566437,-0.22642500698566437
30,0.3525962233543396,-0.3525962233543396,0.22642711084336042,-0.22642711084336042
31,0.35317865409888327,-0.35317865409888327,0.22680095676332712,-0.22680095676332712
32,0.35318650747649372,-0.35318650747649372,0.22678552474826574,-0.22678552474826574

build2create commented 7 years ago

Hello @ellisdg, I am restarting again, this time using a larger dataset of 200 patients and a GPU-enabled system. I have configured CUDA and cuDNN for tensorflow. I am using an old keras version (keras 1.2.2); I tried the new version as well, but some of the functions were throwing warning messages. Anyway, that's not the issue. I need some clarifications before I begin: a) By default keras is using the tensorflow backend; are you also sticking to the same, or are you using the theano backend? b) Given that I will be training on a GPU now, will the memory constraint still hold? Should I reduce the image size to less than 144 x 144 x 144, or maybe we can just experiment with this size once?

I have not yet started, so I can also add the T2 modality. One final thing: can I apply ImageDataGenerator for augmenting the data?

ellisdg commented 7 years ago

a) I am using Theano, so I don't know if Tensorflow works. b) I don't know what size image the GPU will be able to handle. You'll just have to test it out.
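For anyone switching backends, a minimal sketch of forcing Theano and the GPU from within a script; KERAS_BACKEND and THEANO_FLAGS are the standard environment variables for Keras 1.x and old-style Theano, though the exact flag values here are illustrative:

```python
import os

# both variables must be set before keras/theano are first imported
os.environ["KERAS_BACKEND"] = "theano"
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32"

import keras  # prints "Using Theano backend." on import
```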

I have been curious about whether or not ImageDataGenerator would work for 3D. I don't know if it will, but you could try it out.
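If it turns out to be 2D-only, here is a minimal hand-rolled sketch of on-the-fly 3D augmentation using random flips; the channels-first (samples, channels, x, y, z) layout is an assumption, not necessarily what this repo uses:

```python
import numpy as np

def flip_generator(x, y, batch_size=1):
    # yields batches with random flips along the three spatial axes;
    # image and label volumes are flipped together so they stay aligned
    n_samples = x.shape[0]
    while True:
        idx = np.random.choice(n_samples, size=batch_size, replace=False)
        x_batch, y_batch = x[idx].copy(), y[idx].copy()
        for axis in (2, 3, 4):  # spatial axes of a 5D channels-first array
            if np.random.rand() < 0.5:
                x_batch = np.flip(x_batch, axis=axis)
                y_batch = np.flip(y_batch, axis=axis)
        yield x_batch, y_batch

# usage with Keras 1.x:
# model.fit_generator(flip_generator(x_train, y_train, batch_size=2),
#                     samples_per_epoch=len(x_train), nb_epoch=50)
```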

It might also help to change:

config["decay_learning_rate_every_x_epochs"] = 5

to something higher than 5 (maybe 10). I'm wondering if that is what went wrong with your 20-subject training. Training on 20 subjects for 10 epochs is roughly equivalent to training on 200 subjects for 1 epoch. So after 5 epochs with 20 subjects, the learning rate would decay after only 100 iterations, which is probably much earlier than you would want it to decay. For comparison, after 5 epochs with 200 subjects the training would have iterated 1000 times. So for 20 subjects, the number of epochs should be greater and the decay schedule should be somewhere around every 50 epochs in order to be equivalent to training with the larger dataset.

Here is a good tutorial on learning rate decay.
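To make the schedule concrete, here is a minimal sketch of epoch-based step decay using Keras's LearningRateScheduler callback; the config keys mirror the one above, but the initial rate and drop factor are assumed values:

```python
import math
from keras.callbacks import LearningRateScheduler

config = {"initial_learning_rate": 1e-5,            # assumed starting value
          "learning_rate_drop": 0.5,                # assumed decay factor
          "decay_learning_rate_every_x_epochs": 10}

def step_decay(epoch):
    # the number of completed decay periods sets the current learning rate
    drops = math.floor(epoch / config["decay_learning_rate_every_x_epochs"])
    return config["initial_learning_rate"] * (config["learning_rate_drop"] ** drops)

lr_scheduler = LearningRateScheduler(step_decay)
# model.fit(..., callbacks=[lr_scheduler])
```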

build2create commented 7 years ago

Okay, I'll configure CUDA and cuDNN for theano then, and keep config["decay_learning_rate_every_x_epochs"] = 10. Thanks for the tutorial link. I'll let you know if ImageDataGenerator or any other related solution works.