Cross validation/Generating Data

Dear Sir,

I have two questions. The first is about cross-validation, since you didn't give much information about it. Say we have 10 data points, ordered d1 d2 d3 d4 d5 d6 d7 d8 d9 d10, and we do a 5-fold cross-validation by splitting them like this: d1 d2 | d3 d4 | d5 d6 | d7 d8 | d9 d10

For each fold you train a model on the other folds, evaluate it on the held-out fold, and then compute the average error over all folds.
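The first scheme (contiguous folds, one model per held-out fold, errors averaged) can be sketched as follows. This is a minimal illustration in plain Python, not code from the BraTS2017 repository; `fold_error` is a hypothetical callback standing in for "train a model on `train` and return its error on `test`".

```python
from typing import Callable, List, Sequence

FoldError = Callable[[List[str], List[str]], float]


def contiguous_folds(items: Sequence[str], k: int) -> List[List[str]]:
    """Split items into k consecutive folds, keeping their original order."""
    n = len(items)
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(items[start:start + size]))
        start += size
    return folds


def cross_validate(items: Sequence[str], k: int, fold_error: FoldError) -> float:
    """Hold out each fold once, train on the rest, and average the k errors."""
    folds = contiguous_folds(items, k)
    errors = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(fold_error(train, test))
    return sum(errors) / len(errors)


data = [f"d{i}" for i in range(1, 11)]
print(contiguous_folds(data, 5))
# [['d1', 'd2'], ['d3', 'd4'], ['d5', 'd6'], ['d7', 'd8'], ['d9', 'd10']]

# Dummy stand-in for "train a model, return its error on the held-out fold".
print(cross_validate(data, 5, lambda train, test: float(len(test))))  # 2.0
```

The dummy error just returns the fold size, so every fold scores 2.0; in practice `fold_error` would wrap actual model training and evaluation.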

You could also use a different ordering, for example d1 d9 | d2 d4 | d7 d10 | d8 d5 | d6 d3

You then do the same thing again, i.e. compute the average error over the folds, and so on. In the end you have many permutations, each leading to its own average error (and its own set of models).
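The second scheme is usually called repeated k-fold cross-validation: the data are reshuffled before each repetition, and the spread of the per-repetition averages shows how much the estimate depends on the particular split. A minimal sketch, again with a hypothetical `fold_error` callback and toy numeric data in place of real cases:

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence

FoldError = Callable[[List[float], List[float]], float]


def shuffled_folds(items: Sequence[float], k: int, rng: random.Random) -> List[List[float]]:
    """Shuffle a copy of the data and deal it into k folds (sizes differ by at most one)."""
    perm = list(items)
    rng.shuffle(perm)
    return [perm[i::k] for i in range(k)]


def repeated_cv(items: Sequence[float], k: int, repeats: int,
                fold_error: FoldError, seed: int = 0) -> List[float]:
    """Return one average cross-validation error per repetition (i.e. per permutation)."""
    rng = random.Random(seed)
    averages = []
    for _ in range(repeats):
        folds = shuffled_folds(items, k, rng)
        errors = []
        for i, test in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            errors.append(fold_error(train, test))
        averages.append(mean(errors))
    return averages


# Toy data and a dummy error: distance between the train and test means.
data = [float(i) for i in range(1, 11)]
dummy_error: FoldError = lambda train, test: abs(mean(train) - mean(test))

averages = repeated_cv(data, k=5, repeats=10, fold_error=dummy_error)
print(f"mean={mean(averages):.3f}  stdev={stdev(averages):.3f}")
```

Each repetition costs k model trainings, so 10 repetitions of 5-fold cross-validation mean 50 trainings instead of 5.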

Is it OK to do it only once, as in the first case I presented, and why? (You could briefly explain it and perhaps give us some references.) Or should it be done with permutations, as in the second case? Does that make sense, and why? How exactly does the cross-validation work in your code? The answer would help us.

The second question: let's say we don't have enough data to test the model. Does it make sense to generate our own brains (raw and labeled data) for testing? Have you already tried this, or can you give us some hints about it?

Thank you, Sir

Hi, please keep your questions to things regarding the code. This is off topic.

I do the cross-validation only once, with a random split of the data (you can look it up in the code). I don't do any further permutations. The cross-validation is intended to give a feel for how well the model generalizes. Given unlimited GPU resources you could do more permutations (or simply increase the number of folds all the way up to leave-one-out cross-validation). I don't think this is necessary, though.
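A single random split and its leave-one-out extreme can be contrasted in a short sketch. The case IDs and the seed are made up, and this does not reproduce the split logic in the BraTS2017 code; it only illustrates that leave-one-out is k-fold with k equal to the number of cases, so it needs one training run per case.

```python
import random
from typing import List, Sequence


def random_kfold_split(items: Sequence[str], k: int, seed: int = 12345) -> List[List[str]]:
    """One random split into k folds; the fixed seed makes it reproducible."""
    perm = list(items)
    random.Random(seed).shuffle(perm)
    return [perm[i::k] for i in range(k)]


cases = [f"case_{i:03d}" for i in range(20)]  # hypothetical patient IDs

five_fold = random_kfold_split(cases, 5)
leave_one_out = random_kfold_split(cases, len(cases))  # k = n

# One model is trained per fold, so the fold count is also the number of training runs.
print(len(five_fold), [len(f) for f in five_fold])          # 5 [4, 4, 4, 4, 4]
print(len(leave_one_out), {len(f) for f in leave_one_out})  # 20 {1}
```

With 20 cases, one 5-fold pass trains 5 models, while leave-one-out trains 20.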

If you create your own test set, you may run into problems if it is not segmented according to the same conventions as the training data. That would amount to a shift in the data distribution, and your model's performance will drop.

Best, Fabian

MIC-DKFZ / BraTS2017, issue #16