frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0
406 stars 115 forks

2D slicer interface datasplit #34

Open MLRadfys opened 4 years ago

MLRadfys commented 4 years ago

Hi Dominik,

hope you're well!

Just one thing I've noticed with the 2D slicer interface.

When you take the sample list and feed it to split_validation or cross_validation, the 2D slices are randomly assigned to the training and validation sets. As a result, slices from the same patient will most likely end up in both the training and the validation set (or fold k).

I think (though I might be wrong) this leads to information leakage during training. One suggestion would be to split the subjects into training and validation, as is done in the 3D case. That way it is guaranteed that all 2D slices of a specific patient volume end up in either the training set or the validation set, never both.
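To make the concern concrete, here is a minimal sketch of how the overlap could be checked. It does not use MIScnn itself: the sample names, the `case_XXXX_<slice>` naming pattern, and the helpers `volume_of` and `leaked_volumes` are all assumptions made up for illustration.

```python
import random


def volume_of(slice_name):
    # Assumption: slice names look like "<volume name>_<slice number>",
    # so everything before the last underscore identifies the 3D volume.
    return slice_name.rsplit("_", 1)[0]


def leaked_volumes(train, val):
    # Volumes that contribute slices to BOTH sets.
    return {volume_of(s) for s in train} & {volume_of(s) for s in val}


# Hypothetical sample list: 5 volumes with 20 slices each.
sample_list = [f"case_{v:04d}_{s}" for v in range(5) for s in range(20)]

# Slice-level random 80/20 split, mimicking what a generic
# percentage split does when every slice is its own sample.
shuffled = sample_list[:]
random.Random(0).shuffle(shuffled)
cut = int(len(shuffled) * 0.8)
train, val = shuffled[:cut], shuffled[cut:]

# Any volume listed here has slices on both sides of the split.
print(sorted(leaked_volumes(train, val)))
```

With only a handful of volumes and many slices per volume, a slice-level shuffle will almost always place every volume on both sides of the split.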

Cheers,

Michael

muellerdo commented 4 years ago

Hey @MichaelLempart,

very interesting point! Thanks for this feedback.

You are right. It will introduce a bias when using the 2D slices from the same patient for training as well as validation.

Currently: With the 2D slicer interface, each slice becomes a single sample, just as when reading e.g. 2D PNG images. The internal workflow would be:

1) Extract all slices from the 3D images and store them separately, e.g. as 2D images in a folder
2) Load the 2D images
3) Get a sample for each 2D image

This leads to the problem: all evaluation functions (for example a percentage split validation) randomly pick from the sample list, i.e. from our complete list of 2D slices, in which the information about which 3D volume each slice belongs to is lost.

The easiest way would be to provide special validation functions for the slicer. The sample names are something like: 3DsampleName + _slicenumber. Therefore, such a function could take this information into account when randomly sampling the training & validation sets.
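A rough sketch of what such slicer-aware split functions might look like is shown below. It is not part of MIScnn: `group_by_volume`, `grouped_percentage_split` and `grouped_k_folds` are hypothetical helpers, and the code assumes that every slice name really does end in `_<slicenumber>`, so that the volume name is everything before the last underscore.

```python
import random
from collections import defaultdict


def group_by_volume(sample_list):
    # Assumption: names follow "<3DsampleName>_<slicenumber>".
    groups = defaultdict(list)
    for name in sample_list:
        groups[name.rsplit("_", 1)[0]].append(name)
    return dict(groups)


def grouped_percentage_split(sample_list, percentage=0.2, seed=None):
    """Hold out roughly `percentage` of the volumes (not slices)."""
    groups = group_by_volume(sample_list)
    volumes = sorted(groups)
    random.Random(seed).shuffle(volumes)
    # Always hold out at least one volume.
    n_val = max(1, round(len(volumes) * percentage))
    val = [s for v in volumes[:n_val] for s in groups[v]]
    train = [s for v in volumes[n_val:] for s in groups[v]]
    return train, val


def grouped_k_folds(sample_list, k=3, seed=None):
    """Yield (train, val) slice lists; each volume is validated once."""
    groups = group_by_volume(sample_list)
    volumes = sorted(groups)
    random.Random(seed).shuffle(volumes)
    folds = [volumes[i::k] for i in range(k)]
    for i in range(k):
        val = [s for v in folds[i] for s in groups[v]]
        train = [s for j in range(k) if j != i
                 for v in folds[j] for s in groups[v]]
        yield train, val


# Hypothetical slice names: 5 volumes with 4 slices each.
samples = [f"case_{v:04d}_{s}" for v in range(5) for s in range(4)]
train, val = grouped_percentage_split(samples, percentage=0.2, seed=1)
```

Because the shuffle happens on volume names rather than on slices, all slices of a volume travel together. The resulting lists could then be handed to any training or evaluation step that accepts explicit training and validation sample lists. Note that the validation fraction is measured in volumes, so it only approximates the slice fraction when volumes differ in slice count.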

A much more difficult approach would be to somehow store multiple 2D slices in a single MIScnn sample without declaring it as a 3D image.

I will think about this problem and how to improve the slicer usage without many pipeline modifications.

Cheers, Dominik