WGierke closed this 7 years ago
"can easily blow up the memory when training a neural network"

Oh neat. How much was it using, out of interest?
It would take 1 GB to connect all the input neurons to just 1 output neuron. Using max pooling and convolutional layers would of course reduce the number of trained weights. However, even the 2nd-place solution of the NDSB 2017 used an input shape of at most 64x64x64 mm³ (which needed a Tesla K80 with 12 GB to train; they didn't rescale the whole scan to 64³ mm³ but rather slid this cube over the scan).
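For reference, the 1 GB figure follows from one dense weight per input voxel. A back-of-the-envelope sketch (assuming float32 weights, which is my assumption):

```python
# Rough check of the 1 GB claim: a dense layer connecting every voxel of a
# 512x512x1024 scan to a single output neuron needs one weight per voxel.
input_voxels = 512 * 512 * 1024      # 268,435,456 voxels in one scan
bytes_per_weight = 4                 # float32
memory_bytes = input_voxels * bytes_per_weight
print(memory_bytes / 2**30)          # -> 1.0 GiB for just one output neuron
```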
Nice work! Looking over it now.
Good to go! Mind squashing now?
The problem was that we didn't have a standardized way to handle training data. Now the scans and binary nodule images have the shape 512x512x1024, so all DICOM images that have been rescaled to voxels fit into that. However, since this is a giant input matrix that can easily blow up the memory when training a neural network, I added a SegmentationModel wrapper that takes this shape as input and also predicts this shape. The models that implement this interface can rescale / crop / ... the input data as long as the predicted output is again of size 512x512x1024. For example, consider the Simple3DModel: it scales the input by 1/4 along each axis, learns a model based on that, and after predicting it scales the predicted binary nodule mask by a factor of 4 again so that the output shape matches the shape of the given training data. Note that I mainly moved the code from models.py to the classes in the models/ directory to accomplish this.
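As a rough illustration of that interface (a minimal sketch only: the helper methods _fit_small / _predict_small and the use of scipy.ndimage.zoom are my assumptions, not the actual code in models/):

```python
import numpy as np
from scipy.ndimage import zoom


class SegmentationModel:
    """Interface: consumes scans of shape 512x512x1024 and predicts masks of the same shape."""

    INPUT_SHAPE = (512, 512, 1024)

    def fit(self, scans, nodule_masks):
        raise NotImplementedError

    def predict(self, scans):
        raise NotImplementedError


class Simple3DModel(SegmentationModel):
    """Trains on volumes downscaled by 1/4 per axis and upscales predictions by 4."""

    SCALE = 0.25  # 512x512x1024 -> 128x128x256

    def _fit_small(self, small_scans, small_masks):
        # Placeholder: the actual 3D network would be trained here on the small volumes.
        pass

    def _predict_small(self, small_scan):
        # Placeholder: a real model would return a predicted mask of shape 128x128x256.
        return np.zeros(small_scan.shape, dtype=np.float32)

    def fit(self, scans, nodule_masks):
        small_scans = np.array([zoom(s, self.SCALE, order=1) for s in scans])
        small_masks = np.array([zoom(m, self.SCALE, order=0) for m in nodule_masks])
        self._fit_small(small_scans, small_masks)

    def predict(self, scans):
        masks = []
        for scan in scans:
            small_pred = self._predict_small(zoom(scan, self.SCALE, order=1))
            # Scale the prediction back up so the output shape matches the 512x512x1024 input.
            masks.append(zoom(small_pred, 1 / self.SCALE, order=0) > 0.5)
        return np.array(masks)
```

Nearest-neighbour interpolation (order=0) keeps the mask binary after resampling, and the network itself only ever sees the downscaled 128x128x256 volumes.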
Reference to official issue
This references #187 and #151 since I wasn't able to cleanly separate them.
How Has This Been Tested?
As a test, I added the piece of code from #187 that threw an error. All tests pass.
CLA