frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0

Model.Train and Cross Validation #147

Open dannyhow12 opened 2 years ago

dannyhow12 commented 2 years ago

Hi and good day,

Thank you for the wonderful repo; it has been super user-friendly.

I have a question about model.train versus cross-validation. Cross-validation is used in the KiTS19.ipynb example, but because of the usage limits of the free Google Colab tier it could not finish. So I am trying the alternative, model.train, which I believe is shown in the BRATS2020.ipynb example and which I also looked up in model.py.
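The reason the cross-validation route is so much heavier comes from how k-fold splitting works: every fold trains a separate model, so k folds cost roughly k times the training of a single model.train call with the same epochs and iterations. The function below is a minimal, hypothetical sketch of contiguous, unshuffled k-fold splitting in plain Python. It is not MIScnn's cross-validation code, which may shuffle or partition the samples differently.

```python
def k_fold_splits(samples, k):
    """Return k (train, validation) pairs using contiguous, unshuffled folds."""
    if not 2 <= k <= len(samples):
        raise ValueError("k must be between 2 and len(samples)")
    base, extra = divmod(len(samples), k)
    splits, start = [], 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample
        size = base + (1 if fold < extra else 0)
        validation = samples[start:start + size]
        train = samples[:start] + samples[start + size:]
        splits.append((train, validation))
        start += size
    return splits


samples = [f"case_{i:05d}" for i in range(10)]
splits = k_fold_splits(samples, k=3)
print([len(validation) for _, validation in splits])  # [4, 3, 3]
# Each fold is a full training run, so k=3 means roughly 3x the training time
```

Each sample lands in exactly one validation fold, which is why the total cost scales with k rather than staying at one training run.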

However, when I run the code on Google Colab, training never seems to start; it just appears to load forever. Could you please tell me whether I am calling model.train correctly in the code below? Many thanks.

    import tensorflow as tf
    import os
    from tensorflow.python.keras.saving.saving_utils import model_metadata
    from miscnn.data_loading.interfaces.nifti_io import NIFTI_interface
    from miscnn.data_loading.data_io import Data_IO
    from miscnn.processing.data_augmentation import Data_Augmentation
    from miscnn.processing.subfunctions.normalization import Normalization
    from miscnn.processing.subfunctions.clipping import Clipping
    from miscnn.processing.subfunctions.resampling import Resampling
    from miscnn.processing.preprocessor import Preprocessor
    from miscnn.neural_network.model import Neural_Network
    from miscnn.neural_network.architecture.unet.standard import Architecture
    from miscnn.neural_network.metrics import dice_soft, dice_crossentropy, tversky_loss
    from tensorflow.keras.callbacks import ReduceLROnPlateau
    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.callbacks import ModelCheckpoint

    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    # Initialize the NIfTI I/O interface and configure the images as one channel (grayscale)
    # and three segmentation classes (background, kidney, tumor)
    interface = NIFTI_interface(pattern="case_00[0-9]*", channels=1, classes=3)

    # Specify the kits19 data directory
    data_path = "/content/drive/MyDrive/data1/"

    # Create the Data I/O object
    data_io = Data_IO(interface, data_path)

    sample_list = data_io.get_indiceslist()
    sample_list.sort()

    # Create and configure the Data Augmentation class
    data_aug = Data_Augmentation(cycles=2, scaling=True, rotations=True, elastic_deform=True,
                                 mirror=True, brightness=True, contrast=True, gamma=True,
                                 gaussian_noise=True)

    # Create a pixel value normalization Subfunction through Z-Score
    sf_normalize = Normalization(mode='z-score')

    # Create a clipping Subfunction between -79 and 304
    sf_clipping = Clipping(min=-79, max=304)

    # Create a resampling Subfunction to voxel spacing 3.22 x 1.62 x 1.62
    sf_resample = Resampling((3.22, 1.62, 1.62))

    # Assemble Subfunction classes into a list
    # Be aware that the Subfunctions will be executed according to the list order!
    # 29042022 version: removed sf_clipping
    subfunctions = [sf_resample, sf_normalize]

    # data_aug=data_aug Add inside Preprocessor 29042022 11.44pm removed
    # Create and configure the Preprocessor class
    pp = Preprocessor(data_io, batch_size=4, subfunctions=subfunctions,
                      prepare_subfunctions=True, prepare_batches=False,
                      analysis="patchwise-crop", patch_shape=(80, 160, 160),
                      use_multiprocessing=True)

    # Adjust the patch overlap for predictions
    pp.patchwise_overlap = (40, 80, 80)

    # Create the Neural Network model
    unet_standard = Architecture(depth=4, activation="softmax", batch_normalization=True)
    model = Neural_Network(preprocessor=pp, architecture=unet_standard, loss=tversky_loss,
                           metrics=[dice_soft, dice_crossentropy], learning_rate=0.0001)

    # Define Callbacks
    cb_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=20, verbose=1, mode='min',
                              min_delta=0.0001, cooldown=1, min_lr=0.00001)
    cb_es = EarlyStopping(monitor='loss', min_delta=0, patience=150, verbose=1, mode='min')

    cb_cp = ModelCheckpoint("models/kits_unet.{epoch:02d}.hdf5", monitor='val_loss',
                            verbose=1, save_freq=90*20)

    model.train(sample_list, epochs=10, iterations=5, callbacks=[cb_lr, cb_es])

The terminal only shows:

    /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/adam.py:105: UserWarning: The lr argument is deprecated, use learning_rate instead.
      super(Adam, self).__init__(name, **kwargs)

and nothing more.

Regards, Danny

Quite interestingly, the terminal was able to show something and its epoch, after the use_multiprocessing was set to False.

muellerdo commented 2 years ago

Hello @dannyhow12,

thank you for your kind words!

Sorry for the late reply. Did you already find a solution to this issue?

Your code looks fine and shouldn't be the problem.

"Quite interestingly, the terminal was able to show something and its epoch, after the use_multiprocessing was set to False."

Turning off multiprocessing (use_multiprocessing=False) would also have been one of my first recommendations. TensorFlow is extremely CPU-hungry by default, but I do not have much experience with multiprocessing in the Google Colab environment.

Be aware: with prepare_subfunctions=True, training only starts after the complete dataset has been preprocessed, which takes a while for a large 3D dataset like KiTS19 (I would guess no longer than 30 minutes; on our workstation it took about 10 minutes).

Also be aware that, on Google Colab, MIScnn currently only works with the dev branch because of the Python 3.8 requirement. Check out this issue: https://github.com/frankkramer-lab/MIScnn/issues/146

Cheers, Dominik

dannyhow12 commented 2 years ago

Hello @muellerdo,

Yes, setting use_multiprocessing to False solved it, and training now runs on Google Colab.

Interesting. On Google Colab, however, it took around 1 h 30 min to preprocess all 300 images of the KiTS21 dataset. Is there a way in MIScnn to reuse the same preprocessed batches for subsequent training runs? Because of Google Colab's usage limits, I currently save progress with a ModelCheckpoint callback.

The drawback is that every time training is restarted, the dataset has to be preprocessed again, which takes quite a long time. Is there a way to load the generated pickle files directly and use them for training? In data_io.py, the delete_batchDir parameter is documented as: True = delete the temporary batches directory; False = delete only the batch data for the current seed. I would appreciate some clarification on this. Thank you!
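MIScnn's internal batch handling is not reproduced here. As a generic sketch of the idea being asked about, assume preprocessed samples are pickled as one file per sample under a seed-prefixed name; the function names, the naming scheme and the directory layout are all hypothetical. Preprocessing is skipped whenever a cached pickle already exists, and `cleanup` mimics the two documented delete_batchDir behaviours. In this sketch, reuse across sessions only works because the seed is fixed and the directory persists (on Colab that would mean something like Google Drive); a freshly drawn seed per run would miss the cache.

```python
import os
import pickle
import shutil
import tempfile


def cache_path(batch_dir, seed, index):
    # Hypothetical naming scheme: one pickle per sample, prefixed by the run seed
    return os.path.join(batch_dir, f"{seed}.{index}.pickle")


def load_or_preprocess(batch_dir, seed, index, preprocess):
    """Return the cached sample if its pickle exists; otherwise compute and cache it."""
    path = cache_path(batch_dir, seed, index)
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    sample = preprocess(index)
    os.makedirs(batch_dir, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(sample, fh)
    return sample


def cleanup(batch_dir, seed, delete_batch_dir):
    """Hypothetical helper mirroring the documented delete_batchDir semantics:
    True removes the whole directory, False removes only this seed's files."""
    if delete_batch_dir:
        shutil.rmtree(batch_dir, ignore_errors=True)
        return
    for name in os.listdir(batch_dir):
        if name.startswith(f"{seed}."):
            os.remove(os.path.join(batch_dir, name))


calls = []


def fake_preprocess(index):
    calls.append(index)  # records every time the expensive step actually runs
    return {"index": index, "img": [0.0] * 4}


# Two "training sessions" with the same fixed seed and the same directory
with tempfile.TemporaryDirectory() as tmp:
    batch_dir = os.path.join(tmp, "batches")
    for _ in range(2):
        for index in ["case_00000", "case_00001"]:
            load_or_preprocess(batch_dir, 12345, index, fake_preprocess)
    print(calls)  # ['case_00000', 'case_00001']: the second session hit the cache
    cleanup(batch_dir, 12345, delete_batch_dir=False)
    print(os.listdir(batch_dir))  # []
```

Keying the files by seed is what lets several runs share one batch directory without overwriting each other, and it is also why a run with a different seed cannot see earlier results.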

Yes, I am aware of the issue raised in #146. Thanks for the reminder.

Thank you for the response!

Best regards, Danny

riki-igarashi commented 1 year ago

Thank you for providing us with a great library!

I think this problem is caused by using multiprocessing on Windows. On Windows, multiprocessing starts child processes with the "spawn" method rather than "fork", so a child does not inherit the parent's global state; it has to be passed explicitly. As a result, with the current code, the seed value of Data_IO changes each time a child process is created.

Reference: "Python Multiprocess diff between Windows and Linux"
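The mechanism described above can be reproduced without MIScnn or real child processes. This sketch assumes, as the explanation implies, that the seed is drawn at import time as a class attribute; `DataIOLike`, `simulate_spawned_child` and `worker` are hypothetical stand-ins, not MIScnn code. Under spawn, the child re-imports the module (so the class body runs again and draws a new seed), and pickle ships only the instance's `__dict__`. `simulate_spawned_child` fakes both steps in one process. Passing the seed explicitly through `functools.partial`, as the patch below does, restores the parent's value.

```python
import pickle
import random
from functools import partial


class DataIOLike:
    """Hypothetical stand-in for Data_IO (not MIScnn's class)."""
    # Assumption: the seed is a class attribute drawn when the class body runs,
    # i.e. at import time, and it is never assigned on the instance.
    seed = random.randint(0, 99999999)


def simulate_spawned_child(payload):
    """Fake a 'spawn' child in-process: redo the class-level draw (as a fresh
    import would), then unpickle the object sent by the parent."""
    DataIOLike.seed = random.randint(0, 99999999)
    return pickle.loads(payload)


def worker(data_io, seed=None):
    # Same idea as the patch: restore the parent's seed if one was passed
    if seed is not None:
        data_io.seed = seed
    return data_io.seed


parent = DataIOLike()
parent_seed = parent.seed
payload = pickle.dumps(parent)  # only parent.__dict__ (empty here) is serialised

# Without the fix: the child falls back to its own freshly drawn class seed
child = simulate_spawned_child(payload)
print("seed" in vars(parent), worker(child) == DataIOLike.seed)  # False True

# With the fix: bind the parent's seed explicitly, as partial(..., seed=...) does
child = simulate_spawned_child(payload)
print(partial(worker, seed=parent_seed)(child) == parent_seed)  # True
```

With "fork" (the Linux default in the Python versions discussed here), the child inherits the parent's memory, including the already-drawn class attribute, which is consistent with the problem being specific to Windows.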

If only Google Colab is supported, this may be irrelevant, but here is a fix that makes use_multiprocessing=True work on Windows:

miscnn/processing/preprocessor.py

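    # Methods of the Preprocessor class; mp (multiprocessing), np (numpy) and
    # partial (functools) are expected to come from the module's imports
    def run_subfunctions(self, indices_list, training=True):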
        # Prepare subfunctions using single threading
        if not self.use_multiprocessing or not training:
            for index in indices_list:
                self.prepare_sample_subfunctions(index, training)
        # Prepare subfunctions using multiprocessing
        else:
            pool = mp.Pool(int(self.mp_threads))
            pool.map(partial(self.prepare_sample_subfunctions,
                             training=training, seed=self.data_io.seed),  # change here !
                     indices_list)
            pool.close()
            pool.join()

    # Wrapper function to process subfunctions for a single sample
    def prepare_sample_subfunctions(self, index, training, seed=None):    # change here !
        # Use the parent process's Data_IO seed instead of the child's re-drawn one
        if seed is not None:                                              # change here !
            self.data_io.seed = seed
        # Load sample
        sample = self.data_io.sample_loader(index, load_seg=training)
        # Run provided subfunctions on imaging data
        for sf in self.subfunctions:
            sf.preprocessing(sample, training=training)
        # Transform array data types in order to save disk space
        sample.img_data = np.array(sample.img_data, dtype=np.float32)
        if training:
            sample.seg_data = np.array(sample.seg_data, dtype=np.uint8)
        # Backup sample as pickle to disk
        self.data_io.backup_sample(sample)

It is a bit oddly written, but it works in my environment.

Hope this helps someone out there!

P.S. I think issue #77 is caused by the same problem.

muellerdo commented 1 year ago

Hey @riki-igarashi,

thank you for this contribution! Definitely helpful for Windows users :)

Best Regards, Dominik