frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0

zlib.error: Error -3 while decompressing data: invalid distance code #21

Closed: luhc228 closed this issue 4 years ago

luhc228 commented 4 years ago

When I run the code below, which I copied from the kits19 example in this repo, it throws the following error.

import os
# Select the GPU before TensorFlow initializes any devices,
# otherwise CUDA_VISIBLE_DEVICES has no effect
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)

from miscnn.data_loading.interfaces.nifti_io \
     import NIFTI_interface
from miscnn.data_loading.data_io import Data_IO

# Initialize the NIfTI I/O interface and configure the images as one channel
# (grayscale) and three segmentation classes (background, kidney, tumor)
interface = NIFTI_interface(pattern="case_00[0-9]*",
                            channels=1, classes=3)

# Specify the kits19 data directory ('kits19/data' was renamed to 'kits19/data.original')
data_path = "../kits19/data/"
# Create the Data I/O object
data_io = Data_IO(interface, data_path)

sample_list = data_io.get_indiceslist()
sample_list.sort()
print("All samples: " + str(sample_list))

# Library import
from miscnn.processing.data_augmentation import Data_Augmentation

# Create and configure the Data Augmentation class
data_aug = Data_Augmentation(cycles=2, scaling=True, rotations=True, elastic_deform=True, mirror=True,
                             brightness=True, contrast=True, gamma=True, gaussian_noise=True)

# Library imports
from miscnn.processing.subfunctions.normalization import Normalization
from miscnn.processing.subfunctions.clipping import Clipping
from miscnn.processing.subfunctions.resampling import Resampling

# Create a pixel value normalization Subfunction through Z-Score
sf_normalize = Normalization()
# Create a clipping Subfunction between -79 and 304
sf_clipping = Clipping(min=-79, max=304)
# Create a resampling Subfunction to voxel spacing 3.22 x 1.62 x 1.62
sf_resample = Resampling((3.22, 1.62, 1.62))

# Assemble Subfunction classes into a list
# Be aware that the Subfunctions will be executed according to the list order!
subfunctions = [sf_resample, sf_clipping, sf_normalize]

# Library import
from miscnn.processing.preprocessor import Preprocessor

# Create and configure the Preprocessor class
pp = Preprocessor(data_io, data_aug=data_aug, batch_size=2, subfunctions=subfunctions, prepare_subfunctions=True,
                  prepare_batches=False, analysis="patchwise-crop", patch_shape=(80, 160, 160))

# Adjust the patch overlap for predictions
pp.patchwise_overlap = (40, 80, 80)

# Library import
from miscnn.neural_network.model import Neural_Network
from miscnn.neural_network.metrics import dice_soft, dice_crossentropy, tversky_loss

# Create the Neural Network model
model = Neural_Network(preprocessor=pp, loss=tversky_loss, metrics=[dice_soft, dice_crossentropy],
                       batch_queue_size=3, workers=3, learninig_rate=0.0001)

# Define Callbacks
from keras.callbacks import ReduceLROnPlateau
cb_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=20, verbose=1, mode='min', min_delta=0.0001, cooldown=1,
                          min_lr=0.00001)

# Exclude suspicious samples from the data set
# (delete in descending index order so earlier deletions don't shift the remaining indices)
del sample_list[133]
del sample_list[125]
del sample_list[68]
del sample_list[37]
del sample_list[23]
del sample_list[15]

# Create the validation sample ID list
validation_samples = sample_list[0:120]
# Output validation samples
print("Validation samples: " + str(validation_samples))

# Library import
from miscnn.evaluation.cross_validation import cross_validation
# Run cross-validation function
cross_validation(validation_samples, model, k_fold=3, epochs=350, iterations=150,
                 evaluation_path="evaluation", draw_figures=True, callbacks=[cb_lr])

from IPython.display import Image
Image(filename = "evaluation/fold_0/validation.dice_soft.png")

2020-06-09 22:29:18.519369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.721GHz coreCount: 28 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 451.17GiB/s
2020-06-09 22:29:18.520659: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-06-09 22:29:18.521604: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2020-06-09 22:29:18.522518: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2020-06-09 22:29:18.523396: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2020-06-09 22:29:18.524367: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-06-09 22:29:18.525243: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2020-06-09 22:29:18.526137: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2020-06-09 22:29:18.526283: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-09 22:29:18.526857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-09 22:29:18.527027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      
Using TensorFlow backend.
Validation samples: ['case_00000', 'case_00001', 'case_00002', 'case_00003', 'case_00004', 'case_00005', 'case_00006', 'case_00007', 'case_00008', 'case_00009', 'case_00010', 'case_00011', 'case_00012', 'case_00013', 'case_00014', 'case_00016', 'case_00017', 'case_00018', 'case_00019', 'case_00020', 'case_00021', 'case_00022', 'case_00024', 'case_00025', 'case_00026', 'case_00027', 'case_00028', 'case_00029', 'case_00030', 'case_00031', 'case_00032', 'case_00033', 'case_00034', 'case_00035', 'case_00036', 'case_00038', 'case_00039', 'case_00040', 'case_00041', 'case_00042', 'case_00043', 'case_00044', 'case_00045', 'case_00046', 'case_00047', 'case_00048', 'case_00049', 'case_00050', 'case_00051', 'case_00052', 'case_00053', 'case_00054', 'case_00055', 'case_00056', 'case_00057', 'case_00058', 'case_00059', 'case_00060', 'case_00061', 'case_00062', 'case_00063', 'case_00064', 'case_00065', 'case_00066', 'case_00067', 'case_00069', 'case_00070', 'case_00071', 'case_00072', 'case_00073', 'case_00074', 'case_00075', 'case_00076', 'case_00077', 'case_00078', 'case_00079', 'case_00080', 'case_00081', 'case_00082', 'case_00083', 'case_00084', 'case_00085', 'case_00086', 'case_00087', 'case_00088', 'case_00089', 'case_00090', 'case_00091', 'case_00092', 'case_00093', 'case_00094', 'case_00095', 'case_00096', 'case_00097', 'case_00098', 'case_00099', 'case_00100', 'case_00101', 'case_00102', 'case_00103', 'case_00104', 'case_00105', 'case_00106', 'case_00107', 'case_00108', 'case_00109', 'case_00110', 'case_00111', 'case_00112', 'case_00113', 'case_00114', 'case_00115', 'case_00116', 'case_00117', 'case_00118', 'case_00119', 'case_00120', 'case_00121', 'case_00122', 'case_00123']

Traceback (most recent call last):
  File "D:/luhengchang/kits19.MIScnn/train.py", line 186, in <module>
    evaluation_path="evaluation", draw_figures=True, callbacks=[cb_lr])
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\evaluation\cross_validation.py", line 84, in cross_validation
    iterations=iterations, callbacks=cb_list)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\neural_network\model.py", line 189, in evaluate
    iterations=iterations)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\neural_network\data_generator.py", line 51, in __init__
    preprocessor.run_subfunctions(sample_list, training)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\processing\preprocessor.py", line 216, in run_subfunctions
    sample = self.data_io.sample_loader(index, load_seg=training)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\data_loading\data_io.py", line 88, in sample_loader
    image = self.interface.load_image(index)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\data_loading\interfaces\nifti_io.py", line 87, in load_image
    vol_data = vol.get_data()
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\nibabel\dataobj_images.py", line 202, in get_data
    data = np.asanyarray(self._dataobj)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\numpy\core\_asarray.py", line 138, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\nibabel\arrayproxy.py", line 356, in __array__
    raw_data = self.get_unscaled()
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\nibabel\arrayproxy.py", line 351, in get_unscaled
    mmap=self._mmap)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\nibabel\volumeutils.py", line 524, in array_from_file
    n_read = infile.readinto(data_bytes)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\gzip.py", line 276, in read
    return self._buffer.read(size)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\gzip.py", line 471, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid distance code

Process finished with exit code 1

Could you give me some suggestions on how to solve it?

muellerdo commented 4 years ago

Hey luhc228,

it looks like the nibabel package is having problems loading the NIfTI file.

Can you verify that you correctly downloaded the kits19 data set?

e.g. by opening an imaging.nii.gz file with a NIfTI viewer of your choice, or via Python:

import nibabel as nib
vol = nib.load("D:/luhengchang/kits19/data/case_00000/imaging.nii.gz")
vol_data = vol.get_data()

Cheers, Dominik

luhc228 commented 4 years ago

I downloaded the kits19 data set from here: https://github.com/neheller/kits19/tree/interpolated

I ran the Python code; the result is shown in the attached screenshot.

muellerdo commented 4 years ago

Mhm.

Try touching every volume with something like this:

import nibabel as nib
import os

ds = "D:/luhengchang/kits19/data"
sample_list = os.listdir(ds)

for sample in sample_list:
    # Skip the non-sample entries in the data directory
    if sample in ["LICENSE", "kits.json"]:
        continue
    print("Checking sample:", sample)
    path_vol = os.path.join(ds, sample, "imaging.nii.gz")
    vol = nib.load(path_vol)
    vol_data = vol.get_data()

And could you please post the full console log of the MIScnn run?

luhc228 commented 4 years ago

Hello, muellerdo. Thank you for your patience. I found that there was a problem with the case_00002 data. After I downloaded it again, everything was fine.

But now I get another error:

Traceback (most recent call last):
  File "D:/luhengchang/kits19.MIScnn/train.py", line 203, in <module>
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\evaluation\cross_validation.py", line 84, in cross_validation
    iterations=iterations, callbacks=cb_list)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\neural_network\model.py", line 201, in evaluate
    max_queue_size=self.batch_queue_size)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\engine\training.py", line 826, in fit
    steps=data_handler.inferred_steps)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\callbacks.py", line 231, in __init__
    cb._implements_train_batch_hooks() for cb in self.callbacks)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\callbacks.py", line 231, in <genexpr>
    cb._implements_train_batch_hooks() for cb in self.callbacks)
AttributeError: 'ReduceLROnPlateau' object has no attribute '_implements_train_batch_hooks'

I hope you can give me some more help.

muellerdo commented 4 years ago

No problem.

Oh no :x The error is in the KiTS19 Jupyter Notebook example.

You have to switch the ReduceLROnPlateau callback from the original Keras library to the tensorflow.keras library:

Original line:

from keras.callbacks import ReduceLROnPlateau

Replace it with this:

from tensorflow.keras.callbacks import ReduceLROnPlateau

The problem here is that MIScnn updated to TensorFlow 2.X about 2 months ago. In TensorFlow 2.X, Keras is integrated as the high-level API. Therefore, instead of using the original Keras library, I switched MIScnn to the Keras integrated in TensorFlow for more compact and robust dependencies.

Sadly, the Keras and TF Keras libraries are not fully compatible. It is not possible to pass a callback from the original Keras library to a TF Keras model.

Long story short: It should work if you update the import line.
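For reference, here is the full corrected callback definition, combining the new import with the same parameters as in your snippet:

from tensorflow.keras.callbacks import ReduceLROnPlateau
cb_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=20, verbose=1, mode='min',
                          min_delta=0.0001, cooldown=1, min_lr=0.00001)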

I will add a to-do on my agenda to update the KiTS19 example. Thanks for pointing this out! :)

Cheers, Dominik

ZeeshanAbbas92 commented 4 years ago

Dear Muellerdo,

After making the above changes, I am now getting this error: EOFError: Compressed file ended before the end-of-stream marker was reached

and when I run the above-mentioned nibabel code, what I get is shown in the attached screenshot.

muellerdo commented 4 years ago

Hey ZeeshanAbbas92,

@ZeeshanAbbas92 said: After making the above changes, I am now getting this error: EOFError: Compressed file ended before the end-of-stream marker was reached

By loading which file? The problem here is that a volume of your data set is probably corrupted.

Could you try downloading the whole data set again, and ensure that you have a stable internet connection during the download?

The other option would be to identify which volume is corrupted by running a for loop over all samples.

Regarding the DeprecationWarning in your screenshot: you can ignore it, but thanks for showing me this. The nibabel package declared get_data() as deprecated in favor of get_fdata(). You can still use get_data() as normal until November 2021, or we can simply use get_fdata() instead ;)

import nibabel as nib
import os

ds = "D:/luhengchang/kits19/data"
sample_list = os.listdir(ds)

for sample in sample_list:
    # Skip the non-sample entries in the data directory
    if sample in ["LICENSE", "kits.json"]:
        continue
    print("Checking sample:", sample)
    path_vol = os.path.join(ds, sample, "imaging.nii.gz")
    vol = nib.load(path_vol)
    vol_data = vol.get_fdata()

But thanks for the notification, I will update the NIfTI IO Interface.

ZeeshanAbbas92 commented 4 years ago

Dear,

There were some folders which didn't have the "segmentation.nii.gz" file, so I deleted them and the error went away. But now the problem is that when I run this, it uses too much storage and fills the C drive completely, i.e. 100 GB. What is the reason?

Second thing:

from IPython.display import Image
Image(filename = "evaluation/fold_0/validation.dice_soft.png")

I don't have any images in this fold_0 folder. How can I resolve these issues?

Best Regards, Zeeshan Abbas


luhc228 commented 4 years ago

It seems that my GPU memory is not enough. My GPU is a GeForce GTX 1080 Ti. Which code should I modify? I have no idea. Could you give me a hand? Thanks.

Epoch 1/200
2020-06-13 10:33:11.404683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-13 10:33:12.441369: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-06-13 10:33:12.522261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-13 10:33:24.796266: W tensorflow/core/common_runtime/bfc_allocator.cc:434] Allocator (GPU_0_bfc) ran out of memory trying to allocate 500.00MiB (rounded to 524288000)
Current allocation summary follows.
2020-06-13 10:33:24.797683: I tensorflow/core/common_runtime/bfc_allocator.cc:934] BFCAllocator dump for GPU_0_bfc

2020-06-13 10:33:24.893192: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 7171da000 of size 21127168 next 18446744073709551615
2020-06-13 10:33:24.893313: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Next region of size 268435456
2020-06-13 10:33:24.893407: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 718600000 of size 2048 next 275
2020-06-13 10:33:24.893509: I tensorflow/core/common_runtime/bfc_allocator.cc:990] InUse at 718600800 of size 2048 next 276
....
2020-06-13 10:33:24.938149: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 7 Chunks of size 131072000 totalling 875.00MiB
2020-06-13 10:33:24.938257: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 136347648 totalling 130.03MiB
2020-06-13 10:33:24.938365: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 262144000 totalling 250.00MiB
2020-06-13 10:33:24.938473: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 7 Chunks of size 524288000 totalling 3.42GiB
2020-06-13 10:33:24.938579: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 536870912 totalling 512.00MiB
2020-06-13 10:33:24.938688: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 549453824 totalling 524.00MiB
2020-06-13 10:33:24.938796: I tensorflow/core/common_runtime/bfc_allocator.cc:998] 1 Chunks of size 1048576000 totalling 1000.00MiB
2020-06-13 10:33:24.938906: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 7.45GiB
2020-06-13 10:33:24.939007: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 8588886016 memory_limit_: 9106107123 available bytes: 517221107 curr_region_allocation_bytes_: 8589934592
2020-06-13 10:33:24.939192: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats: 
Limit:                  9106107123
InUse:                  7994491136
MaxInUse:               8211816448
NumAllocs:                    1659
MaxAllocSize:           2470641664

2020-06-13 10:33:24.939429: W tensorflow/core/common_runtime/bfc_allocator.cc:439] ***********************************************************************************************_____
2020-06-13 10:33:24.939607: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at broadcast_to_op.cc:65 : Resource exhausted: OOM when allocating tensor with shape[2,80,160,160,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "D:/luhengchang/kits19.MIScnn/train.py", line 180, in <module>
    evaluation_path="evaluation1", draw_figures=True, callbacks=[cb_lr])
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\evaluation\cross_validation.py", line 84, in cross_validation
    iterations=iterations, callbacks=cb_list)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\neural_network\model.py", line 201, in evaluate
    max_queue_size=self.batch_queue_size)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\keras\engine\training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\function.py", line 598, in call
    ctx=ctx)
  File "D:\Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[2,80,160,160,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node gradient_tape/model/batch_normalization_17/moments/BroadcastTo (defined at \Users\Administrator\anaconda3\envs\keras-unet\lib\site-packages\miscnn\neural_network\model.py:201) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_9574]

Function call stack:
train_function

2020-06-13 10:33:24.998239: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
     [[{{node PyFunc}}]]

muellerdo commented 4 years ago

@ZeeshanAbbas92,

could you open a separate issue, please? This would improve the overview and may help others with the same problem as well.

I'm happy to help you with your issue, then :) Thank you.


@luhc228,

the GeForce GTX 1080 Ti has 11 GB VRAM.

We now have several options for reducing the model complexity.

1. Reduce patch shape

The size of the patch is the main factor influencing the model complexity if we insist on using the 3D U-Net architecture. You could try a patch shape of (40, 80, 80) and see how well your model performs. Depending on your data set, you can additionally adjust the resampling to boost the performance for this patch shape again (see the sketch after the code below). I would recommend aiming for a patch size of 1/8 of the median volume size after resampling (e.g. for a volume size of 512x512x512, a patch shape of 256x256x256).

pp = Preprocessor(data_io, data_aug=data_aug, batch_size=2, subfunctions=subfunctions,
                  prepare_subfunctions=True, prepare_batches=False,
                  analysis="patchwise-crop", patch_shape=(40, 80, 80))

2. Turn off batch normalization

The standard 3D U-Net takes around 8GB VRAM, with batch normalization 16GB. Turning off batch normalization in the architecture will cut the required VRAM in half.

from miscnn.neural_network.architecture.unet.standard import Architecture
unet = Architecture(batch_normalization=False)

model = Neural_Network(preprocessor=pp, loss=tversky_loss, 
                       architecture=unet, metrics=[dice_soft, dice_crossentropy],
                       batch_queue_size=3, workers=3, learninig_rate=0.0001)

You can also try the plain U-Net, which is a simpler but equally powerful variant of the standard U-Net.

from miscnn.neural_network.architecture.unet.plain import Architecture
unet = Architecture(batch_normalization=False)

model = Neural_Network(preprocessor=pp, loss=tversky_loss, 
                       architecture=unet, metrics=[dice_soft, dice_crossentropy],
                       batch_queue_size=3, workers=3, learninig_rate=0.0001)

3. Switch to 2D

Another option is to use the NIfTI_slicer IO interface and run a 2D analysis. With this approach, the 3D volumes are automatically split into 2D slices, and a standard 2D U-Net is run on them. This also allows full-image analysis at full resolution, because a 2D HD image doesn't take much VRAM compared to a 3D volume. Whether utilizing the 3D information or a better resolution leads to the best performance mostly depends on the data set.

# Initialize the NIfTI slicer I/O interface and configure the images as one channel
# (grayscale) and three segmentation classes (background, kidney, tumor)
from miscnn.data_loading.interfaces import NIFTIslicer_interface
interface = NIFTIslicer_interface(pattern="case_00[0-9]*", channels=1, classes=3)
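A minimal sketch of how the remaining pipeline could look in 2D (assuming the same Data_IO/Preprocessor workflow as above; the full-image analysis configuration shown here is illustrative):

# Reuse the Data I/O workflow with the slicer interface
data_io = Data_IO(interface, data_path)
# Full-image analysis on the 2D slices; note that 3D subfunctions such as
# the resampling above would need to be adapted or dropped for 2D data
pp = Preprocessor(data_io, data_aug=data_aug, batch_size=2,
                  subfunctions=[sf_clipping, sf_normalize],
                  prepare_subfunctions=True, prepare_batches=False,
                  analysis="fullimage")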

Cheers, Dominik

EDIT: Added code examples for each option.

luhc228 commented 4 years ago

Hello, Dominik. Thanks to your guidance, I ran the kits19 example successfully. But I still have some doubts about it. After I finished the model training (using cross_validation), I got the following result:

├── fold_0
 |  ├── history.tsv
 |  ├── model.hdf5
 |  ├── validation.dice_crossentropy.png
 |  ├── validation.dice_soft.png
 |  └── validation.loss.png
├── fold_1
 |  ├── history.tsv
 |  ├── model.hdf5
 |  ├── validation.dice_crossentropy.png
 |  ├── validation.dice_soft.png
 |  └── validation.loss.png
├── fold_2
 |  ├── history.tsv
 |  ├── model.hdf5
 |  ├── validation.dice_crossentropy.png
 |  ├── validation.dice_soft.png
 |  └── validation.loss.png

But in the kits19 example, it seems that there should be visualization cases in fold_0. For example:

Image(filename = "evaluation/fold_0/visualization.case_case_00044.gif")

I read the usage docs; should I specify it like this?

# Predict the segmentation of 30 samples
model.predict(sample_list[0:30])

muellerdo commented 4 years ago

Hi luhc228,

But in the kits19 example, it seems that there should be visualization cases in fold_0. For example:

Image(filename = "evaluation/fold_0/visualization.case_case_00044.gif")

Because the "run_detailed_evaluation" option in the cross_validation() function is now False, by default, in the newer MIScnn versions. The detailed evaluation is specifically designed only for 3D data, therefore it is more intuitive to disable this option by default in order to run the cross validation on 2D, as well.

Argh, this is now a bit problematic. You already have the fitted models, so all we need to do is perform the detailed validation again. The problem is that we don't know which samples were in the training set and which were in the validation set for each fold :/

Obviously, you can run the detailed validation on all samples, but that introduces a bias, because we would also be predicting on our training data.

...
# Create the Neural Network model
model = Neural_Network(preprocessor=pp, loss=tversky_loss, metrics=[dice_soft, dice_crossentropy],
                       batch_queue_size=3, workers=3, learninig_rate=0.0001)

# Load one of the three models
model.load("evaluation/fold_0/model.hdf5")

# Run detailed validation
from miscnn.evaluation.detailed_validation import detailed_validation
detailed_validation(sample_list, model, "evaluation")

But I would highly recommend running the cross_validation again with the run_detailed_evaluation option enabled.

cross_validation(validation_samples, model, k_fold=3, epochs=350, iterations=150,
                 evaluation_path="evaluation", draw_figures=True, callbacks=[cb_lr],
                 run_detailed_evaluation=True)

Cheers, Dominik

luhc228 commented 4 years ago

Thanks, Dominik.