Hey Michael,
good idea!
For the segmentation, it is clear that no one will have more than 127 classes.
But for the imaging, I guess it could be a bit tricky.
Check out this: https://stackoverflow.com/questions/46613748/float16-vs-float32-for-convolutional-neural-networks
Out of the box, I would guess that TensorFlow automatically converts them to float64 at the NumPy -> Tensor conversion, but I'm not quite sure. It could also be that it simply reuses the associated array dtype, so NumPy float16 -> Tensor float16?
I implemented it as float32 for now, but I will read the TensorFlow docs and give you some feedback when I know more. If the Tensors end up as float64 anyway, then it should be no problem to store the preprocessed samples as float16 for a further file size reduction.
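As a rough sketch of the file size reduction (the array shape and file paths are only for illustration, not the actual preprocessed samples), saving the same sample at different precisions shows roughly a halving per step:

import os
import tempfile
import numpy as np

# Hypothetical preprocessed imaging sample; NumPy defaults to float64
sample = np.random.rand(128, 128, 128)

# Save the same sample at different precisions and compare the file sizes
for dtype in (np.float64, np.float32, np.float16):
    path = os.path.join(tempfile.gettempdir(), "sample_{}.npy".format(np.dtype(dtype).name))
    np.save(path, sample.astype(dtype))
    print(np.dtype(dtype).name, os.path.getsize(path) / 1e6, "MB")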
As always, thank you for your feedback & contributions.
Cheers, Dominik
Related Commits: e31df8875da1157c4ccda5144f9ce3d5ce298f24
As suspected, TensorFlow reuses the associated array dtype from NumPy. This means a NumPy float16 array will become a float16 Tensor.
Therefore, I think it makes sense to use float32 for the imaging data to avoid possible trouble later during model training.
Here is a little example to check the NumPy-to-Tensor conversion; credit for the code goes to @j-frei.
import numpy as np
import tensorflow as tf

# Create empty NumPy arrays with different floating point precisions
float16_np = np.empty(shape=(1,), dtype=np.float16)
float32_np = np.empty(shape=(1,), dtype=np.float32)
float64_np = np.empty(shape=(1,), dtype=np.float64)
float_np = np.empty(shape=(1,), dtype=float)  # np.float was just an alias for the builtin float (float64) and is removed in newer NumPy versions

# Convert each array to a Tensor and let TensorFlow pick the dtype
float16_tf = tf.convert_to_tensor(float16_np)
float32_tf = tf.convert_to_tensor(float32_np)
float64_tf = tf.convert_to_tensor(float64_np)
float_tf = tf.convert_to_tensor(float_np)
# Explicitly force float32 during the conversion
float_tf_forced32 = tf.convert_to_tensor(float_np, dtype=tf.float32)

# Each Tensor reuses the dtype of its source array, except for the forced conversion
print(float16_tf.dtype)         # float16
print(float32_tf.dtype)         # float32
print(float64_tf.dtype)         # float64
print(float_tf.dtype)           # float64
print(float_tf_forced32.dtype)  # float32
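Building on that, here is a minimal sketch of how samples stored as float16 could be cast back to float32 right before the NumPy -> Tensor conversion (the random array is just a stand-in for a preprocessed sample):

import numpy as np
import tensorflow as tf

# Stand-in for a preprocessed sample stored on disk as float16
img_f16 = np.random.rand(64, 64, 64).astype(np.float16)

# Cast back to float32 before the conversion, since TensorFlow
# otherwise reuses the float16 dtype of the source array
img_tensor = tf.convert_to_tensor(img_f16.astype(np.float32))
print(img_tensor.dtype)  # float32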
Cheers, Dominik
Hi Dominik,
I just noticed that both images and segmentations are of type float64 after running any subfunction. That might lead to memory issues during training, and the batches take up a lot of space when dumped to disk.
Maybe two simple lines after running the subfunctions would do the job:
sample.img_data = np.array(sample.img_data, dtype=np.float16)  # downcast image intensities
sample.seg_data = np.array(sample.seg_data, dtype=np.uint8)    # class labels fit into uint8
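For a rough idea of the savings (the volume shape is only for illustration, not a real sample), comparing the in-memory footprint before and after such a downcast:

import numpy as np

# Stand-in image and segmentation volumes; NumPy defaults to float64
img_data = np.random.rand(256, 256, 128)
seg_data = np.random.randint(0, 3, size=(256, 256, 128)).astype(np.float64)

print("img: float64", img_data.nbytes / 1e6, "MB -> float16", img_data.astype(np.float16).nbytes / 1e6, "MB")
print("seg: float64", seg_data.nbytes / 1e6, "MB -> uint8", seg_data.astype(np.uint8).nbytes / 1e6, "MB")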
Cheers, Michael