HenriquesLab / ZeroCostDL4Mic

ZeroCostDL4Mic: A Google Colab based no-cost toolbox to explore Deep-Learning in Microscopy
MIT License

U-net network training error #15

Closed ilobb closed 4 years ago

ilobb commented 4 years ago

Hello!

As per my previous issues, I am now trying to use U-net to identify nuclei from brightfield microscopy images.

I made nuclear outlines using CellProfiler (data examples attached), converted them to .png, and numbered them 0-15 (16 images; the PNG format and the naming are meant to mirror the example data as closely as possible).

However, when trying to train the network I get the following error:


WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:66: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:541: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4479: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4267: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2239: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4432: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3657: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/nn_impl.py:183: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where

Model: "model_1"

Layer (type)                    Output Shape           Param #   Connected to
input_1 (InputLayer)            (None, 1040, 1040, 1   0
conv2d_1 (Conv2D)               (None, 1040, 1040, 6   640       input_1[0][0]
conv2d_2 (Conv2D)               (None, 1040, 1040, 6   36928     conv2d_1[0][0]
max_pooling2d_1 (MaxPooling2D)  (None, 520, 520, 64)   0         conv2d_2[0][0]
conv2d_3 (Conv2D)               (None, 520, 520, 128   73856     max_pooling2d_1[0][0]
conv2d_4 (Conv2D)               (None, 520, 520, 128   147584    conv2d_3[0][0]
max_pooling2d_2 (MaxPooling2D)  (None, 260, 260, 128   0         conv2d_4[0][0]
conv2d_5 (Conv2D)               (None, 260, 260, 256   295168    max_pooling2d_2[0][0]
conv2d_6 (Conv2D)               (None, 260, 260, 256   590080    conv2d_5[0][0]
up_sampling2d_1 (UpSampling2D)  (None, 520, 520, 256   0         conv2d_6[0][0]
conv2d_7 (Conv2D)               (None, 520, 520, 128   131200    up_sampling2d_1[0][0]
concatenate_1 (Concatenate)     (None, 520, 520, 256   0         conv2d_4[0][0]
                                                                 conv2d_7[0][0]
conv2d_8 (Conv2D)               (None, 520, 520, 128   295040    concatenate_1[0][0]
up_sampling2d_2 (UpSampling2D)  (None, 1040, 1040, 1   0         conv2d_8[0][0]
conv2d_9 (Conv2D)               (None, 1040, 1040, 6   32832     up_sampling2d_2[0][0]
concatenate_2 (Concatenate)     (None, 1040, 1040, 1   0         conv2d_2[0][0]
                                                                 conv2d_9[0][0]
conv2d_10 (Conv2D)              (None, 1040, 1040, 6   73792     concatenate_2[0][0]
conv2d_11 (Conv2D)              (None, 1040, 1040, 1   65        conv2d_10[0][0]

Total params: 1,677,185
Trainable params: 1,677,185
Non-trainable params: 0

None

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1033: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1020: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3005: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

Epoch 1/200
Found 15 images belonging to 1 classes.
Found 1 images belonging to 1 classes.
Found 1 images belonging to 1 classes.
Found 15 images belonging to 1 classes.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:197: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:207: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:216: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:223: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.


ResourceExhaustedError Traceback (most recent call last)

in ()
     11
     12 csv_log = CSVLogger(model_path+'/'+model_name+'/Quality Control/'+model_name+'_training.csv', separator=',', append=False)
---> 13 history = model.fit_generator(Generator,steps_per_epoch=steps,epochs=epochs, callbacks=[model_checkpoint,csv_log], validation_data=val_Generator, validation_steps=3, shuffle=True, verbose=1)
     14
     15

6 frames

/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474             proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: OOM when allocating tensor with shape[4,64,1040,1040] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node training/Adam/gradients/conv2d_11/convolution_grad/Conv2DBackpropInput}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Apologies for the lack of screenshot, but the error would not entirely fit on my screen.

The training parameters are as below:

![image](https://user-images.githubusercontent.com/62608092/81308640-eb384f00-9079-11ea-82fe-a51a3bb26dcc.png)

Desktop:
- OS: Windows 10 Enterprise 64-bit
- Browser: Chrome

Cheers,
Ian

NuriaTaberner commented 4 years ago

Hello Ian,

I'm new and so far I have only run the sample. I had to save the images as TIFF instead of PNG to get it working properly.

guijacquemet commented 4 years ago

Hello Ian and NuriaTaberner,

Thanks a lot for reaching out! Does using TIFF instead of PNG solve your issue? Also, while outlines may work, I think you will get better results if you mask the whole area that you want to threshold instead.

Cheers,
Guillaume

ilobb commented 4 years ago

Hello, I had tried TIFFs previously but it failed. However, that may also have been because I had not noticed that the number of steps needs to equal the number of samples in the training set divided by the batch size, which I later corrected for the PNG files. I will try again with TIFF files, and also with a mask rather than an outline, and get back to you. Thanks very much, Ian
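For anyone else checking the same setting, the relationship described above can be sketched in a few lines of Python. This is only an illustration: the function name `steps_per_epoch` is hypothetical and not taken from the notebook, and rounding up (so that a final, partial batch is still counted) is an assumption; rounding down is also common.

```python
import math


def steps_per_epoch(num_training_images: int, batch_size: int) -> int:
    """Number of batches needed to pass over the training set once.

    Rounds up so that a final, smaller batch is still counted
    (an assumption; some setups round down instead).
    """
    if num_training_images <= 0 or batch_size <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(num_training_images / batch_size)


# 15 training images (as reported in the log above) with a batch size of 4
print(steps_per_epoch(15, 4))
```

With 15 images and a batch size of 4 this gives 4 steps; with a batch size of 1 it gives 15.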

ilobb commented 4 years ago

Hello again, I tried with TIFFs, and with a mask rather than an outline (an example is attached). I also tried with the image inverted, and on each occasion got an error similar to the one described above.

Apologies, but not being Python-literate, it is a little difficult for me to comprehend the nature of the problem. Any explanations or suggestions?

Thanks very much,

Ian

guijacquemet commented 4 years ago

Dear Ian,

Thanks for testing all of these options. We found a bug in the U-net notebook that causes issues when the input image is not square. This will be fixed in the new release. Since your input is a rectangle, I am wondering if this could be the issue. You could try cropping your image to a square and see if the problem disappears.

ilobb commented 4 years ago

Thanks for the tip!

I cropped the images from 1392 pixels x 1040 pixels to 1040 x 1040, but I still get pretty much the same error message as previously.


Cheers, Ian
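For reference, a crop like the one described above can be sketched with NumPy. This is a minimal illustration, not the notebook's code: `center_crop` is a hypothetical helper, the crop is taken from the centre of the frame (an assumption, any region would do), and the image is assumed to be already loaded as an array with rows first.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size region of a 2D (or HxWxC) image."""
    height, width = image.shape[:2]
    if size > height or size > width:
        raise ValueError("crop size is larger than the image")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


# Stand-in for a 1392 x 1040 pixel frame (1040 rows, 1392 columns)
frame = np.zeros((1040, 1392), dtype=np.uint16)
print(center_crop(frame, 1040).shape)
```

The same crop has to be applied to each input image and its matching mask so that the pairs stay aligned.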

lucpaul commented 4 years ago

Hi Ian,

ResourceExhaustedError (the "OOM" in the message means out of memory) suggests that the images loaded into the network are too big for the GPU memory. So you could try cropping the images to a smaller size; in our training data we use 512x512. Or you can try reducing the batch size in the Advanced Parameter settings. Go to something low like 1 or 2 and see if you get the error again.

I hope that fixes it but if the error still occurs, perhaps you could share an example image (input+output) with us so we can have a look at the data?
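To give a rough feel for the numbers, here is a back-of-the-envelope sketch of how much memory the single tensor named in the error message needs, and how cropping or a smaller batch changes that. The helper `tensor_bytes` is hypothetical, and 4 bytes per element assumes 32-bit floats ("type float" in the message). Training holds many such tensors at once, so the real footprint is considerably larger than any one of these figures.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


MIB = 1024 ** 2

# Shapes follow the [batch, channels, height, width] layout of the error message
cases = {
    "batch 4, 1040x1040": (4, 64, 1040, 1040),
    "batch 4, 512x512": (4, 64, 512, 512),
    "batch 1, 1040x1040": (1, 64, 1040, 1040),
}
for label, shape in cases.items():
    print(f"{label}: {tensor_bytes(shape) / MIB:.1f} MiB")
```

The first case comes to a little over 1 GiB for one tensor, while either cropping to 512x512 or dropping to a batch size of 1 cuts it to roughly a quarter of that.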

NuriaTaberner commented 4 years ago

I had the same problem. Scaling to 640x640 allowed me to run the model properly with a batch size of 4 and get quite a nice prediction of what I wanted. Maybe in the future I will consider running it on my own computer, though.

ilobb commented 4 years ago

I took your suggestion and cropped the image further down to 512 x 512, and the training proceeded without a problem. Thanks very much!

Just out of curiosity, will this mean that any future tests that I run with this model should have the same dimensions?

Romain-Laine commented 4 years ago

Dear everyone,

We made a new release yesterday with a number of performance improvements to the U-net notebook. It should now be able to take images of any size and any bit depth, so this may fix a number of the issues that you describe here! I suggest you upgrade to the new release and re-run your tests on your original images. Generally, it is better to avoid converting images to PNG, as you may lose some dynamic range and data representation along the way.

I hope this helps!
Romain
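As a small illustration of the dynamic-range point, the sketch below shows what happens when 16-bit intensities are reduced to 8 bits, which is what some export tools do when writing PNG. That reduction is an assumption about the conversion step: the PNG format itself can store 16-bit grayscale, so the loss depends on the tool and settings used.

```python
import numpy as np

# Every possible 16-bit intensity value, once each
sixteen_bit = np.arange(65536, dtype=np.uint16)

# A common 16-bit to 8-bit reduction: keep only the top 8 bits
eight_bit = (sixteen_bit >> 8).astype(np.uint8)

levels_before = np.unique(sixteen_bit).size
levels_after = np.unique(eight_bit).size
print(levels_before, levels_after)
```

After the reduction, 65536 distinct intensity levels collapse to 256, so subtle contrast differences in the original data can no longer be told apart.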

ilobb commented 4 years ago

Thanks very much for your help Romain, and everybody else. And congrats on the new release!