U-net ResourceExhaust and Infinity

SHaensch commented 4 years ago

Unfortunately I am running into problems using the U-Net notebook. I want to see whether I can train U-Net to recognize certain structures in EM images (similar to the example training dataset that was provided for testing). As input I am using grayscale EM images and corresponding binary masks (marking a roughly circular structure in the middle of the image). Either the training cell throws “ResourceExhausted” errors, or every epoch reports “val_loss did not improve from inf”. In both cases no model is generated.

I can work around the resource exhaustion by reducing the number of training files, cropping, and reducing batch_size, but then I am using very few training images as input (e.g. four images of roughly 900 × 900 px, which are already crops of the original files). I thought the input might not be in the right format, but I have already tried 16-bit, 8-bit and RGB for the raw electron microscopy files, as well as 8-bit binary masks (all .tif). Is 900 × 900 still too big, so that the images have to be split into tiles of ~512 × 512 (the original files are 2048 × 2048)? The test files I once received were in .png format, so I tried that as well, which also resulted in “val_loss did not improve from inf”.

Does this mean that the input data are not specific enough, i.e. the structure is not really distinguishable from the background, so the model cannot identify these regions? At the moment I am quite stuck and do not know exactly what the problem is or how to improve things.
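Splitting each 2048 × 2048 original into 512 × 512 tiles is one way to reduce the memory needed per training sample while keeping all of the data, instead of discarding most of each image by cropping. The sketch below assumes the images and masks are already loaded as NumPy arrays (for example with a TIFF reader such as tifffile); `tile_image` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 512) -> np.ndarray:
    """Split an image (H, W[, C]) into non-overlapping tile x tile patches.

    Hypothetical helper: rows/columns that do not fill a whole tile are
    discarded. Returns an array of shape (n_tiles, tile, tile[, C]).
    """
    h, w = image.shape[:2]
    rows, cols = h // tile, w // tile
    extra = image.shape[2:]  # channel axis, if any
    # Crop to a multiple of the tile size so the reshape below is exact
    cropped = image[: rows * tile, : cols * tile]
    # (rows*tile, cols*tile) -> (rows, tile, cols, tile) -> (rows, cols, tile, tile)
    grid = cropped.reshape(rows, tile, cols, tile, *extra).swapaxes(1, 2)
    # Flatten the grid into a stack of patches, row by row
    return grid.reshape(rows * cols, tile, tile, *extra)

# Usage: one 2048 x 2048 image yields 16 tiles of 512 x 512
example = np.zeros((2048, 2048), dtype=np.uint16)
print(tile_image(example).shape)  # (16, 512, 512)
```

Applying the same function to each mask keeps image and mask tiles aligned, because both are cut on the same grid. A 900 × 900 crop would yield only a single 512 × 512 tile, so tiling the full originals recovers considerably more training data at the same per-sample memory cost.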

Fastander commented 4 years ago

I have a similar issue with a dataset I used to train locally with good results (albeit in .png), but after converting it to .tif and running it with your U-Net notebook I get “val_loss did not improve from inf”. It would be great to figure out what the issue is: the conversion to .tif, the validation split, or something else.
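When a validation loss never becomes finite after a format conversion, one thing worth ruling out is the data itself: NaN or infinite pixel values, or masks saved as 0/255 instead of 0/1. This is a general sanity check under that assumption, not a confirmed diagnosis of this issue; `check_pair` and `normalize_mask` are hypothetical helpers operating on NumPy arrays.

```python
import numpy as np

def check_pair(image: np.ndarray, mask: np.ndarray) -> list:
    """Return human-readable problems found in one image/mask pair."""
    problems = []
    if image.shape[:2] != mask.shape[:2]:
        problems.append(f"shape mismatch: image {image.shape} vs mask {mask.shape}")
    # np.isfinite works for integer and float dtypes alike
    if not np.isfinite(image).all():
        problems.append("image contains NaN or inf values")
    values = np.unique(mask)
    # A binary segmentation target is assumed to contain only 0 and 1
    if not set(values.tolist()) <= {0, 1}:
        problems.append(f"mask is not binary 0/1, found values {values[:5].tolist()}")
    return problems

def normalize_mask(mask: np.ndarray) -> np.ndarray:
    """Map any nonzero label (e.g. 255 in an 8-bit mask) to 1.0."""
    return (mask > 0).astype(np.float32)

# Usage: an 8-bit mask stored as 0/255 is flagged until it is normalized
image = np.zeros((4, 4), dtype=np.uint16)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255
print(check_pair(image, mask))
print(check_pair(image, normalize_mask(mask)))  # []
```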

lucpaul commented 4 years ago

Hi, thanks for raising this issue. We are aware of it and are addressing it in the next release. In the meantime, you could try the following to make it work:

Change the number of pooling steps. If the images are very large, try a higher value; it might help the network make sense of the data.
Reduce the augmentation. For testing, you could simply set everything to 0. In my experience, this has helped me address the infinite val loss problem. If it works better, you could then gradually add some augmentation back.
Change the initial learning rate. If the val loss doesn't go down from inf, try raising the initial learning rate slightly. But I would try the other two options first, as I haven't had to try this one myself.
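The pooling-steps setting interacts with the patch size: in a standard U-Net each pooling step halves the spatial size (2 × 2 max-pooling with stride 2), so the patch side should be divisible by 2 to the power of the number of pooling steps, and more steps give the deepest features a wider view of a large image. The function below is a hypothetical illustration of that arithmetic, assuming "same"-padded convolutions; it is not the notebook's code, and the notebook's actual parameter names are not given in this thread.

```python
def bottleneck_size(patch_size: int, pooling_steps: int) -> int:
    """Side length of the deepest U-Net feature map.

    Assumes each pooling step is a 2x2 max-pool with stride 2, so the
    patch side must be divisible by 2**pooling_steps for the decoder's
    upsampled maps to line up with the encoder's skip connections.
    """
    factor = 2 ** pooling_steps
    if patch_size % factor != 0:
        raise ValueError(
            f"patch size {patch_size} is not divisible by 2**{pooling_steps} = {factor}"
        )
    return patch_size // factor

# Usage: compare a 512 px tile with a 900 px crop
print(bottleneck_size(512, 4))  # 32
print(bottleneck_size(900, 2))  # 225
```

For example, a 900 × 900 crop is divisible by 4 but not by 8, so under these assumptions it would only fit 2 pooling steps, whereas 512 × 512 tiles leave room for a deeper network.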

I hope this helps. Apologies for the inconvenience. If none of these work, watch this space for the next release, coming very soon. Thanks for trying our notebook.

SHaensch commented 4 years ago

Thanks for the reply, I will definitely give it a try. But where exactly can I change the initial learning rate and the pooling steps? The augmentation I can see as an option in the notebook itself.

lucpaul commented 4 years ago

To change those parameters, have a look here: they are under the advanced parameters in section 3.1.

Romain-Laine commented 4 years ago

Hi guys, the new release we pushed last week should now fix your problems. Please give it a try and let us know how you get on; feedback is always useful for us! Cheers,

Romain

HenriquesLab / ZeroCostDL4Mic, issue #19