biomag-lab / hypocotyl-UNet


Validation loss fluctuating #12

darvida opened this issue 4 years ago

darvida commented 4 years ago

Hello, my validation loss fluctuates around 0.35, as seen below:

https://imgur.com/a/ZQ1aWFy

The training loss keeps decreasing, but the validation loss seems to fluctuate and decrease very slowly.

Does it seem strange?

cosmic-cortex commented 4 years ago

Hi! Validation loss fluctuation is normal; however, it remaining approximately the same is not. Your training loss also decreases much more slowly than what I experienced (if I recall correctly :) it was ~2 years ago).

You can change the learning rate by setting the --initial_lr argument of the train.py script. By default it is 1e-3; I would recommend setting it to 1e-2 and gradually increasing it by a factor of 10 until you find a value where the losses are decreasing but not fluctuating.
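For example (only the --initial_lr flag comes from this thread; pass your usual dataset and checkpoint arguments alongside it):

```
python train.py --initial_lr 1e-2   # other required arguments omitted
```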

darvida commented 4 years ago

I got it to work on Google Colab. Do you have any tips on what batch size to use in the training step?

cosmic-cortex commented 4 years ago

I used 4 with 256x256 tiles, but in general you should use as large a batch size as the GPU memory allows.
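If you want to find that limit empirically, a rough approach (a sketch, not part of the repo; it assumes the repo's src/ directory is on the Python path so the UNet class can be imported) is to double the batch size until a CUDA out-of-memory error occurs:

```python
# Rough sketch, not part of the repo: double the batch size until a CUDA
# out-of-memory error occurs, then report the last size that worked.
# Assumes src/ is on the Python path so unet/unet.py is importable.
import torch
from unet.unet import UNet

def find_max_batch_size(model, tile_size=256, start=4, limit=256):
    device = torch.device("cuda")
    model = model.to(device)
    batch_size, last_ok = start, None
    while batch_size <= limit:
        try:
            # Run a forward and backward pass, since the backward pass
            # dominates the memory footprint during training.
            x = torch.randn(batch_size, 3, tile_size, tile_size, device=device)
            model(x).mean().backward()
            model.zero_grad()
            torch.cuda.empty_cache()
            last_ok = batch_size
            batch_size *= 2
        except RuntimeError:  # CUDA OOM surfaces as a RuntimeError
            torch.cuda.empty_cache()
            break
    return last_ok

if __name__ == "__main__":
    print(find_max_batch_size(UNet(3, 3)))
```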

darvida commented 4 years ago

Hello, I'm having some problems with overfitting on my grayscale images of Arabidopsis seedlings. I was thinking of counteracting the overfitting by reducing the number of model parameters. Can you give me any tips on how to do that in your blocks.py file?

Best regards David

darvida commented 4 years ago

In addition, do you think that adding dropout in some of the layers would partially take care of the overfitting problem? The currently used model might have too many parameters for grayscale images and thus not generalize well.

cosmic-cortex commented 4 years ago

Sorry for the late answer! I missed the GitHub notification email for some reason :(

If you would like to customize the model, you don't need to modify blocks.py. There are some configurable parameters, and they should be enough for your purposes.

The training script train.py instantiates the model here: https://github.com/biomag-lab/hypocotyl-UNet/blob/master/src/train.py#L51

The initializer of the UNet object is implemented here: https://github.com/biomag-lab/hypocotyl-UNet/blob/master/src/unet/unet.py#L9-L29

What you need to do is set the conv_depths parameter in the train.py script during the UNet instantiation. By default, its value is conv_depths=[64, 128, 256, 512, 1024]. So you might try, for example:

```python
unet = UNet(3, 3, conv_depths=[64, 128, 256])
```

In this case, you might want to increase the batch size during training, since reducing the parameters also reduces memory footprint.
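If you want a quick sanity check on how much a reduced conv_depths shrinks the model, a short script along these lines would do it (a sketch, not part of the repo; it assumes the repo's src/ directory is on the Python path so the UNet class imports):

```python
# Hedged sketch: compare parameter counts of the default and reduced configurations.
# Assumes src/ is on the Python path so unet.unet resolves.
from unet.unet import UNet

def count_params(model):
    return sum(p.numel() for p in model.parameters())

default_net = UNet(3, 3, conv_depths=[64, 128, 256, 512, 1024])
reduced_net = UNet(3, 3, conv_depths=[64, 128, 256])
print(f"default [64, ..., 1024]: {count_params(default_net):,} parameters")
print(f"reduced [64, 128, 256]:  {count_params(reduced_net):,} parameters")
```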

Regarding the dropout layers, I think the batch normalization layers and the augmentation are doing a good job of avoiding overfitting, so dropout wouldn't be the first thing I'd try. If you want to try it out, I suggest putting the dropout layers into the Encoder2D and Decoder2D blocks, but you should definitely do some research on this first.
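Purely as an illustration (this is not the repo's Encoder2D or Decoder2D, just a generic Conv2d + BatchNorm2d + ReLU block), an nn.Dropout2d layer is typically placed after the activation:

```python
# Illustration only: NOT the repo's Encoder2D/Decoder2D, just a generic
# conv block showing where an nn.Dropout2d layer could sit.
import torch.nn as nn

class ConvBlockWithDropout(nn.Module):
    def __init__(self, in_channels, out_channels, dropout_p=0.2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p=dropout_p),  # spatial dropout after the activation
        )

    def forward(self, x):
        return self.block(x)
```

Spatial dropout (Dropout2d) zeroes whole feature maps rather than individual activations, which tends to work better for convolutional layers than element-wise dropout.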