Closed rose-jinyang closed 3 years ago
It's hard to say why. The value of the RMI loss is not as interpretable as other loss functions, so it would be good to also calculate the IoU or Dice score every epoch to measure the performance of the model.
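As a rough illustration of the suggestion above, a per-epoch Dice/IoU check can be as simple as thresholding the predictions and counting overlaps. This is a minimal NumPy sketch for the binary case (the `dice_iou` name and the 0.5 threshold are illustrative choices, not part of the repo):

```python
import numpy as np

def dice_iou(pred_probs, target, thresh=0.5, eps=1e-6):
    # pred_probs: predicted foreground probabilities, shape (H, W)
    # target: binary ground-truth mask, same shape
    pred = (pred_probs > thresh).astype(np.float64)
    inter = (pred * target).sum()
    total = pred.sum() + target.sum()
    dice = (2 * inter + eps) / (total + eps)
    iou = (inter + eps) / (total - inter + eps)
    return dice, iou
```

Logging these two numbers alongside `val_loss` each epoch makes it much easier to tell whether the model is still improving even when the RMI loss itself barely moves.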
It can also be that with learning rate decay the learning rate is simply too low in the later epochs to properly optimize the model. So perhaps try with the Adam optimizer without lr-decay and start with a lower initial lr (0.0003 for example) to compensate for the lack of lr-decay.
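In PyTorch, the suggested setup — Adam with a lower fixed learning rate and no scheduler — would look roughly like this (`model` is a stand-in for your own network; 0.0003 is the example value from above, not a tuned recommendation):

```python
import torch

# Stand-in for your segmentation network; replace with your actual model.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Adam with a fixed, lower lr: no scheduler is attached,
# so the lr stays at 3e-4 for the entire training run.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
```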
Thanks for your reply. What is the relationship between the parameter downsampling_method and the input size of the model? When should the "region-extraction" value of the "downsampling_method" parameter be used?
The downsampling is mainly done to reduce memory usage, but a disadvantage of max- and average-pooling is that fine-grained details can be lost in the process. The 'region-extraction' option is just a different way of doing this: there is no pooling operation, and the regions that RMI considers are strided (a bit like strided convolutions). Since there is no pooling layer, this might help preserve the finer details that would otherwise be lost by pooling, but I don't have the resources to compare how well it does vs. the other methods, so that is just a guess. Which method works best with which stride depends on your specific problem. Unless you want to try out every option to see what works best, you should probably stick with the default options.
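The difference between pooling-based downsampling and strided region extraction can be sketched on a toy mask. This is only an illustration of the idea (`avg_pool` and `region_extract` are made-up helper names, not the repo's API): pooling blurs a thin structure into fractional values, while strided sampling keeps exact values at the sampled locations.

```python
import numpy as np

def avg_pool(x, stride):
    # Average-pool with a stride x stride window; thin structures
    # get smeared into fractional values.
    h, w = x.shape
    h2, w2 = h // stride * stride, w // stride * stride
    blocks = x[:h2, :w2].reshape(h2 // stride, stride, w2 // stride, stride)
    return blocks.mean(axis=(1, 3))

def region_extract(x, stride):
    # Keep every stride-th pixel unchanged ("region extraction"):
    # no averaging, so sampled pixels keep their exact values.
    return x[::stride, ::stride]
```

On a 4x4 mask whose top row is all ones, `avg_pool(x, 2)` turns that row into 0.5s, while `region_extract(x, 2)` keeps the sampled pixels at exactly 1 — though strided sampling can also miss a thin structure entirely if it falls between sampled rows, which is the trade-off mentioned above.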
Hello, how are you? Thanks for contributing to this project. I am training a portrait segmentation model with this RMI loss on a tiny dataset of 3179 samples. The input size of the network is 160x160. I first used the SGD optimizer, but the loss decreased very slowly. So now I am training with the Adam optimizer (init_lr=0.001) and a PolynominalLRDecay scheduler, but the loss still decreases very slowly.
epoch 1: val_loss 1.2528
epoch 99: val_loss 0.9809
epoch 182: val_loss 0.9635
What do you think the reason is? Thanks.