AndrejHafner closed this issue 4 years ago.
I have the same problem! I tried HRNetW18 + C1 with the settings from https://github.com/HRNet/HRNet-Semantic-Segmentation/blob/master/experiments/cityscapes/seg_hrnet_w18_small_v2_512x1024_sgd_lr1e-2_wd5e-4_bs_12_epoch484.yaml. The problem went away when I switched to this lighter model configuration, but it still occurs with the original setting.
I think I solved this problem by uncommenting line 106 in ./models/model.py: net_encoder.apply(ModelBuilder.weights_init). Since I am training the model from scratch, I didn't enable the [pretrain] setting, so the encoder weights were never being explicitly initialized.
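In case it helps, here is a minimal sketch of that kind of encoder initialization when training from scratch. This is an illustration only, not the exact code from ./models/model.py, so check the repo's ModelBuilder.weights_init for the real implementation:

```python
import torch.nn as nn

def weights_init(m):
    # Kaiming-normal init for conv layers, unit scale / small bias for batch norm.
    # Illustrative sketch; the repo's ModelBuilder.weights_init may differ in details.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.kaiming_normal_(m.weight.data)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.fill_(1.0)
        m.bias.data.fill_(1e-4)

# Applied to the encoder when no pretrained weights are loaded, which is what
# the previously commented-out line does:
# net_encoder.apply(ModelBuilder.weights_init)
```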
I later solved the problem by lowering the learning rate to 0.00002 and training on a GPU with more VRAM, which let me increase the batch size. This made training more stable. I had these problems on an Nvidia GTX 980 Ti with 6 GB of VRAM, but they didn't appear on an Nvidia P100 with 16 GB of VRAM.
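To make the fix concrete, here is a minimal sketch of a plain PyTorch SGD optimizer with the lowered learning rate. The tiny stand-in modules, momentum, and weight decay values are placeholders for illustration, not values from the issue:

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in modules; in the actual setup these are the HRNetV2 encoder
# and the C1 decoder built by the framework.
net_encoder = nn.Conv2d(3, 16, kernel_size=3, padding=1)
net_decoder = nn.Conv2d(16, 2, kernel_size=1)

# Lowered learning rate (2e-5) that avoided the NaN loss; momentum and
# weight decay are common defaults chosen for the sketch.
optimizer = optim.SGD(
    list(net_encoder.parameters()) + list(net_decoder.parameters()),
    lr=2e-5,
    momentum=0.9,
    weight_decay=1e-4,
)
```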
Hello!
I have a dataset of images with segmentation labels for two classes. I have checked that the label maps contain the correct values (1 and 2). When training a network with HRNetV2 as the encoder and C1 as the decoder, the loss becomes NaN at the end of the first epoch. After that it never recovers and the predictions are unusable. I have tried reducing the learning rate to 1e-7, but I still get the same problem. I had this problem with other encoder-decoder combinations as well, but it usually started much later and lowering the learning rate largely solved it (with resnet101 + upernet I get an mIoU of 0.92).
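For reference, a minimal sketch of the kind of label check described above; the annotation directory, file pattern, and expected value set are placeholders, not taken from my dataset:

```python
import numpy as np
from pathlib import Path
from PIL import Image

# Hypothetical annotation directory; adjust to the actual dataset layout.
label_dir = Path("data/annotations")
expected = {1, 2}  # the two class ids

for path in sorted(label_dir.glob("*.png")):
    values = set(np.unique(np.array(Image.open(path))).tolist())
    # Any value outside the expected set (e.g. 0 or a 255 ignore index)
    # would need to be handled by the class mapping or the loss.
    unexpected = values - expected
    if unexpected:
        print(f"{path.name}: unexpected label values {sorted(unexpected)}")
```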
Here is my config:
Did anyone else face this problem with any of these combinations?
Thank you!
EDIT: I'm having the same problem with the resnet101dilated + ppm_deepsup combination, only it starts later.