[Closed] kleinzcy closed this issue 4 years ago
When I retrained the ResNet-101 model on a GTX 1080 Ti, the result was as good as reported in the paper, and even better. The NoC metric is sensitive.
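For context on why NoC can be sensitive: it counts simulated clicks until a target IoU is first reached, and images that never reach the target are charged the full click budget, so a handful of hard images can noticeably shift the average. A toy sketch (function and variable names are illustrative, not from the repo):

```python
def noc(ious_per_click, target_iou=0.90, max_clicks=20):
    # ious_per_click[k] is the IoU after k + 1 clicks for one image.
    # Return the first click count at which the target IoU is reached;
    # images that never reach it are charged the full click budget.
    for k, iou in enumerate(ious_per_click[:max_clicks]):
        if iou >= target_iou:
            return k + 1
    return max_clicks

# Mean NoC@90 over a toy "dataset": one failure case (capped at 20 clicks)
# pulls the average from ~2.5 up to ~8.3.
dataset = [[0.70, 0.85, 0.92], [0.60, 0.91], [0.50, 0.60, 0.70]]
mean_noc = sum(noc(s) for s in dataset) / len(dataset)
```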
@kleinzcy, I hope you don't mind me asking a question, since I ran into the same problem you mentioned at the beginning: during training, the training and validation losses barely change from epoch 1 to epoch 120, fluctuating around 0.3 as shown below:
(INFO) 2021-02-09 06:50:46: Epoch 99, training loss 0.319589: 96%|##########################  | 512/531 [05:30<00:12, 1.56it/s]
(INFO) 2021-02-09 06:50:51: Epoch 99, training loss 0.319661: 98%|##########################4| 520/531 [05:35<00:07, 1.56it/s]
(INFO) 2021-02-09 06:50:57: Epoch 99, training loss 0.319605: 99%|##########################8| 528/531 [05:40<00:01, 1.55it/s]
(INFO) 2021-02-09 06:50:58: Save checkpoint to experiments/sbd/r34_dh128/008_first-try/checkpoints/last_checkpoint.pth
(INFO) 2021-02-09 06:51:02: Epoch 99, validation loss: 0.339068:  7%|#6      | 12/178 [00:03<00:32, 5.03it/s]
(INFO) 2021-02-09 06:51:07: Epoch 99, validation loss: 0.341119: 21%|#####3  | 38/178 [00:08<00:27, 5.15it/s]
(INFO) 2021-02-09 06:51:12: Epoch 99, validation loss: 0.341862: 36%|########9 | 64/178 [00:13<00:22, 5.14it/s]
When you reported that your result was as good as the paper's or better, did you see the loss drop well below 0.3? How did you solve the problem? Thanks.
@MaitaYuki Sorry for the late reply. I am on a long holiday.
The loss fluctuates around 0.3 because of the normalized focal loss; you can check its formulation.
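To make the point concrete, here is a minimal, dependency-free sketch of a normalized focal loss over per-pixel true-class probabilities (names are illustrative; the repo's actual implementation operates on tensors). Because the focal weights are normalized away, the result is a weighted *average* of `-log(p)` over the remaining hard pixels, so the logged value can hover near a constant even as most pixels become easy:

```python
import math

def normalized_focal_loss(probs, gamma=2.0):
    # probs: predicted probabilities for the true class of each pixel.
    # Standard focal loss down-weights easy pixels with (1 - p)^gamma;
    # the normalized variant divides by the sum of those weights, so the
    # loss is a weighted average of -log(p) rather than a weighted sum.
    weights = [(1.0 - p) ** gamma for p in probs]
    total = sum(weights) or 1.0
    return sum(w * -math.log(p) for w, p in zip(weights, probs)) / total

# As predictions improve, a plain focal *sum* would shrink toward 0,
# but the normalized value stays on the scale of -log(p) for the
# hardest pixels, so it need not fall much over training.
early = normalized_focal_loss([0.60, 0.70, 0.55, 0.90])
late = normalized_focal_loss([0.95, 0.97, 0.60, 0.99])
```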
@kleinzcy @MaitaYuki Do you remember how long 120 epochs took (and how many GPUs you used)?
Hi authors, thanks for your nice paper and code!
Recently, I retrained the ResNet-101 model with your code, but my result is not as good as reported in the paper. I have read through the issues but did not find any helpful information.
My environment: Ubuntu 16.04, CUDA 10.1, PyTorch 1.3.0, four TITAN Xp GPUs
My results (NoBRS, last checkpoint, NFL):
Results after f-BRS-B:
Also, the training curve looks strange: the training loss grows or stays nearly constant (it changes only slightly). Do you have any idea why?
Thanks.