TiagoCortinhal / SalsaNext

Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving
MIT License

Questions about training process #32

Closed iris0329 closed 4 years ago

iris0329 commented 4 years ago

Hi, thanks for generously open-sourcing this brilliant project!

I have two questions about the project:

  1. Was the pretrained model trained with uncertainty or not? Is this pretrained model the one that reproduces the 59.5 point-wise mean IoU in Table I of the paper?

  2. When I train the model myself, I see a serious overfitting problem. The pictures below are my training-loss and validation-loss curves.
    Why does this happen? Did you also have this problem during training?

[Images: training loss and validation loss curves]

I am looking forward to your reply!

TiagoCortinhal commented 4 years ago

Hello @iris0329

  1. No, the pretrained model is not trained with the uncertainty. The uncertainty is applied after training, following A General Framework for Uncertainty Estimation in Deep Learning (a rough sketch of the post-hoc idea is below).

  2. Could you give me more information about your training? No, I never encountered this kind of issue.
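
For context, here is a minimal sketch of one common way to get post-hoc epistemic uncertainty at inference time, Monte Carlo dropout: keep the dropout layers active in evaluation mode and average several stochastic forward passes. This is only an illustration, not the exact procedure of the referenced framework (which also propagates aleatoric uncertainty); `model` and `scan` are placeholder names, and the sketch assumes the network returns per-class logits with dropout layers in it.

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, scan, n_samples=20):
    """Monte Carlo dropout: average the softmax outputs of several
    stochastic forward passes of an already-trained network.

    model : trained segmentation network containing dropout layers
    scan  : input tensor of shape (1, C, H, W), e.g. a range-image projection
    """
    model.eval()
    # Re-enable only the dropout layers (batch norm stays in eval mode).
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()

    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(scan)                    # (1, num_classes, H, W)
            probs.append(F.softmax(logits, dim=1))
    probs = torch.stack(probs, dim=0)               # (n_samples, 1, num_classes, H, W)

    mean_probs = probs.mean(dim=0)                  # predictive distribution
    prediction = mean_probs.argmax(dim=1)           # per-pixel class labels
    # Predictive entropy as a simple per-pixel uncertainty score.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    return prediction, entropy
```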

iris0329 commented 4 years ago

Thanks for your prompt reply!

  1. So that means, if I apply the uncertainty to this pretrained model, I can get the 59.5 point-wise mean IoU in Table I of the paper?

  2. The batch size is set to 32 and I used 8 TITAN X (Pascal) GPUs. All the other settings are the defaults. The curves look very confusing.
[Image: training/validation loss curves]

I retrained it, and you can see a tendency to overfit starting around epoch 55. Thanks.

TiagoCortinhal commented 4 years ago
  1. Yes; that score was evaluated on the test set.

  2. Does that tendency continue if you keep training past 55 epochs?

Best,

iris0329 commented 4 years ago
  1. Thank you!
  2. The answer is yes. It is now near epoch 100, and you can see from the validation and training loss figures that the overfitting is becoming more serious.

[Images: training and validation loss curves near epoch 100]

TiagoCortinhal commented 4 years ago

Ok I will check this as soon as possible!

TiagoCortinhal commented 4 years ago

Hello @iris0329! Sorry for the delay on this question. I had to go back to my logs to check how the submission training (and the other experiments) went and to verify our behaviour.

I misled you earlier: we did see a similar issue with apparent overfitting. Our baseline comparison (RangeNet++) also shows this apparent sign when we trained it from scratch.

Again, sorry for the delay and the misleading answer before; I did not have a clear picture of our val_loss plots in mind!

iris0329 commented 3 years ago

Sorry, I looked into this problem a while ago but forgot to update it here.

I think it is a problem caused by the Cross-Entropy Loss.

Consider the Matrix_A before the argmax function; its size is (h, w, class_num). After applying argmax to Matrix_A, we get a new Matrix_B of size (h, w, 1).

An increasing IoU means that Matrix_B becomes more accurate. However, the Cross-Entropy Loss is computed on Matrix_A.

For example,

There are two pixels and the ground truth is [2, 0]. At the beginning, Matrix_A is [[0.1, 0.2, 0.7], [0.1, 0.2, 0.7]]; after applying argmax we get the prediction [2, 2], i.e. a 50% accuracy rate.

As training continues, Matrix_A becomes [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]]; after applying argmax the prediction is [2, 0], so the accuracy is now 100%. Yet the loss on the first pixel increases, because its correct-class probability drops from 0.7 to 0.5. The cross-entropy loss and the argmax-based metrics can therefore move in different directions.

So it seems that, as training progressed, the predictions tended to average out across all classes. But I still don't know how to solve it. If anyone has a suggestion, I would be very grateful.
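
To make the two-pixel example above concrete, here is a small, hypothetical check (plain PyTorch, nothing from the SalsaNext code base) that computes the cross-entropy and the argmax accuracy for both versions of Matrix_A:

```python
import torch
import torch.nn.functional as F

# Ground truth for the two pixels.
target = torch.tensor([2, 0])

# Matrix_A at the beginning and later in training
# (rows are pixels, columns are class probabilities).
probs_early = torch.tensor([[0.1, 0.2, 0.7],
                            [0.1, 0.2, 0.7]])
probs_late  = torch.tensor([[0.2, 0.3, 0.5],
                            [0.5, 0.3, 0.2]])

for name, probs in [("early", probs_early), ("late", probs_late)]:
    # Cross-entropy on the probabilities (negative log-likelihood of the true class).
    per_pixel_loss = -probs.log()[torch.arange(2), target]
    ce = F.nll_loss(probs.log(), target)
    # Argmax-based accuracy, the quantity that IoU-style metrics look at.
    acc = (probs.argmax(dim=1) == target).float().mean()
    print(f"{name}: per-pixel loss = {per_pixel_loss}, "
          f"mean loss = {ce.item():.3f}, accuracy = {acc.item():.0%}")
```

Running this shows the first pixel's loss rising from about 0.36 to 0.69 even though both pixels are now classified correctly, which is the divergence between the loss curve and the argmax-based metrics described above.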