The loss value during training

I trained the model from scratch using the provided training dataset. During training, the loss value stayed large and the training process did not seem to converge. However, when I ran inference with the trained model, it produced satisfactory results. I am quite confused by this. Has anyone else experienced this? Is it expected? If so, what could be the reasons?

Li-Chongyi / Zero-DCE, issue #42