fartashf / vsepp

PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
Apache License 2.0

Loss stuck, not decreasing #27

Closed. mesnico closed this issue 4 years ago

mesnico commented 4 years ago

Hi, I'm noticing a very strange loss behavior during the training phase. Initially, the loss decreases as it should. At a certain point it reaches a plateau from which, most of the time, it cannot escape. In particular, if I use pre-extracted features without fine-tuning the image encoder, the plateau is overcome almost immediately, as shown in the following plot: [image: loss curve with pre-extracted features]

However, if I try to fine-tune, the loss gets stuck forever: [image: loss curve with fine-tuning]

I noticed that the loss gets stuck at a very specific value, namely 2 · (batch_size · loss_margin). It seems that the loss is collapsing to a point where the difference between positive and negative pair similarities is always 0: $s(i, c) - s(i, c') = 0$ and $s(i, c) - s(i', c) = 0$.

I'm using margin = 0.2. For the pre-extracted features I used a batch size of 128, while for fine-tuning I used a batch size of 32. The configuration is exactly the same as yours. In general, I noticed this behavior when the network is too complex. Maybe the reason is that good hard negatives cannot be found with batch sizes smaller than 128; however, I have hardware constraints.
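For concreteness, here is a quick sanity check (my own sketch, not code from the repo; the collapsed similarity value 0.5 is arbitrary): when every image-caption similarity collapses to the same value, each off-diagonal hinge saturates at exactly the margin, and the MAX loss sums one such term per image and one per caption, giving 2 · batch_size · margin.

```python
import torch

batch_size, margin = 32, 0.2
scores = torch.full((batch_size, batch_size), 0.5)     # all pairs equally similar
pos = scores.diag().view(-1, 1)                        # positive-pair scores

hinge_cap = (margin + scores - pos).clamp(min=0)       # image -> caption violations
hinge_img = (margin + scores - pos.t()).clamp(min=0)   # caption -> image violations
hinge_cap.fill_diagonal_(0)                            # ignore the positive pair
hinge_img.fill_diagonal_(0)

# Hardest negative per row / column, summed over the batch in both directions
loss = hinge_cap.max(1)[0].sum() + hinge_img.max(0)[0].sum()
print(loss)  # tensor(12.8000) == 2 * 32 * 0.2, the plateau value described above
```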

Did you notice a similar behavior in your experiments? If so, how did you solve it? Thank you very much.

fartashf commented 4 years ago

Thanks for reporting. Generally, we observed that the MAX loss can be harder to optimize. There are ways to reduce the difficulty:

- Start training with the SUM loss and switch to the MAX loss after a few epochs (stage-wise optimization; see the sketch after this list).
- Use a larger batch size, so that harder negatives are available in each mini-batch.
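For reference, a minimal sketch of the two variants (a simplified re-implementation in the spirit of this repo's contrastive loss, not copied from it; the `max_violation` switch selects the MAX-of-hinges loss, the default sums over all negatives):

```python
import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """Bidirectional ranking loss: sum over all negatives (SUM) or keep only
    the hardest negative per positive pair (MAX), controlled by max_violation."""

    def __init__(self, margin=0.2, max_violation=False):
        super().__init__()
        self.margin = margin
        self.max_violation = max_violation

    def forward(self, im, s):
        # im, s: L2-normalized image / caption embeddings of shape (B, D)
        scores = im @ s.t()                        # cosine similarity matrix
        diagonal = scores.diag().view(-1, 1)       # positive-pair scores
        cost_s = (self.margin + scores - diagonal).clamp(min=0)       # caption retrieval
        cost_im = (self.margin + scores - diagonal.t()).clamp(min=0)  # image retrieval
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        cost_s = cost_s.masked_fill(mask, 0)       # do not penalize the positive pair
        cost_im = cost_im.masked_fill(mask, 0)
        if self.max_violation:                     # MAX: hardest negative only
            cost_s = cost_s.max(1)[0]
            cost_im = cost_im.max(0)[0]
        return cost_s.sum() + cost_im.sum()
```

With `max_violation=False` the gradient is spread over all violating negatives in the batch, which is part of why the SUM stage is easier to get off the ground than the MAX stage.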

mesnico commented 4 years ago

Thank you very much for these hints. I think the stage-wise optimization is the way to go: if I first optimize with the SUM loss and then resume after 10 epochs with the MAX loss, the problem disappears and the validation metrics keep increasing smoothly.
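A sketch of that schedule, using the `ContrastiveLoss` sketch above (hypothetical training loop; `model`, `train_loader`, and `optimizer` are stand-ins for the repo's actual objects, and the model is assumed to return the two embedding batches):

```python
def train_stagewise(model, train_loader, optimizer,
                    num_epochs=30, switch_epoch=10, margin=0.2):
    """Warm up with the SUM loss, then switch to the MAX (hardest-negative) loss."""
    criterion = ContrastiveLoss(margin=margin, max_violation=False)  # SUM stage
    for epoch in range(num_epochs):
        if epoch == switch_epoch:
            criterion.max_violation = True          # resume with the MAX loss
        for images, captions in train_loader:
            im_emb, cap_emb = model(images, captions)   # assumed to return (B, D) embeddings
            loss = criterion(im_emb, cap_emb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```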

However, I will also pay attention to the batch size, as you suggested. Thanks again.