After training the caption models "show, attend, and tell" and "top-down" for 10 epochs, my training loss (~2.8) is still much higher than the upper bound suggested in the lab handout (2.3). As a benchmark, I also tried the "show and tell" model, and its loss is likewise ~2.8. Does the recommended upper bound also apply to the "show and tell" model? Since the "show and tell" model has no "TODO" part in the code, I assume its loss should not be affected by the code I added.
The upper bound refers to the minimum training loss, and a training loss around 2.6 is typical.
The performance of the "show and tell" model is a little worse than that of the models with attention.