google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0

Difference between "loss" and "neg_log_perplexity" in "perplexity_eval" mode #356

Open arglog opened 4 years ago

arglog commented 4 years ago

Hi

In perplexity_eval mode (using perplexity_eval.gin), two metrics are recorded: loss (L765) and neg_log_perplexity (L802).

I found that both metrics are essentially cross-entropy losses: loss is computed here, and neg_log_perplexity is based on values computed here.

My question is: what is the difference between loss and neg_log_perplexity? In my experiments, loss is not equal to -1 * neg_log_perplexity. What causes the two to differ?

Thanks!

arglog commented 4 years ago

One difference I found: in logits_and_loss the cross entropy is normalized by the full length of targets, while for neg_log_perplexity it is normalized by the number of non-zero (i.e. non-padding) tokens in targets.
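A toy numerical sketch of that normalization difference (my own illustration with made-up values, not code from this repo), assuming padding positions use token id 0 and contribute zero cross entropy:

```python
import numpy as np

# Hypothetical per-token cross-entropy for a padded target sequence.
per_token_xent = np.array([2.1, 1.7, 2.4, 0.0, 0.0])  # last two positions are padding
targets = np.array([15, 8, 1, 0, 0])                  # 0 = padding token id

# "loss" as described above: sum normalized by the full target length, padding included.
loss = per_token_xent.sum() / len(targets)            # 6.2 / 5 = 1.24

# "neg_log_perplexity": sum normalized by the number of non-zero (non-padding) tokens.
neg_log_perplexity = -per_token_xent.sum() / np.count_nonzero(targets)  # -6.2 / 3

print(loss)                 # 1.24
print(-neg_log_perplexity)  # 2.066..., not equal to loss whenever padding is present
```

If that reading of the code is right, the two values only coincide when the batch contains no padding, which would explain why loss != -1 * neg_log_perplexity in practice.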