google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0

Difference between "loss" and "neg_log_perplexity" in "perplexity_eval" mode #356

Open arglog opened 4 years ago

arglog commented 4 years ago

Hi

In perplexity_eval mode (using perplexity_eval.gin), two metrics are recorded: loss (L765) and neg_log_perplexity (L802).

I found that both metrics are essentially cross-entropy losses: loss is computed here, and neg_log_perplexity is based on values computed here.

My question is: what is the difference between loss and neg_log_perplexity? In my experiments, loss is not equal to -1 * neg_log_perplexity. What causes the two to differ?

Thanks!

arglog commented 4 years ago

One difference I found: in logits_and_loss the cross entropy is normalized by the full length of targets, while for neg_log_perplexity it is normalized by the number of non-zero (i.e. non-padding) tokens in targets.
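A toy numerical sketch of that normalization difference (my own illustration with made-up values, not code from this repo), assuming padding positions use token id 0 and contribute zero cross entropy:

```python
import numpy as np

# Hypothetical per-token cross-entropy for a padded target sequence.
per_token_xent = np.array([2.1, 1.7, 2.4, 0.0, 0.0])  # last two positions are padding
targets = np.array([15, 8, 1, 0, 0])                  # 0 = padding token id

# "loss" as described above: sum normalized by the full target length, padding included.
loss = per_token_xent.sum() / len(targets)            # 6.2 / 5 = 1.24

# "neg_log_perplexity": sum normalized by the number of non-zero (non-padding) tokens.
neg_log_perplexity = -per_token_xent.sum() / np.count_nonzero(targets)  # -6.2 / 3

print(loss)                 # 1.24
print(-neg_log_perplexity)  # 2.066..., not equal to loss whenever padding is present
```

If that reading of the code is right, the two values only coincide when the batch contains no padding, which would explain why loss != -1 * neg_log_perplexity in practice.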