arglog opened this issue 4 years ago (Open)
One difference I found is that in `logits_and_loss` the cross entropy was normalized by the length of `targets`, while in `neg_log_perplexity` the cross entropy was normalized by the number of non-zero tokens in `targets`.
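To make that difference concrete, here is a minimal numeric sketch (not Trax's actual implementation; the per-position cross-entropy values and token ids below are made up) showing how the same summed cross entropy gives different averages under the two normalizations:

```python
import numpy as np

# Hypothetical values: per-position cross entropy and the target ids,
# where id 0 marks padding positions.
token_xent = np.array([2.3, 1.7, 0.9, 0.0, 0.0])
targets = np.array([15, 42, 7, 0, 0])

# loss-style normalization: divide by the full length of targets,
# padding positions included.
loss = token_xent.sum() / len(targets)  # 4.9 / 5 = 0.98

# neg_log_perplexity-style normalization: divide by the number of
# non-zero (non-padding) tokens only, then negate.
neg_log_perplexity = -token_xent.sum() / np.count_nonzero(targets)  # -4.9 / 3

print(loss, -neg_log_perplexity)  # 0.98 vs. ~1.633: not equal
```

On a batch with no padding the two denominators coincide and `loss == -1 * neg_log_perplexity`; any padding makes them diverge, as above.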
Hi,

In the perplexity_eval model (using `perplexity_eval.gin`), there are two metrics being recorded: `loss` (L765) and `neg_log_perplexity` (L802). I found that these two metrics are both essentially cross entropy: `loss` was computed from here, and `neg_log_perplexity` was based on values computed here.

My question is: what's the difference between `loss` and `neg_log_perplexity`? In my experiments I found that `loss` is not equal to `-1 * neg_log_perplexity`. What makes the two different? Thanks!