nprisbrey opened 2 months ago
We need to log the same values consistently across all model types. For example, for the LongNet architecture we only log validation loss, whereas for the Transformer and RetNet models we log both validation loss and perplexity.
Also, why are we only recording perplexity for validation? I'd like to see loss and perplexity recorded for every model on every dataset.
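One way to keep this consistent is to route all logging through a single shared helper that every model (Transformer, RetNet, LongNet) calls from its training, validation, and test steps. The sketch below assumes the models are PyTorch Lightning modules (so `self.log` is `LightningModule.log`) and that `loss` is the mean token-level cross-entropy in nats, so perplexity is `exp(loss)`. The mixin, method, and stage names are hypothetical, not existing code in this repo.

```python
import math


class LossPerplexityLoggingMixin:
    """Hypothetical mixin: log the same metrics for every model and stage.

    Assumes it is mixed into a LightningModule, so ``self.log`` is
    ``LightningModule.log``; any object with a compatible ``log`` works.
    """

    STAGES = ("train", "val", "test")

    def log_loss_and_perplexity(self, stage, loss):
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage {stage!r}; expected one of {self.STAGES}")
        # Perplexity is exp(mean cross-entropy). Use the tensor's own exp()
        # when available (keeps it on device); fall back to math.exp for floats.
        perplexity = loss.exp() if hasattr(loss, "exp") else math.exp(loss)
        for name, value in (("loss", loss), ("perplexity", perplexity)):
            # Pass on_step/on_epoch explicitly so every stage logs at the
            # same granularity, independent of per-hook defaults.
            self.log(f"{stage}_{name}", value, on_step=True, on_epoch=True)
        return perplexity
```

Each model's `training_step`, `validation_step`, and `test_step` would then call `self.log_loss_and_perplexity("train", loss)` (or `"val"` / `"test"`) instead of calling `self.log` directly, so adding a metric in one place adds it for all architectures.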
Also, why is validation perplexity only logged `on_epoch` and not also `on_step`, like everything else?
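For context, Lightning's `LightningModule.log` defaults to `on_step=True, on_epoch=False` inside `training_step` and to `on_step=False, on_epoch=True` inside `validation_step` and `test_step`, which may explain the per-epoch-only validation perplexity if those flags were not set explicitly. When perplexity is logged `on_epoch`, it also matters how the epoch value is aggregated: with the default `reduce_fx="mean"`, the per-step values are averaged, so the epoch perplexity is the mean of per-step perplexities, which is at least `exp` of the mean loss (Jensen's inequality). A minimal sketch comparing the two, on illustrative loss values and assuming equal batch sizes:

```python
import math


def mean_of_step_perplexities(step_losses):
    """Epoch value if per-step perplexities are averaged (mean reduction)."""
    return sum(math.exp(loss) for loss in step_losses) / len(step_losses)


def perplexity_of_mean_loss(step_losses):
    """exp of the epoch-mean loss (equal batch sizes assumed)."""
    return math.exp(sum(step_losses) / len(step_losses))


losses = [1.0, 3.0]  # illustrative values, not real results
print(mean_of_step_perplexities(losses))  # ≈ 11.40
print(perplexity_of_mean_loss(losses))    # ≈ 7.39
```

The two agree only when every step has the same loss, so whichever definition is used for the epoch-level perplexity should be the same for all models.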