Contextualist closed this issue 2 years ago.
Hi, yeah definitely train until the WER/PER are saturated. In my experience it is common for the validation loss to go up while other metrics continue to improve (this also happens for language generation tasks, etc.).
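For concreteness, here is a minimal sketch (not from this thread) of selecting checkpoints by phoneme error rate instead of validation loss. PER is taken here as the Levenshtein edit distance between predicted and reference phoneme sequences, normalized by reference length; all names and data are illustrative.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, sa in enumerate(a, start=1):
        curr = [i]
        for j, sb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (sa != sb)))  # substitution
        prev = curr
    return prev[-1]


def per(preds: List[List[str]], refs: List[List[str]]) -> float:
    """Corpus-level phoneme error rate."""
    errors = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    return errors / sum(len(r) for r in refs)


# One substitution over four reference phonemes -> PER of 0.25
preds = [["HH", "AH0", "L", "OW1"]]
refs = [["HH", "EH0", "L", "OW1"]]
print(per(preds, refs))
```

The idea is to save a checkpoint whenever the validation PER (or WER) improves, even while the validation loss is rising.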
Got it, thanks for your explanation, @cschaefer26!
While training a forward transformer model, I observed that PER and WER kept falling even after the validation loss started to rise.
My training config is based on the example forward transformer config, with `phoneme_symbols` modified (phonemes are in ARPABET and vowels carry stress marks) and `dropout` set to 0.3. Should I keep training, or should I use the model with the lowest validation loss? Or do you have any other suggestions?
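A hypothetical sketch of the two config changes described above: load the example forward transformer config, swap in ARPABET phoneme symbols (vowels carry 0/1/2 stress digits), and raise dropout. The file name and key paths here are assumptions rather than the library's confirmed schema, and the symbol list is truncated for illustration.

```python
import yaml

with open("forward_config.yaml") as f:   # assumed name of the example config
    cfg = yaml.safe_load(f)

# Assumed key paths; check them against the real example config.
cfg["preprocessing"]["phoneme_symbols"] = [
    "AA0", "AA1", "AA2", "AE0", "AE1", "AE2",   # stress-marked vowels (truncated)
    "B", "CH", "D", "DH", "F", "G", "HH",       # consonants (truncated)
]
cfg["model"]["dropout"] = 0.3

with open("my_forward_config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```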