as-ideas / DeepPhonemizer

Grapheme to phoneme conversion with deep learning.
MIT License

Question on overfitting #18

Closed Contextualist closed 2 years ago

Contextualist commented 2 years ago

While training a forward transformer model, I observed that PER and WER kept decreasing even after the validation loss started to rise.

[Image: validate]

My training config is based on the example forward transformer config, with phoneme_symbols modified (phonemes are in ARPABET, and vowels carry stress marks) and dropout set to 0.3.

Should I keep training, or should I use the model with the lowest validation loss? Any other suggestions?

cschaefer26 commented 2 years ago

Hi, yes, definitely train until WER/PER saturate. In my experience, it is common for the validation loss to go up while other metrics keep improving (this also happens in language generation tasks, etc.).
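One way to see why this can happen is a toy sketch with made-up numbers. It is not taken from DeepPhonemizer; the probabilities, the two "checkpoints", and the helper names are all hypothetical. Cross-entropy punishes a few very confident mistakes heavily, while an error rate only counts whether the top prediction is right, so the two can move in opposite directions.

```python
import math

def mean_cross_entropy(probs_of_correct):
    """Average negative log-likelihood assigned to the correct symbol."""
    return -sum(math.log(p) for p in probs_of_correct) / len(probs_of_correct)

def error_rate(probs_of_correct):
    """Fraction of tokens predicted wrongly.

    Toy assumption: a two-way choice per token, so the prediction is
    correct exactly when the correct symbol gets probability > 0.5.
    """
    wrong = sum(1 for p in probs_of_correct if p <= 0.5)
    return wrong / len(probs_of_correct)

# Hypothetical probability given to the correct phoneme for five tokens.
early = [0.6, 0.6, 0.6, 0.4, 0.4]       # hesitant model, two errors
late = [0.95, 0.95, 0.95, 0.95, 0.01]   # confident model, one very confident error

for name, probs in [("early", early), ("late", late)]:
    print(f"{name}: loss={mean_cross_entropy(probs):.3f} "
          f"error_rate={error_rate(probs):.2f}")
```

In this sketch the later checkpoint has the higher loss but the lower error rate, which matches the pattern described above. If PER/WER are the metrics you care about, selecting the checkpoint on those rather than on validation loss is a reasonable choice.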

Contextualist commented 2 years ago

Got it, thanks for your explanation, @cschaefer26!