There's no probabilistic motivation for doing this. However, for a couple of real life cases we've seen it helps a lot. Essentially, if the outcome distribution is very skewed, unseen words heavily favor the most rare outcomes, which lead to nonsensical predictions.
There's no probabilistic motivation for doing this. However, for a couple of real life cases we've seen it helps a lot. Essentially, if the outcome distribution is very skewed, unseen words heavily favor the most rare outcomes, which lead to nonsensical predictions.