Removes from the lexicon, for both the training and test sets, every word that does not occur at least once in the training set.
This can be useful when the lexicon is tailored to the corpus to the point of overfitting (i.e., it includes only words that occur in the corpus and omits many other common words). Such a lexicon can lead to overestimated performance on lexicon words that appear only in the test set.
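A minimal sketch of the filtering step, assuming the lexicon is a plain dict mapping each word to its entry (e.g. a tag) and the training data is a list of tokenized sentences. The function name `filter_lexicon` and the data layout are hypothetical, and words are matched exactly, with no case or other normalization. The same filtered lexicon is then used for lookups on both training and test tokens, so a test-only word never receives a lexicon entry.

```python
from typing import Dict, Iterable, List, Set, TypeVar

Entry = TypeVar("Entry")


def filter_lexicon(
    lexicon: Dict[str, Entry], train_sentences: Iterable[List[str]]
) -> Dict[str, Entry]:
    """Hypothetical helper: keep only lexicon words seen at least once in training."""
    # Vocabulary of the training set (exact string match, no normalization assumed).
    train_vocab: Set[str] = {tok for sent in train_sentences for tok in sent}
    # Drop every entry whose word never occurs in the training data.
    return {word: entry for word, entry in lexicon.items() if word in train_vocab}


if __name__ == "__main__":
    lexicon = {"run": "VERB", "cat": "NOUN", "zebra": "NOUN"}
    train = [["the", "cat", "sat"], ["we", "run"]]
    filtered = filter_lexicon(lexicon, train)
    print(filtered)               # {'run': 'VERB', 'cat': 'NOUN'}
    # A test-only word such as "zebra" no longer gets a lexicon entry.
    print(filtered.get("zebra"))  # None
```

Because the filtered lexicon depends only on the training set, a word occurring only in the test set is treated the same way whether or not the original lexicon listed it.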