bootphon / abkhazia

ABX and kaldi experiments on speech corpora made easy
https://docs.cognitive-ml.fr/abkhazia
GNU General Public License v3.0

--prune-lexicon option in abkhazia language #10

Open mmmaat opened 7 years ago

mmmaat commented 7 years ago

Removes from the lexicon, for both the train and test sets, every word that does not occur at least once in the training set.

This could be useful when the lexicon is tailored to the corpus to the point of overfitting (i.e. it includes only words occurring in the corpus and omits many other common words). Such a lexicon could lead to overestimated performance on lexicon words that appear only in the test set.
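
The pruning described above could be sketched roughly as follows. This is only an illustration, not abkhazia's implementation: the function name `prune_lexicon`, the dict-based lexicon (word to phone string), and the toy data are all hypothetical, and the training text is assumed to be whitespace-tokenized.

```python
def prune_lexicon(lexicon, train_utterances):
    """Return a copy of `lexicon` restricted to words seen in training.

    lexicon: dict mapping each word to its phonetic transcription
             (hypothetical format, chosen for this sketch).
    train_utterances: iterable of whitespace-tokenized training transcripts.
    """
    # Collect every distinct word that occurs at least once in training.
    train_words = {word for utt in train_utterances for word in utt.split()}
    # Keep only the lexicon entries whose word was seen in training.
    return {word: phones for word, phones in lexicon.items()
            if word in train_words}


# Toy data: "bird" is in the lexicon but never occurs in the training set.
lexicon = {
    "cat": "k ae t",
    "dog": "d ao g",
    "bird": "b er d",
}
train = ["cat dog", "dog dog"]

pruned = prune_lexicon(lexicon, train)
print(sorted(pruned))  # ['cat', 'dog']
```

With the pruned lexicon used for both train and test, a word such as "bird" that appears only in the test set becomes out-of-vocabulary, which is the effect the option is meant to produce.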