I wrote a Python script to check the output of pattern's implementation of the Porter2 stemmer (in the vector module) against the output of the original implementation by Martin Porter.
Martin Porter provides a test vocabulary of 29417 words together with the stemmed output for each word, produced by his own implementation. My script runs pattern's stemmer over the same vocabulary and compares the results word by word. It found 215 mismatches, which it writes to the file errors.txt (script available here). Sample preview:
| word_input | original_output | pattern_output |
| --- | --- | --- |
| aimlessly | aimless | aimlessli |
| gazelle | gazell | gazel |
| narratives | narrat | narr |
Pattern implements the Porter stemmer in the vector module. To use it, first import it with from pattern.vector import stem, PORTER, then call stem(input, stemmer=PORTER). My code is available here: https://github.com/ni9elf/PatternClipsExperiments
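For readers who want to reproduce the check, the comparison can be sketched roughly as follows. This is not the script from the repository above, only a minimal illustration of the idea. The helper names (find_mismatches, read_words, run_against_pattern), the input file names voc.txt and output.txt, and the tab-separated layout of errors.txt are all assumptions made for the example; the only pattern call used is the stem(word, stemmer=PORTER) call described above.

```python
def find_mismatches(words, expected_stems, stem_fn):
    """Return (word, expected, actual) triples where stem_fn disagrees
    with the reference stem. Hypothetical helper, not part of pattern."""
    mismatches = []
    for word, expected in zip(words, expected_stems):
        actual = stem_fn(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def read_words(path):
    """Read a whitespace-separated word list (one word per line)."""
    with open(path, encoding="utf-8") as f:
        return f.read().split()


def run_against_pattern(vocab_path="voc.txt", expected_path="output.txt",
                        errors_path="errors.txt"):
    """Compare pattern's stemmer with the reference output.

    The default file names are assumptions; point them at wherever the
    reference vocabulary and reference stems are stored locally.
    """
    # Imported here so the helpers above work without pattern installed.
    from pattern.vector import stem, PORTER

    words = read_words(vocab_path)
    expected = read_words(expected_path)
    mismatches = find_mismatches(
        words, expected, lambda w: stem(w, stemmer=PORTER)
    )
    # Assumed output layout: one mismatch per line, tab-separated.
    with open(errors_path, "w", encoding="utf-8") as f:
        f.write("word_input\toriginal_output\tpattern_output\n")
        for word, want, got in mismatches:
            f.write(f"{word}\t{want}\t{got}\n")
    return mismatches
```

Calling run_against_pattern() with the two reference files in the working directory would write any disagreements to errors.txt and return them as a list. Keeping find_mismatches independent of pattern makes it easy to plug in a different stemmer for the same comparison.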