Bookworm-project / BookwormDB

Tools for text tokenization and encoding
MIT License

Add English Stemming #12

Open bmschmidt opened 12 years ago

bmschmidt commented 12 years ago

Currently, we allow collation by case, but we can't group together plurals or different forms of the same verb. There's code somewhere that implements a Porter stemmer to fill the "stem" field in the words table. That should run at some point during database import, so that users can run more complicated queries.
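To make the idea concrete, here is a rough sketch of what filling the "stem" field could look like. It is not the code referred to above. It assumes an in-memory SQLite table as a stand-in for the real database, and the column names `word` and `occurrences`, the function names, and the toy `strip_plural_s` stemmer are all hypothetical. A real Porter stemmer would be passed in as `stem_fn` instead.

```python
import sqlite3
from typing import Callable


def fill_stems(conn: sqlite3.Connection, stem_fn: Callable[[str], str]) -> int:
    """Fill words.stem for rows where it is still NULL; return rows updated."""
    rows = conn.execute("SELECT word FROM words WHERE stem IS NULL").fetchall()
    conn.executemany(
        "UPDATE words SET stem = ? WHERE word = ?",
        [(stem_fn(word), word) for (word,) in rows],
    )
    conn.commit()
    return len(rows)


def strip_plural_s(word: str) -> str:
    """Stand-in for a real stemmer (NOT Porter): lowercase, drop one final 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def demo() -> list:
    # Hypothetical schema: only the table name "words" and the field "stem"
    # come from the issue text; everything else is assumed for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, stem TEXT, occurrences INTEGER)"
    )
    conn.executemany(
        "INSERT INTO words (word, occurrences) VALUES (?, ?)",
        [("book", 5), ("books", 3), ("Books", 1), ("walk", 2)],
    )
    fill_stems(conn, strip_plural_s)
    # Grouping on the stem merges case variants and plurals into one row.
    return conn.execute(
        "SELECT stem, SUM(occurrences) FROM words GROUP BY stem ORDER BY stem"
    ).fetchall()
```

Passing the stemmer in as a callable keeps the import step independent of whichever stemming library ends up being used.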

bmschmidt commented 12 years ago

OK, the code is written; I just need to push it up, and then we need to include it.

It also adds a new dependency: the NLTK package, which provides the Porter stemming algorithm.
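For reference, a minimal sketch of how NLTK's Porter stemmer is typically called. The helper name `make_porter_stem_fn` is hypothetical and not part of the pushed code. The import is done lazily, so the dependency is only needed when stemming is actually requested.

```python
from typing import Callable


def make_porter_stem_fn() -> Callable[[str], str]:
    """Return a word -> stem function backed by NLTK's PorterStemmer.

    Raises ImportError if the nltk package is not installed.
    """
    # Lazy import: the rest of the code base can load without NLTK present.
    from nltk.stem.porter import PorterStemmer

    return PorterStemmer().stem


if __name__ == "__main__":
    stem = make_porter_stem_fn()
    for word in ("running", "books", "walked"):
        print(word, "->", stem(word))
```

The Porter stemmer itself needs no corpus downloads, so installing the package should be enough.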