HQDerek / bot

The bot called Derek: Solvers for parsing questions and searching for answers.

Stemming #8

Open DeanSherwin opened 6 years ago

DeanSherwin commented 6 years ago

Just putting this here so it doesn't get forgotten. We should look into 'stemming', the process of removing prefixes and suffixes from words, e.g. '-ing', '-ed', etc.
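To get a feel for what stemming does, here is a naive suffix-stripping sketch. It is hypothetical and is not Porter or any Whoosh algorithm. The suffix list and the minimum stem length of 3 are arbitrary assumptions.

```python
# Hypothetical suffix list, checked in order (longer suffixes before plain 's').
SUFFIXES = ("ing", "ed", "ly", "es", "s")
MIN_STEM = 3  # assumed: never shorten a word below 3 characters


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, if what remains is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["searching", "searched", "searches", "search", "red"]])
# ['search', 'search', 'search', 'search', 'red']
```

All four forms of "search" collapse to one term, so a question and an answer that use different inflections can still match. The crude rules also show why real algorithms need more than this: `naive_stem("running")` gives `"runn"`. Porter-style stemmers add extra rules, such as undoubling consonants, to handle such cases.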

DeanSherwin commented 6 years ago

The Whoosh search engine does this.

DeanSherwin commented 6 years ago

There are several stemming algorithms, such as Porter and Porter2, Paice-Husk, and Lovins.

These are available in the Whoosh source code.

DeanSherwin commented 6 years ago

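Lemmatization is a similar but less forceful process: instead of chopping off affixes, it maps each word to its dictionary form (lemma), e.g. 'studies' to 'study'.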
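The sketch below contrasts the two approaches. It assumes a tiny, hypothetical hand-written lemma table. Real lemmatizers use a full lexicon (e.g. WordNet) plus part-of-speech information. The `crude_stem` rules are also assumptions, not Porter.

```python
# Hypothetical miniature lemma dictionary; a real lemmatizer would use a
# full lexicon and part-of-speech tags.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "better": "good",
    "mice": "mouse",
    "studies": "study",
}


def lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


def crude_stem(word: str) -> str:
    """Crude suffix chop for comparison (assumed rules, not Porter)."""
    word = word.lower()
    for suffix in ("ing", "ies", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["studies", "running", "mice", "ran"]:
    print(w, "->", crude_stem(w), "|", lemmatize(w))
# studies -> stud | study
# running -> runn | run
# mice -> mice | mouse
# ran -> ran | run
```

The stemmer can produce non-words ("stud", "runn") and misses irregular forms ("mice", "ran"). The lemmatizer returns real words but leaves anything outside its dictionary untouched.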