Open · avashishta5 opened 3 years ago
[ ] Add POS tagging to exclude nouns from lemmatization and to improve sanitization.
[ ] Replace the plain Levenshtein distance with an approach based on a Levenshtein automaton plus Jaro-Winkler distance.
[ ] Replace TinyDB with plain JSON or an alternative data structure (tries, maybe?).
[ ] Following on from the previous point, see whether Dice's coefficient can be removed.
[ ] Improve speed (getting rid of TinyDB might help).
[ ] Scrape the web for more lemmas.
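
For the "tries, maybe?" idea, here is a rough sketch of what an in-memory lookup could look like. The `Trie` class, its method names, and the word-form-to-lemma layout are assumptions made for illustration; the project's actual data layout may differ.

```python
class Trie:
    """Minimal dict-based trie mapping word forms to lemmas (hypothetical layout)."""

    # Sentinel key for the stored lemma. Child keys are single characters,
    # so a multi-character sentinel cannot collide with them.
    _END = "$lemma"

    def __init__(self):
        self.root = {}

    def insert(self, word, lemma):
        """Store `lemma` at the node reached by spelling out `word`."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = lemma

    def lookup(self, word):
        """Return the lemma stored for `word`, or None if it is absent."""
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._END)


trie = Trie()
trie.insert("running", "run")
trie.insert("ran", "run")
print(trie.lookup("running"))
```

Since the structure is just nested dicts, it could be dumped to and loaded from regular JSON, which might cover both options in the TinyDB item.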
https://julesjacobs.github.io/2015/06/17/disqus-levenshtein-simple-and-fast.html
http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata
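
A minimal sketch of the automaton idea, assuming the simple variant where the "state" is one row of the usual edit-distance table and each input character advances it. This is not a compiled DFA and does not include the Jaro-Winkler part; the class and function names are made up for illustration.

```python
class LevenshteinAutomaton:
    """Steps one row of the edit-distance table per input character."""

    def __init__(self, word, max_edits):
        self.word = word
        self.max_edits = max_edits

    def start(self):
        # Distance from each prefix of `word` to the empty input.
        return list(range(len(self.word) + 1))

    def step(self, state, ch):
        # Compute the next table row after consuming `ch`.
        new_state = [state[0] + 1]
        for i, w in enumerate(self.word):
            cost = 0 if w == ch else 1
            new_state.append(min(new_state[i] + 1,    # skip a character of `word`
                                 state[i] + cost,     # match or substitute
                                 state[i + 1] + 1))   # extra character in the input
        return new_state

    def is_match(self, state):
        return state[-1] <= self.max_edits

    def can_match(self, state):
        # If every entry exceeds the budget, no continuation can match.
        return min(state) <= self.max_edits


def within_distance(automaton, candidate):
    """True if `candidate` is within max_edits of the automaton's word."""
    state = automaton.start()
    for ch in candidate:
        state = automaton.step(state, ch)
        if not automaton.can_match(state):
            return False  # early exit: the budget is already blown
    return automaton.is_match(state)


automaton = LevenshteinAutomaton("lemma", max_edits=1)
print([w for w in ["lemma", "lemna", "lema", "dilemma"]
       if within_distance(automaton, w)])
```

The `can_match` check is what would allow whole subtrees to be pruned if the candidates were stored in a trie rather than a flat list.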