ProjetPP / PPP-Spell-Checker

Spell checking module for the PPP
MIT License

Too big correction #5

Closed yhamoudi closed 9 years ago

yhamoudi commented 9 years ago

The question Who is the author of “Le Petit Prince”? is corrected to Who is the author of “it”?. However, the triple (it, author, ?) returns Stephen King ("it" appears under notable works here: https://www.wikidata.org/wiki/Q39829).

We should not allow the spell checker to perform such big corrections (Le Petit Prince != it).

yhamoudi commented 9 years ago

We could compute the Levenshtein distance between the input and the corrected output, and reject the correction if the distance is greater than 5 (for example).

yhamoudi commented 9 years ago

Very strange:

It depends on the quotation marks (perhaps there is also a problem in the question parsing).

progval commented 9 years ago

There is a Python implementation of the Levenshtein distance here: https://github.com/ProgVal/Limnoria/blob/master/src/utils/str.py#L69 (replace xrange with range, because the code I linked is written for Python 2). It would also be nice to have a cache on it, since it's quadratic in the number of letters, for example the functools.lru_cache I suggested the other day.
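
For illustration, here is a minimal sketch of that idea, assuming Python 3. It is not the Limnoria code and not necessarily what the module ended up using: the names `levenshtein`, `accept_correction` and `MAX_DISTANCE` are hypothetical, and the threshold of 5 is only the example value suggested above.

```python
from functools import lru_cache

# Example threshold taken from the discussion; the right value is an open question.
MAX_DISTANCE = 5


@lru_cache(maxsize=1024)
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b.

    Runs in O(len(a) * len(b)) time, hence the cache on repeated pairs.
    """
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b), # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def accept_correction(original: str, corrected: str,
                      max_distance: int = MAX_DISTANCE) -> str:
    """Return the correction only if it stays close to the original text."""
    if levenshtein(original, corrected) > max_distance:
        return original  # correction is too big, keep the user's input
    return corrected
```

With this sketch, `accept_correction("Le Petit Prince", "it")` keeps the original string, because the distance between the two is 13, while a small fix such as `"auhtor"` to `"author"` (distance 2) is accepted.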

Ezibenroc commented 9 years ago

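Simple: the only quotation mark I consider is the straight double quote " (this is also the case in QuestionParsing-Grammatical), so text inside typographic quotes such as “ ” is not treated as quoted.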
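
A rough sketch of why the two inputs behave differently, assuming quoted spans are found with a pattern that only knows the straight double quote. The names `quoted_spans` and `normalize_quotes` are hypothetical and are not taken from the spell checker. Normalizing typographic quotes first is one possible workaround, not a description of the actual fix.

```python
import re

# Matches only spans delimited by straight double quotes.
STRAIGHT_QUOTED = re.compile(r'"([^"]*)"')

# Assumption: these typographic quotes should be treated like a straight quote.
TYPOGRAPHIC_QUOTES = str.maketrans({"“": '"', "”": '"', "«": '"', "»": '"'})


def quoted_spans(question: str) -> list:
    """Return the substrings that sit between straight double quotes."""
    return STRAIGHT_QUOTED.findall(question)


def normalize_quotes(question: str) -> str:
    """Replace typographic quotation marks with straight double quotes."""
    return question.translate(TYPOGRAPHIC_QUOTES)
```

Here `quoted_spans('Who is the author of “Le Petit Prince”?')` finds nothing, so the title would be exposed to correction, whereas the same call on `normalize_quotes(...)` of that question finds `Le Petit Prince`.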