yhamoudi closed 9 years ago
We could compute the Levenshtein distance between the original and the corrected question, and reject the corrected output if the distance is greater than some threshold (5, for example).
Very strange:
Who is the author of “Le Petit Prince”?
returns 2 answers.
Who is the author of "Le Petit Prince"?
returns only the correct answer.
Who is the author of “Le Petit Prince"?
returns no answer.
It depends on the quotation marks (perhaps there is also a problem in the question parsing).
There is a Python implementation of the Levenshtein distance here: https://github.com/ProgVal/Limnoria/blob/master/src/utils/str.py#L69 (replace xrange with range, because the code I linked is written for Python 2).
It would also be nice to put a cache on it, like the functools.lru_cache I suggested the other day, since it's quadratic in the number of letters.
Simple: the only quotation mark I consider is " (this is also the case in QuestionParsing-Grammatical).
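One possible way to make the three variants of the question behave the same would be to map typographic quotation marks to " before spell checking and parsing. This is only a sketch: the names QUOTE_TABLE and normalize_quotes are hypothetical and not taken from the project, and the set of characters to map is an assumption.

```python
# Hypothetical table: typographic double quotes mapped to the ASCII quote.
# Which characters belong here is an assumption, not project behaviour.
QUOTE_TABLE = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u00ab": '"',  # left-pointing guillemet
    "\u00bb": '"',  # right-pointing guillemet
})


def normalize_quotes(question: str) -> str:
    """Return the question with typographic double quotes replaced by "."""
    return question.translate(QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("Who is the author of \u201cLe Petit Prince\u201d?"))
```

With such a step, the curly-quote, straight-quote and mixed variants would all reach the parser as the same string.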
Who is the author of “Le Petit Prince”?
is corrected to
Who is the author of “it”?
However, (it,author,?) returns Stephen King (it appears in notable works here: https://www.wikidata.org/wiki/Q39829).
We should not allow the spell checker to perform such big corrections (Le Petit Prince != it).
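The threshold idea from this thread could look roughly like the sketch below: a cached Levenshtein distance plus a check that rejects a correction when the distance exceeds a limit. This is not the Limnoria code linked above, just a standard two-row dynamic programming version written for Python 3. The names levenshtein, accept_correction and MAX_DISTANCE are hypothetical, and the value 5 is only the example threshold mentioned earlier.

```python
from functools import lru_cache

# Example threshold from the discussion; the right value is an open question.
MAX_DISTANCE = 5


@lru_cache(maxsize=1024)
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # drop char_a
                current[j - 1] + 1,                    # drop char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def accept_correction(original: str, corrected: str,
                      max_distance: int = MAX_DISTANCE) -> bool:
    """Return True when the spell checker's output stays close to the input."""
    return levenshtein(original, corrected) <= max_distance


if __name__ == "__main__":
    original = "Who is the author of \u201cLe Petit Prince\u201d?"
    corrected = "Who is the author of \u201cit\u201d?"
    # When the correction is rejected, fall back to the original question.
    question = corrected if accept_correction(original, corrected) else original
    print(question)
```

Here the correction of “Le Petit Prince” to “it” would be rejected, since 13 characters have to be deleted, while a small fix such as "auhtor" to "author" would still pass.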