bodritto closed this 11 years ago.
Gosha, please fix this by adding a simple heuristic: if two dictionary words have the same Levenshtein distance from the input, the one shorter than 4 glyphs should never be chosen.
With this heuristic, if the input word is "nes" (a typo of the brand NEC, which is in the dictionary from an entity), it becomes "nest" (a dictionary word) instead of "nec" (also from the dictionary, but shorter than 4 characters).
Words shorter than 4 characters are not corrected at all, so this is not a problem.
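The proposed rule and the existing short-input guard can be sketched together as below. This is a minimal illustration, not the project's code: the `levenshtein` and `correct` helpers and the `MIN_LEN` constant are hypothetical names, and the 4-character threshold is taken from the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


MIN_LEN = 4  # hypothetical constant: the "4 glyphs" threshold from the thread


def correct(word: str, dictionary: set[str]) -> str:
    # Short inputs (and known words) are never corrected, so "nes" stays "nes".
    if len(word) < MIN_LEN or not dictionary or word in dictionary:
        return word
    best = min(levenshtein(word, w) for w in dictionary)
    tied = [w for w in dictionary if levenshtein(word, w) == best]
    # Proposed rule: among equally distant candidates, never pick one shorter
    # than MIN_LEN, unless every tied candidate is that short.
    long_enough = [w for w in tied if len(w) >= MIN_LEN]
    return sorted(long_enough or tied)[0]  # alphabetical pick for determinism
```

For example, `correct("boch", {"boc", "bosch"})` sees a tie at distance 1 and drops "boc", while `correct("nes", {"nest", "nec"})` returns "nes" untouched because the input is only 3 characters long.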
Igor is referring to the case where, for some reason, we have an external dictionary with random words. Just keep that case in mind (I don't think it's currently relevant, but it's something to remember). Gosha, when are you planning to do this? Please indicate it in the ticket.
Done in issue_526_fsc_threeglyphs
I've decided to implement a simpler and more efficient heuristic: when candidates are equally distant, just prefer the longer word. This should be fine in pretty much every case I could think of.
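The "prefer longer words" tie-break can be expressed as a single sort key. This is a hedged sketch under the same assumptions as the thread (plain Levenshtein distance, inputs under 4 characters left alone); `levenshtein`, `correct` and `MIN_LEN` are hypothetical names, not the actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


MIN_LEN = 4  # hypothetical: inputs shorter than this are not corrected


def correct(word: str, dictionary: set[str]) -> str:
    if len(word) < MIN_LEN or not dictionary or word in dictionary:
        return word
    # Rank by (distance, -length, word): the smallest distance wins; on a tie
    # the longer word wins; alphabetical order settles any remaining tie.
    return min(dictionary, key=lambda w: (levenshtein(word, w), -len(w), w))
```

With this key, `correct("boch", {"boc", "bosch"})` returns "bosch" rather than "boc". Because the ranking is one `min()` over a composite key, each candidate's distance is computed only once, and no separate length threshold is needed for dictionary words.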
I merged this into issue_527_mkERC_list (a very serious low-level change to the way we handle functional translations), so this should go into production tomorrow (I will merge it myself).
Example: "boch" -> "boc" (instead of "bosch").