barzerman / barzer

barzer engine code
MIT License

BZSPELL: barzer doesn't correct short words to 3-letter words #526

Closed bodritto closed 11 years ago

bodritto commented 11 years ago

Example: boch -> boc (instead of bosch)

barzerman commented 11 years ago

Gosha, please fix this by adding a simple heuristic: if there are 2 dictionary words with the same Levenshtein distance from the input, the one shorter than 4 glyphs should never be chosen.
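The proposed tie-break can be sketched as follows. This is only an illustration, not barzer's actual code: `levenshtein`, `pickCorrection`, and `demoTieBreak` are hypothetical names, the dictionary is a plain vector, and lengths are counted in bytes (so it assumes ASCII input rather than real glyph counting).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Classic two-row dynamic-programming Levenshtein distance over bytes.
// Assumption: ASCII input; a real engine would count glyphs, not bytes.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the closest dictionary word, or an empty
// string for an empty dictionary. On a distance tie, a candidate shorter
// than 4 glyphs never wins over one with 4 or more.
std::string pickCorrection(const std::string& input,
                           const std::vector<std::string>& dict) {
    const std::size_t kMinLen = 4;
    std::string best;
    std::size_t bestDist = 0;
    bool found = false;
    for (const auto& word : dict) {
        const std::size_t d = levenshtein(input, word);
        const bool tieBreak = found && d == bestDist &&
                              best.size() < kMinLen && word.size() >= kMinLen;
        if (!found || d < bestDist || tieBreak) {
            best = word;
            bestDist = d;
            found = true;
        }
    }
    return best;
}

// Usage check mirroring the example from this ticket.
void demoTieBreak() {
    assert(pickCorrection("boch", {"boc", "bosch"}) == "bosch");
}
```

Both "boc" and "bosch" are at distance 1 from "boch", so only the length rule decides between them; the result does not depend on dictionary order.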

bodritto commented 11 years ago

With this heuristic, if the input word is "nes" (a typo of the brand NEC, which is in the dictionary from an entity), it becomes "nest" (a dictionary word) instead of "nec" (also from the dictionary, but with length < 4).

barzerman commented 11 years ago

Words shorter than 4 characters are not corrected at all, so this is not a problem.

barzerman commented 11 years ago

Igor is referring to the case when we, for some reason, have an external dictionary with random words. Just keep that case in mind (I don't think it's currently relevant, but it's something to remember). Gosha, when are you planning to do this? Please indicate in the ticket.

0xd34df00d commented 11 years ago

Done in issue_526_fsc_threeglyphs

I've decided to implement a simpler and more efficient heuristic: just preferring longer words. This should be fine in pretty much every case I could think of.
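A minimal sketch of what "just preferring longer words" could look like, assuming the preference applies only among candidates at the same edit distance (the comment does not spell this out). The `Candidate` struct, `betterCandidate`, `preferLonger`, and `demoPreferLonger` are hypothetical names and are not taken from the issue_526_fsc_threeglyphs branch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate record: a dictionary word plus its precomputed
// edit distance from the misspelled input.
struct Candidate {
    std::string word;
    std::size_t distance;
};

// Orders candidates by smaller distance first, then by longer word.
bool betterCandidate(const Candidate& a, const Candidate& b) {
    if (a.distance != b.distance) return a.distance < b.distance;
    return a.word.size() > b.word.size();
}

// Returns the preferred candidate's word, or an empty string if none exist.
std::string preferLonger(const std::vector<Candidate>& candidates) {
    auto it = std::min_element(candidates.begin(), candidates.end(),
                               betterCandidate);
    return it == candidates.end() ? std::string() : it->word;
}

// Usage check: a tie goes to the longer word, but a strictly closer
// short word still wins.
void demoPreferLonger() {
    assert(preferLonger({{"boc", 1}, {"bosch", 1}}) == "bosch");
    assert(preferLonger({{"bosch", 2}, {"boc", 1}}) == "boc");
}
```

Compared with a fixed 4-glyph threshold, a single comparator needs no special case for short words, which may be what "simpler and more efficient" refers to.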

barzerman commented 11 years ago

I merged this into issue_527_mkERC_list (a very serious low-level change to the way we handle functional translations), so this should go into production tomorrow (I will merge it myself).

barzerman commented 11 years ago

https://github.com/barzerman/barzer/issues/527