Open fingoldo opened 3 years ago
Hi!
It would complicate the logic a bit, but it's possible.
This would require adding a function generating these splits in https://github.com/fsondej/autocorrect/blob/master/autocorrect/typos.py
and in https://github.com/fsondej/autocorrect/blob/master/autocorrect/__init__.py assigning scores to those splits, for example as min(score_word1, score_word2).
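A minimal sketch of that idea, not the library's actual code: `generate_splits`, `score_splits` and the `TOY_FREQ` table are hypothetical names with made-up frequencies, used only to illustrate generating two-word splits and scoring each as the minimum of the two word scores.

```python
def generate_splits(word):
    """Yield every way to cut `word` into two non-empty pieces."""
    for i in range(1, len(word)):
        yield word[:i], word[i:]


def score_splits(word, word_freq):
    """Return (score, left, right) tuples, best first.

    A split is kept only if both halves are known words. Its score is
    min(score_word1, score_word2), as suggested above.
    """
    candidates = []
    for left, right in generate_splits(word):
        if left in word_freq and right in word_freq:
            score = min(word_freq[left], word_freq[right])
            candidates.append((score, left, right))
    return sorted(candidates, reverse=True)


# Made-up frequencies for illustration only (an assumption, not real data).
TOY_FREQ = {"test": 500, "project": 300, "as": 9000, "he": 8000, "ashes": 40}

print(score_splits("testproject", TOY_FREQ))  # [(300, 'test', 'project')]
```

With these toy numbers the split 'test' + 'project' is the only candidate, because every other cut leaves at least one half that is not a known word.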
Also, I fear that this splitting would happen too often, for example:
ashe -> as he (instead of ashes)
anso -> an so (instead of also)
This would require some calibration, for example lowering the scores of short words, which further complicates things. Switching off double-typo correction might also be necessary when using these splits.
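One way such a calibration might look, as a rough sketch only: `length_penalty`, `split_score`, the `min_len` and `factor` values and the toy frequencies are all assumptions made for illustration, and real values would have to be tuned against the test score.

```python
def length_penalty(word, min_len=4, factor=0.01):
    """Return 1.0 for words of at least `min_len` letters.

    Shorter words are scaled down by `factor` for each missing letter.
    """
    if len(word) >= min_len:
        return 1.0
    return factor ** (min_len - len(word))


def split_score(left, right, word_freq, min_len=4, factor=0.01):
    """Score a two-word split as the minimum of the penalized word scores."""
    return min(
        word_freq.get(w, 0) * length_penalty(w, min_len, factor)
        for w in (left, right)
    )


# Made-up frequencies for illustration only (an assumption, not real data).
TOY_FREQ = {"as": 9000, "he": 8000, "ashes": 40, "test": 500, "project": 300}

# With the penalty, 'as he' no longer outranks the single word 'ashes',
# while a split into two longer words keeps its full score.
print(split_score("as", "he", TOY_FREQ) < TOY_FREQ["ashes"])  # True
print(split_score("test", "project", TOY_FREQ))
```

The penalty here is deliberately harsh on two-letter words. A milder factor would let more splits through, at the risk of bringing back cases like 'ashe -> as he'.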
I don't have time to add this feature, but I would happily merge a PR with it, if the score in tests increases.
Thanks for this wonderful lib!
Can you add some functionality to detect accidentally merged words, for example when the whitespace separating two words was omitted?
It would be cool if 'testproject' could produce the correct candidates: 'test' and 'project'. How hard is it to add such a feature?