Open jhoetter opened 1 year ago
Interesting idea. Do you already have some resources on how to tackle this?
Only what I've written above :D
pyspellchecker implements this idea.
Just skimmed these, seems to use only algos no models.
EDIT: Uses Levenshtein distance in fact.
Nice, that is great. I don't know yet if pyspellchecker supports different languages like e.g. German, but we can use this for English at least I guess. For German, Swedish, French, ... I think an approach using the language vocabulary would be helpful
Please describe the module you would like to add to bricks Turn "t3st docunnent" into "test document"
Do you already have an implementation? No, but statistical distribution could be relevant for this. Or by looking up if the word occurs in a vocabulary (e.g. via nltk, ...), and if not, searching for the word with the smallest Levenshtein distance.
Additional context Tried this many years ago already, and there are simple but effective approaches for this.