Given a large typos list, one could build an error model from it, with a simple Levenshtein edit-distance-1 candidate generator on top.
Needs to be tested for:
speed
memory/disk size
correction performance
If we find that it works well given a typos list of X entries, we could build the model automatically whenever a typos file reaches X entries.
Main benefit: since we already collect typos, this would be a cheap way to build an error model that corrects most of them, without us having to do any extra work.
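A minimal sketch of the idea (Norvig-style; the names `typo_map`, `lexicon`, and `correction_freq` are illustrative assumptions, not part of any existing code): look the word up in the typos list first, and fall back to edit-distance-1 candidates ranked by how often each candidate appears as a correction in the typos list.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings within Levenshtein distance 1 of word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def build_model(typo_pairs):
    """From (typo, correction) pairs, build a direct lookup table and a
    frequency count of corrections to serve as a crude error model."""
    typo_map = dict(typo_pairs)
    correction_freq = Counter(corr for _, corr in typo_pairs)
    return typo_map, correction_freq

def correct(word, typo_map, lexicon, correction_freq=None):
    if word in lexicon:      # already a known word
        return word
    if word in typo_map:     # exact hit in the typos list
        return typo_map[word]
    # Fall back: edit-distance-1 candidates that are real words,
    # ranked by how often they occur as corrections in the typos list.
    candidates = edits1(word) & lexicon
    if candidates:
        freq = correction_freq or {}
        return max(candidates, key=lambda w: freq.get(w, 0))
    return word              # give up, return the word unchanged
```

This is also roughly the shape the speed/memory tests would exercise: the typo lookup is O(1), while `edits1` generates on the order of 50n candidate strings for a word of length n, which is where the time and memory costs would concentrate.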