parametrize error correction

catalpa-cl / escrito

Apache License 2.0

parametrize error correction #44

Open andreahorbach opened 6 years ago

andreahorbach commented 6 years ago

Error correction should be parametrized so that one can choose how the best replacement candidate for a misspelling is selected, for example:

strictly based on Levenshtein distance
preferring domain material
etc.
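
To make the request concrete, here is a minimal Python sketch of what a selectable ranking strategy could look like. It is not Escrito code: the names `by_distance`, `prefer_domain`, and `best_candidate` are hypothetical, and "prefer domain material" is interpreted here (as an assumption) as a tie-breaker among candidates with the same Levenshtein distance.

```python
from typing import Callable, Iterable, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# A ranking strategy maps (misspelling, candidate) to a sortable key.
# The candidate with the smallest key is chosen.
RankingKey = Callable[[str, str], Tuple]


def by_distance(misspelling: str, candidate: str) -> Tuple:
    """Hypothetical strategy: rank strictly by Levenshtein distance.

    Ties are broken alphabetically so the result is deterministic.
    """
    return (levenshtein(misspelling, candidate), candidate)


def prefer_domain(domain_vocabulary: Set[str]) -> RankingKey:
    """Hypothetical strategy: among equally distant candidates,
    prefer words that occur in the domain material."""
    def key(misspelling: str, candidate: str) -> Tuple:
        return (
            levenshtein(misspelling, candidate),
            candidate not in domain_vocabulary,  # False sorts before True
            candidate,
        )
    return key


def best_candidate(
    misspelling: str,
    candidates: Iterable[str],
    ranking_key: RankingKey = by_distance,
) -> str:
    """Return the candidate that the chosen strategy ranks first."""
    return min(candidates, key=lambda c: ranking_key(misspelling, c))
```

With the candidates `["horse", "force", "forte"]` for the misspelling `forse`, all three are at distance 1, so `by_distance` falls back to alphabetical order, while `prefer_domain({"horse"})` selects `horse`. Other strategies (frequency-based, weighted distance, and so on) would plug in the same way.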

zesch commented 6 years ago

Do we really do that in Escrito itself? We might first need a spelling tool that can be parametrized in those ways (as a stand-alone project), which is then used here.

andreahorbach commented 6 years ago

We have normalization code that uses the Escrito readers, corrects spelling mistakes, and writes the output, which is then used as new input in a core Escrito process. So normalization is not a direct part of the Escrito pipeline, but Escrito provides ways of spellchecking the data.

zesch commented 6 years ago

ok, my suggestion would then be to allow Escrito to use a spell checker and keep most of the parametrization with the spell checker. Escrito only needs to decide where the spell checker is applied.
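
A rough Python sketch of that split, under the assumption that the spell checker is a separate component: all names here (`SpellChecker`, `Pipeline`, `fields_to_correct`) are hypothetical, and the lookup table is only a stand-in for real candidate generation and ranking.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpellChecker:
    """Hypothetical stand-alone checker; all tuning would live here."""
    # Stand-in for real candidate generation and ranking parameters.
    corrections: Dict[str, str]

    def correct(self, text: str) -> str:
        # Whitespace tokenization only; runs of whitespace are collapsed.
        return " ".join(self.corrections.get(tok, tok) for tok in text.split())


@dataclass
class Pipeline:
    """Hypothetical pipeline side: only decides where the checker is applied."""
    spell_checker: SpellChecker
    fields_to_correct: List[str]

    def preprocess(self, item: Dict[str, str]) -> Dict[str, str]:
        # Correct the selected fields and pass all others through unchanged.
        return {
            name: self.spell_checker.correct(value)
            if name in self.fields_to_correct
            else value
            for name, value in item.items()
        }
```

For example, `Pipeline(SpellChecker({"teh": "the"}), ["answer"])` would correct only the `answer` field of an item and leave a `prompt` field untouched; changing how candidates are chosen would then require no change on the pipeline side.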