danvk / oldnyc

Mapping photos of Old New York
Apache License 2.0

Apply spell correction to OCR #26

Closed danvk closed 9 years ago

danvk commented 9 years ago

There are many trivial errors in the transcribed text which could be fixed using some knowledge of English and NYC-area nouns.

Trivial errors fall into a few classes:

danvk commented 9 years ago

Anecdotally, pyenchant often fails to rank the correct replacement first among its suggestions. Rolling my own spell checker à la Peter Norvig, using validated transcriptions as the source of words, might work better.
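For reference, a minimal sketch of what a Norvig-style corrector built from validated transcriptions might look like. This is not code from the repo: `build_counts`, `edits1`, `known`, `correct`, and the sample strings are all hypothetical names and data chosen for illustration. It lowercases everything, so restoring the original capitalization would need separate handling.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def build_counts(transcriptions):
    """Count word frequencies across validated transcriptions (assumed input)."""
    counts = Counter()
    for text in transcriptions:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words, counts):
    """Keep only the candidates that appear in the validated vocabulary."""
    return {w for w in words if w in counts}

def correct(word, counts):
    """Return the most frequent known word within two edits, else the word itself."""
    w = word.lower()
    candidates = (
        known([w], counts)
        or known(edits1(w), counts)
        or known((e2 for e1 in edits1(w) for e2 in edits1(e1)), counts)
        or {w}
    )
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda c: (counts[c], c))

# Hypothetical stand-in for the validated transcriptions.
sample = [
    "Broadway and Fulton Street, looking north",
    "Fulton Street at Broadway",
    "Brooklyn Bridge from the Manhattan side",
]
counts = build_counts(sample)
print(correct("Brcadway", counts), correct("Fultou", counts))
```

Because the vocabulary comes from the transcriptions themselves, NYC-area nouns get frequency weight that a general English dictionary would not give them, which is the main reason this might rank candidates better than a stock dictionary.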

danvk commented 9 years ago

Realistically, I'm not going to revisit this. Users are doing a fine job all on their own.