danvk / oldnyc

Mapping photos of Old New York
Apache License 2.0

Run OCR on higher-resolution imagery #27

Closed (danvk closed this 9 years ago)

danvk commented 9 years ago

Most of the backing text was transcribed from 1349x2048 images. In these, the individual lines have an x-height of ~12–13px.

Imagery at roughly 3x the resolution is available, at ~3958x6144. This gives an x-height of 35–40px. If an OCR model were trained on this data instead of the lower-resolution version, it would presumably be more accurate.
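
As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic. The dimensions and the 12–13px x-height come from the figures above; the function and constant names are made up for illustration, and the result assumes x-height scales linearly with image height.

```python
# Dimensions quoted above as (width, height), in pixels.
LOW_RES = (1349, 2048)
HIGH_RES = (3958, 6144)  # approximate, per the figures above

# x-height range measured on the low-resolution images, in pixels.
LOW_RES_X_HEIGHT = (12, 13)


def scale_factors(low, high):
    """Return the (horizontal, vertical) linear scale factors."""
    return (high[0] / low[0], high[1] / low[1])


def scaled_x_height(x_height_range, factor):
    """Scale an x-height range by a linear factor (assumes linear scaling)."""
    lo, hi = x_height_range
    return (lo * factor, hi * factor)


sx, sy = scale_factors(LOW_RES, HIGH_RES)
lo, hi = scaled_x_height(LOW_RES_X_HEIGHT, sy)
print(f"scale: {sx:.2f}x horizontal, {sy:.2f}x vertical")
print(f"expected x-height: {lo:.0f}-{hi:.0f}px")
```

The vertical factor works out to exactly 3, so a 12–13px x-height becomes 36–39px, which is consistent with the 35–40px estimate.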

danvk commented 9 years ago

I'm unlikely to get this data or re-run OCR using it.