ASVLeipzig / cor-asv-ann

OCR-D post-correction with encoder-attention-decoder LSTMs
Apache License 2.0

Add multi-OCR mode #1

Open wrznr opened 4 years ago

wrznr commented 4 years ago

As things stand, multiple hypotheses for (lines of) characters are only available in OCR-D through multiple OCR runs. What is missing is a clever way to merge them into a single text version. cor-asv-ann has the means to provide this disambiguation step.

bertsky commented 4 years ago

This is definitely an objective. I have long-standing plans to integrate the ideas from http://www.ccs.neu.edu/home/dongrui/ocr.html. But this may take some time to harmonize, as the basic NMT implementations are quite different, and this would add a completely complementary training/data regime.