OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces

MIT License

allow specifying multiple output file groups for binarize, OCR-D/spec#117 #59

Closed kba closed 5 years ago

kba commented 5 years ago

Proof of concept showing how multiple output file groups can be supported.
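One way to picture the idea is a small helper that accepts a comma-separated output file group string and splits it into a group for the PAGE files and a group for the binarized images. This is only a sketch under stated assumptions, not the code from this PR: the function name `split_output_file_groups`, the comma-separated convention, and the fallback group name `OCR-D-IMG-BIN` are all hypothetical and chosen for illustration.

```python
from typing import Tuple

# Hypothetical fallback name for the image file group (an assumption,
# not taken from this thread).
FALLBACK_IMAGE_GRP = "OCR-D-IMG-BIN"


def split_output_file_groups(
    output_file_grp: str,
    fallback_image_grp: str = FALLBACK_IMAGE_GRP,
) -> Tuple[str, str]:
    """Split a comma-separated group string into (page_grp, image_grp).

    One group: use it for PAGE output and fall back to a default for images.
    Two groups: the first is for PAGE output, the second for images.
    Anything else is rejected.
    """
    # Drop surrounding whitespace and empty entries such as "A,,B".
    groups = [g.strip() for g in output_file_grp.split(",") if g.strip()]
    if len(groups) == 2:
        return groups[0], groups[1]
    if len(groups) == 1:
        return groups[0], fallback_image_grp
    raise ValueError(
        "expected one or two output file groups, got %d" % len(groups)
    )
```

With this sketch, a single group keeps working as before, while a caller who wants control over where the images go can pass two groups.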

bertsky commented 5 years ago

Sorry about the failing tests, that was a fault of mine from the last PR (simplify-common). Fixing now.

bertsky commented 5 years ago

@kba probably no need to rebase though – we do not even have binarization in the tests yet.

bertsky commented 5 years ago

Also, what do we do with output_file_grp in ocrd-tool.json? There needs to be a place where assumptions about input and output file groups can be declared (how many, in what order). The same applies to multi-OCR input and OCR-GT alignment/evaluation.
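To make the question concrete: if a tool description could state how many file groups a processor expects, a check along the following lines would become possible. This is a hypothetical sketch only. The thread does not define any such declaration, and the function name `check_group_count` and its `minimum`/`maximum` parameters are assumptions made for illustration.

```python
from typing import List, Optional


def check_group_count(
    groups_arg: str,
    minimum: int,
    maximum: Optional[int] = None,
) -> List[str]:
    """Return the ordered list of file groups if their count is acceptable.

    `minimum` and `maximum` stand in for whatever a tool description might
    declare (hypothetical); `maximum=None` means no upper bound.
    """
    # Order is preserved, since the position of a group may carry meaning.
    groups = [g.strip() for g in groups_arg.split(",") if g.strip()]
    if len(groups) < minimum:
        raise ValueError(
            "expected at least %d file group(s), got %d" % (minimum, len(groups))
        )
    if maximum is not None and len(groups) > maximum:
        raise ValueError(
            "expected at most %d file group(s), got %d" % (maximum, len(groups))
        )
    return groups
```

A binarization step might then be declared as taking one or two output groups, while an alignment or evaluation step could require at least two input groups with no upper bound.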

bertsky commented 5 years ago

Looks good already (regardless of the outcome of OCR-D/spec#117). I'd say somebody should systematically make this change in all AlternativeImage-aware processors (Tesseract and Ocropy) now – before users and components get used to the current fixed scheme. (But that would not be me, sorry.)

kba commented 5 years ago

I can do that, but can you please open an issue and assign me so I won't forget? Thanks

bertsky commented 5 years ago

> I can do that, but can you please open an issue and assign me so I won't forget? Thanks

done – see above. One for Tesseract, one for Ocropy.

wrznr commented 5 years ago

@kba Can we merge?