OCR-D / ocrd-website

docs/ocrd-training: export from OCR-D toolchain #101

Closed bertsky closed 1 year ago

bertsky commented 4 years ago

I am not sure I have a good grasp of what docs/ocrd-training.md is ultimately intended to be, but as it stands, I think the page should at least link to (or, better, describe) the two options we currently have for extracting line images and their metadata from PAGE-XML annotations:

kba commented 4 years ago

That page was supposed to provide a "running start" for @Doreenruirui when she started working on what would become okralact. It is true, though, that we should provide an actual guide on training, and your suggestions are welcome.

bertsky commented 4 years ago

Understood.

Another thing that this page or guide should mention is converters for page segmentation training data. With ocrd-segment-from-masks and ocrd-segment-from-coco we have two importers, and with the debug images and COCO output of ocrd-segment-extract-pages we have two exporters for commonly used non-PAGE formats.

bertsky commented 2 years ago

Can perhaps be closed – there's a section on the ocrd_segment converters in https://ocr-d.de/en/workflows#step-19-format-conversion now. (And page2img is independent of OCR-D and most OCR tools: tesstrain will probably include its own PAGE converter and Calamari already does. If you do mention it somewhere, then please don't forget https://github.com/uniwue-zpd/PAGETools, too.)

kba commented 1 year ago

I think these points are now addressed and the originally referenced page has been removed, so closing.