Open kba opened 2 years ago
...but also extend it where necessary, e.g. #159, or for language and script identification, or special region detection (only separator lines or tables or stamps or handwriting ...), or pure reading-order detection, or page classification, or import/export tasks.
Yes, prune the ambiguous parts (e.g. difference between layout/analysis and layout/segmentation) and add the missing parts. And probably use either categories or steps. And align with our glossary.
difference between layout/analysis and layout/segmentation
I always understood that as in logical document layout analysis, not optical page layout analysis.
use them to have an additional means to find processors for certain tasks
So basically all processors would have to be registered centrally during installation, right? (Which is also a system-side prerequisite to the discovery parts of the Web API.)
Perhaps we could write some ocrd ocrd-tool register (passing a tool JSON) and ocrd ocrd-tool find (passing a directory to recursively search for tool JSONs). These could be run by some additional pattern rule in ocrd_all, or during make install in the individual modules. They could both feed into a local DB, which could be queried via some ResourceManager-like API (ProcessorManager?) or even another CLI (ocrd processor find|lookup|...?)...
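To make the register/find/lookup idea a bit more concrete, here is a rough sketch of what such a local registry could look like. None of this exists in OCR-D core: the function names, the SQLite table layout and the prefix-based lookup are assumptions for illustration only. The only thing taken from the existing format is that an ocrd-tool.json has a top-level tools object whose entries carry executable, categories and steps.

```python
import json
import sqlite3
from pathlib import Path


def open_registry(db_path=":memory:"):
    """Open (or create) the hypothetical local processor registry."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processors ("
        "executable TEXT, kind TEXT, value TEXT, "
        "UNIQUE (executable, kind, value))"
    )
    return conn


def register_tool_json(conn, tool_json_path):
    """Register every tool listed in one ocrd-tool.json (the 'register' idea)."""
    data = json.loads(Path(tool_json_path).read_text(encoding="utf-8"))
    names = []
    for name, tool in data.get("tools", {}).items():
        executable = tool.get("executable", name)
        for kind in ("categories", "steps"):
            for value in tool.get(kind, []):
                # OR IGNORE keeps repeated registrations idempotent
                conn.execute(
                    "INSERT OR IGNORE INTO processors VALUES (?, ?, ?)",
                    (executable, kind, value),
                )
        names.append(executable)
    conn.commit()
    return names


def find_tool_jsons(conn, directory):
    """Recursively register all ocrd-tool.json files below a directory (the 'find' idea)."""
    registered = []
    for path in sorted(Path(directory).rglob("ocrd-tool.json")):
        registered.extend(register_tool_json(conn, path))
    return registered


def lookup(conn, step_prefix):
    """Return executables that declare a step starting with step_prefix."""
    rows = conn.execute(
        "SELECT DISTINCT executable FROM processors "
        "WHERE kind = 'steps' AND value LIKE ? ORDER BY executable",
        (step_prefix + "%",),
    )
    return [row[0] for row in rows]
```

With something like this, a make install rule could call find_tool_jsons on the module directory, and a lookup for "layout/segmentation" would return every registered processor that declares a step below that prefix. Whether the store should be SQLite or just a merged JSON file is an open design choice.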
Processor developers must specify categories and steps in the ocrd-tool.json. It would be useful to rethink this classification to make it easier to use them to have an additional means to find processors for certain tasks, besides https://ocr-d.de/en/workflows.
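For illustration, this is roughly how categories and steps could be used to find processors once they are classified consistently. It is only a sketch: the two processor names are made up, index_by is a hypothetical helper, and the category and step values merely follow the pattern used in existing tool descriptions.

```python
from collections import defaultdict

# Hypothetical excerpt of an ocrd-tool.json; the processor names are invented.
OCRD_TOOL = {
    "tools": {
        "ocrd-example-binarize": {
            "executable": "ocrd-example-binarize",
            "categories": ["Image preprocessing"],
            "steps": ["preprocessing/optimization/binarization"],
        },
        "ocrd-example-segment": {
            "executable": "ocrd-example-segment",
            "categories": ["Layout analysis"],
            "steps": ["layout/segmentation/region", "layout/segmentation/line"],
        },
    }
}


def index_by(ocrd_tool, key):
    """Map each value of `key` ('categories' or 'steps') to the tools declaring it."""
    index = defaultdict(list)
    for name, tool in ocrd_tool["tools"].items():
        for value in tool.get(key, []):
            index[value].append(name)
    return dict(index)
```

Calling index_by(OCRD_TOOL, "steps") gives one entry per declared step, so any ambiguity in the classification (for example two near-synonymous steps) shows up directly as processors scattered across several keys.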