OCR-D / page-to-alto

Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
Apache License 2.0

Properly translate PAGE metadata LastChange, Created, Creator #38

Open kba opened 10 months ago

kba commented 10 months ago
IMHO the correct representation would have been:

For ALTO v2 with its preProcessingStep|ocrProcessingStep|postProcessingStep distinction, one would probably have to map to:

But obviously, this is not ideal. However, since PAGE's Created/LastChange do not have clear semantics, I would argue this is the best pragmatic fit.

BTW, we are also still missing Metadata/Creator! IMO this should go into the contentGeneration (or preProcessingStep) entry.

Originally posted by @bertsky in https://github.com/kba/page-to-alto/issues/37#issuecomment-1888867562