OCR-D / gt-guidelines

OCR-D guidelines for Ground Truth production
https://ocr-d.de/en/gt-guidelines/trans/
Creative Commons Attribution Share Alike 4.0 International

structural concordance: collect more (possible) pairs #40

Open bertsky opened 2 years ago

bertsky commented 2 years ago

In en/trans/structurmets2page.dita, we could add the Page/@type values together with their DFG Strukturdatenset counterparts (some of which are already covered in en/trans/structur_gtpageformat.dita, perhaps because they were also in the Zot format already):

Also, why is mets:div/@type=table likened to pc:TextRegion/@type=heading and not @type=caption (or pc:TableRegion directly)? (Same probably goes for mets:div/@type=map vs. caption / pc:MapRegion, as well as mets:div/@type=musical_notation vs. caption or pc:MusicRegion.)
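To make the alternative concrete, here is a rough sketch of the pairs I have in mind, written as a lookup table. The names `DIRECT_REGION_FOR_DIV_TYPE`, `CAPTION_FALLBACK` and `page_counterparts` are made up for illustration and are not part of the guidelines; only the three mets:div types mentioned above are covered.

```python
# Hypothetical sketch of the pairs suggested above; not an agreed concordance.
# Keys are mets:div/@type values, values are PAGE region elements.
DIRECT_REGION_FOR_DIV_TYPE = {
    "table": "pc:TableRegion",
    "map": "pc:MapRegion",
    "musical_notation": "pc:MusicRegion",
}

# The other option raised above: a pc:TextRegion with @type=caption
# (instead of @type=heading) for the accompanying text.
CAPTION_FALLBACK = ("pc:TextRegion", "caption")


def page_counterparts(div_type):
    """Return candidate (element, @type) pairs for a mets:div/@type value.

    An empty list means this sketch has no suggestion for that type.
    """
    region = DIRECT_REGION_FOR_DIV_TYPE.get(div_type)
    if region is None:
        return []
    # Direct region first, caption text region as the second candidate.
    return [(region, None), CAPTION_FALLBACK]
```

Listing both candidates per type leaves open which one the guidelines should recommend.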

Also, where is illustration?

Next, I would have expected that pc:GraphicRegion/@type gets mapped, too:

Furthermore, IIUC it seems plausible to also suggest mapping some of the mets:div types to pc:ReadingOrder types:

Generally, it would also help to strictly differentiate between structural types (what ENMAP calls contentUnit) and layout types (what ENMAP calls contentItem).

Lastly, how about also collecting concordance between mets:div types and alto:LayoutTags and alto:StructureTags? I can see many similar entities in the official documentation. Perhaps a full discussion of this would also need to include the various ENMAP profiles...