OCR-D / spec

Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
https://ocr-d.de/en/spec/

Mets updates #207

Closed cneud closed 2 years ago

cneud commented 2 years ago

This PR replaces and supersedes #154 and #155, since the changes and discussion there had become fragmented and difficult to review.

It integrates all additions from both PRs into a new mets.md with this structure:

  1. Metadata
     1.1 Unique ID for the document processed
     1.2 Always use URL or relative filenames
     1.3 Recording processing information in METS
  2. Images
     2.1 If in PAGE then in METS
     2.2 Pixel density of images must be explicit and high enough
     2.3 No multi-page images
     2.4 Images and coordinates
  3. File group mets:fileGrp
     3.1 File Group @USE syntax
     3.2 File Group @USE="FULLDOWNLOAD_..."
  4. File mets:file
     4.1 File ID syntax
     4.2 @MIMETYPE syntax
  5. Grouping files by page mets:structMap
     5.1 Grouping files by page
     5.2 OCR-D mets:structMap
  6. Ranges of pages mets:structLink

I hope I have not missed anything and that this will allow us to integrate these changes soon.