UB-Mannheim / kitodo-presentation-docker

Docker configuration for Kitodo.Presentation
GNU General Public License v3.0

🆕 [presentation] Store OCR results with METS using unique path #24

Closed. csidirop closed this issue 1 year ago.

csidirop commented 1 year ago

The OCR results must be stored in a path which is unique for a certain digitized work.

Maybe use the URN? And possibly, in addition, a hash code derived from the METS URL?

METS: https://digi.bib.uni-mannheim.de/fileadmin/vl/ubmaosi/59088/59088.xml

URN in that METS file: <mods:identifier type="urn">urn:nbn:de:bsz:180-digosi-30</mods:identifier>

Directory: urn/nbn/de/bsz/180/digosi/30/SHA1 (with SHA1 = SHA1 of the METS URL)

Contents: METS file, ALTO files
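A minimal Python sketch of how the proposed directory could be derived, assuming the URN is split on ":" and "-" and the SHA1 is taken over the UTF-8 bytes of the METS URL. The function name `proposed_ocr_dir` is hypothetical and not part of Kitodo.Presentation.

```python
import hashlib
import re
from pathlib import PurePosixPath


def proposed_ocr_dir(urn: str, mets_url: str) -> PurePosixPath:
    """Map a URN plus the METS URL to a directory like urn/nbn/.../<sha1>."""
    # Assumption: every ':' and '-' in the URN starts a new path component.
    parts = [p for p in re.split(r"[:-]", urn) if p]
    # Assumption: the hash is SHA1 over the UTF-8 encoded METS URL.
    sha1 = hashlib.sha1(mets_url.encode("utf-8")).hexdigest()
    return PurePosixPath(*parts, sha1)


example = proposed_ocr_dir(
    "urn:nbn:de:bsz:180-digosi-30",
    "https://digi.bib.uni-mannheim.de/fileadmin/vl/ubmaosi/59088/59088.xml",
)
# The first seven components are urn/nbn/de/bsz/180/digosi/30,
# followed by a 40-character hexadecimal SHA1 digest.
print(example)
```

Appending the hash keeps two different METS files that carry the same URN from writing into the same directory.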

csidirop commented 1 year ago

OCR results are now stored in a path derived from the URN: fileadmin\fulltextFolder\URN\nbn\de\bsz\180\digosi\30

One downside is that we have to parse the METS XML again, because Presentation does not store the URN in a uniform way, and sometimes not at all! An additional hash of the METS URL can be added if wanted.
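The re-parsing step could look roughly like the following Python sketch. It is not the code used in this repository. It only assumes that the URN sits in a MODS identifier element with type="urn", as in the example METS above. The function name `find_urn` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Official MODS v3 namespace.
MODS_NS = "http://www.loc.gov/mods/v3"


def find_urn(mets_xml: str) -> Optional[str]:
    """Return the first <mods:identifier type="urn"> value, or None."""
    root = ET.fromstring(mets_xml)
    # iter() walks the whole tree, so the nesting depth of the MODS
    # section inside the METS document does not matter here.
    for ident in root.iter(f"{{{MODS_NS}}}identifier"):
        if ident.get("type") == "urn" and ident.text:
            return ident.text.strip()
    return None


sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:mods="http://www.loc.gov/mods/v3">'
    '<mods:mods><mods:identifier type="urn">'
    "urn:nbn:de:bsz:180-digosi-30"
    "</mods:identifier></mods:mods></mets:mets>"
)
print(find_urn(sample))  # urn:nbn:de:bsz:180-digosi-30
```

Returning None when no URN is found lets the caller decide what to do with such documents.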

For METS files without a URN, the path consists of the hash of the METS URL: fileadmin\fulltextFolder\noURN\e8e043131b1ebc2559d6eff98204bea937726c80
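The choice between the two layouts can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: the hash is assumed to be SHA1 (the 40 hexadecimal characters above are consistent with that), the leading "urn" component is assumed to be replaced by the folder name URN, and the function name `fulltext_dir` is hypothetical. Forward slashes are used here; the separator is a platform detail.

```python
import hashlib
import re
from pathlib import PurePosixPath
from typing import Optional

BASE = PurePosixPath("fileadmin/fulltextFolder")


def fulltext_dir(urn: Optional[str], mets_url: str) -> PurePosixPath:
    """Pick the URN-based folder if a URN exists, else the hash-based one."""
    if urn:
        parts = [p for p in re.split(r"[:-]", urn) if p]
        # Assumption: the leading "urn" scheme becomes the "URN" folder.
        if parts and parts[0].lower() == "urn":
            parts = parts[1:]
        return BASE.joinpath("URN", *parts)
    # Fallback: no URN available, so identify the work by its METS URL.
    digest = hashlib.sha1(mets_url.encode("utf-8")).hexdigest()
    return BASE / "noURN" / digest


url = "https://digi.bib.uni-mannheim.de/fileadmin/vl/ubmaosi/59088/59088.xml"
print(fulltext_dir("urn:nbn:de:bsz:180-digosi-30", url))
# fileadmin/fulltextFolder/URN/nbn/de/bsz/180/digosi/30
print(fulltext_dir(None, url))
```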

So the final folder structure looks like this: (screenshot "grafik")

csidirop commented 1 year ago

Tested with five documents from four institutions in total. Maybe more institutions should be tested.

Tested documents: