OCR-D / ocrd_fileformat

OCR-D wrapper for ocr-fileformat
Apache License 2.0

Missing mets:fptr for generated ALTO files #12

stweil closed 4 years ago

stweil commented 4 years ago

For dfg-viewer and other viewers, the METS file must contain a FULLTEXT mets:fileGrp. This can be generated using the conversion "page alto". In the following example, the file `LOCTYPE` was replaced by a URL:

<mets:fileGrp USE="FULLTEXT">
  <mets:file MIMETYPE="application/alto+xml" ID="IMG_FULLTEXT_459867">
    <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" LOCTYPE="URL" xlink:href="https://digi.bib.uni-mannheim.de/fileadmin/vl/ubmaweick/451435/FULLTEXT/IMG_FULLTEXT_459867.xml"/>
  </mets:file>
  [...]

The dfg-viewer expects that all generated ID entries also occur in mets:fptr tags, but those are missing. They should look like this:

        <mets:structMap TYPE="PHYSICAL">
          <mets:div TYPE="physSequence" ID="physroot">
            <mets:div TYPE="page" LABEL="[Seite]" ID="phys459867" ORDER="1">
              <mets:fptr FILEID="IMG_FULLTEXT_459867"/>
              <mets:fptr FILEID="IMG_DEFAULT_459867"/>
              <mets:fptr FILEID="IMG_THUMBS_459867"/>
              <mets:fptr FILEID="IMG_MIN_459867"/>
              <mets:fptr FILEID="IMG_MAX_459867"/>
            </mets:div>
           [...]

This looks like a general problem because other OCR-D processors also create new files without adding them to physical or logical pages.
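
To check a METS file for this problem, a minimal sketch along the following lines might help. It lists every mets:file ID that has no matching mets:fptr in the physical structMap. The function name `find_unreferenced_file_ids` and the sample document are hypothetical, made up for illustration; this is not part of ocrd_fileformat or OCR-D core, and it only assumes the standard METS namespace.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace, needed for the prefixed paths below.
METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical sample: two files, but only the image is linked to the page.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="FULLTEXT">
      <mets:file MIMETYPE="application/alto+xml" ID="IMG_FULLTEXT_459867"/>
    </mets:fileGrp>
    <mets:fileGrp USE="DEFAULT">
      <mets:file MIMETYPE="image/jpeg" ID="IMG_DEFAULT_459867"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence" ID="physroot">
      <mets:div TYPE="page" ID="phys459867" ORDER="1">
        <mets:fptr FILEID="IMG_DEFAULT_459867"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def find_unreferenced_file_ids(mets_xml: str) -> list[str]:
    """Return IDs of mets:file entries without a mets:fptr in the physical structMap."""
    root = ET.fromstring(mets_xml)
    # All file IDs declared anywhere in the fileSec, in document order.
    file_ids = [
        f.get("ID") for f in root.iterfind(".//mets:fileSec//mets:file", METS_NS)
    ]
    # All FILEID values referenced from pages of the physical structMap.
    referenced = {
        p.get("FILEID")
        for p in root.iterfind(
            ".//mets:structMap[@TYPE='PHYSICAL']//mets:fptr", METS_NS
        )
    }
    return [fid for fid in file_ids if fid not in referenced]


if __name__ == "__main__":
    for file_id in find_unreferenced_file_ids(SAMPLE_METS):
        print("missing mets:fptr for", file_id)
```

Run against the output of any processor, a non-empty result would point to files that viewers such as dfg-viewer cannot associate with a page.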

stweil commented 4 years ago

This looks like a general problem ...

@kba, should this issue be moved to core?

stweil commented 4 years ago

Was there a related change? It looks like the latest version works. I'll run one more test and close this if I can confirm that.

stweil commented 4 years ago

I'm closing this issue because it seems to be fixed in a newer version of core.