Current situation
When a METS file is downloaded (i.e. a workspace is cloned) and its files are downloaded, the @xlink:href of each downloaded file's mets:file/mets:FLocat is changed from its HTTP URL to a local file path relative to the workspace. This makes the data easy to process but very difficult to map the local filenames back to their URLs for ingestion into production systems.
How it should be
When cloning workspaces and downloading files, the existing mets:FLocat should not be changed but rather a new mets:FLocat with the local filename should be added.
We also need a processor that removes the mets:FLocat for the local filename again, because the ZVDD METS profile does not allow multiple mets:FLocat.
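The proposed download behaviour could be sketched roughly as follows. This is only an illustration using Python's standard library, not the OCR-D/core implementation: the helper name `add_local_flocat`, the sample file ID and paths, and the choice to mark the local entry with LOCTYPE="OTHER" and OTHERLOCTYPE="FILE" are all assumptions that the specification would still have to settle.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def add_local_flocat(mets_file, local_path):
    """Append a mets:FLocat for local_path, leaving existing ones untouched.

    Hypothetical helper: marking the local entry with
    LOCTYPE="OTHER" / OTHERLOCTYPE="FILE" is an assumption, not spec.
    """
    # Do nothing if this local path is already recorded.
    for flocat in mets_file.findall(METS + "FLocat"):
        if flocat.get(XLINK_HREF) == local_path:
            return flocat
    flocat = ET.SubElement(mets_file, METS + "FLocat")
    flocat.set("LOCTYPE", "OTHER")
    flocat.set("OTHERLOCTYPE", "FILE")
    flocat.set(XLINK_HREF, local_path)
    return flocat


# Made-up sample entry; the URL and file ID are placeholders.
SAMPLE = """<mets:file xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink" ID="FILE_0001">
  <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.tif"/>
</mets:file>"""

file_el = ET.fromstring(SAMPLE)
add_local_flocat(file_el, "OCR-D-IMG/FILE_0001.tif")

# The original URL stays first; the local path is added after it.
hrefs = [f.get(XLINK_HREF) for f in file_el.findall(METS + "FLocat")]
print(hrefs)
```

Keeping the original URL entry untouched means the mapping back to the remote location never has to be reconstructed. The helper is idempotent, so downloading the same file twice would not add a third mets:FLocat.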
Steps
[ ] Adapt the specifications to lay out exactly how downloading should work with regard to METS
[ ] Adapt OCR-D/core to implement this changed behavior
[ ] Develop the postprocessing processor to remove additional mets:FLocat
[ ] Test, with a broad range of existing real-life METS data, that the mechanism works correctly.
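The postprocessing step listed above might look something like the following sketch. It assumes, hypothetically, that local entries are recognisable by OTHERLOCTYPE="FILE"; the function name `strip_local_flocats` and the sample document are made up for illustration, and a real processor would follow whatever convention the specification ends up defining.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def strip_local_flocats(root):
    """Remove local mets:FLocat entries and return how many were removed.

    Assumption: local entries carry OTHERLOCTYPE="FILE". A mets:file
    whose only locations are local is left alone, so no file ends up
    without any mets:FLocat.
    """
    removed = 0
    # Materialise the iterator first, since the tree is modified below.
    for mets_file in list(root.iter(METS + "file")):
        flocats = mets_file.findall(METS + "FLocat")
        local = [f for f in flocats if f.get("OTHERLOCTYPE") == "FILE"]
        if len(local) == len(flocats):
            continue  # nothing remote to fall back on
        for flocat in local:
            mets_file.remove(flocat)
            removed += 1
    return removed


# Made-up sample document; URLs and IDs are placeholders.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="FILE_0001">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.tif"/>
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG/FILE_0001.tif"/>
      </mets:file>
      <mets:file ID="FILE_0002">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0002.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

root = ET.fromstring(SAMPLE)
removed = strip_local_flocats(root)
remaining = [f.get(XLINK_HREF) for f in root.iter(METS + "FLocat")]
print(removed, remaining)
```

Skipping files that have only a local location is a deliberate choice in this sketch. Whether such files should instead be reported as errors is something the real-life METS tests would need to clarify.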