Open shsdev opened 2 months ago
Hello,
I do not quite understand the use cases that are meant to be supported. In my view, AIPs live independently of SIPs but might be affected by them, just as they might be affected by other operations such as metadata enrichment, file format conversion, or even redaction or destruction by retention processes. Note also that AIPs might be created by SIPs or by operations, such as when creating an AIC (an AIP relative to a collection or a case with only metadata).
Also, I am not very comfortable with this level of complexity. Versioning should be an accessory level that you could take or leave without much change to the overall format of the AIP.
As such, I would like to suggest the following:
Given this, I would like to suggest the following layout of an AIP with two versions, where the first version was created by a SIP and the second version is just an update of the descriptive metadata.
urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a
├── metadata
│   ├── descriptive
│   │   └── ead.xml
│   ├── metadata.json
│   ├── other
│   │   ├── processing.log
│   │   └── state.json
│   └── preservation
│       ├── premis_202401094-230854Z_event_sipcreation.xml
│       └── premis_20240409-230854Z_event_ingest.xml
├── METS.xml
├── representations
│   └── 09502a26-f822-407c-ad0a-4d7e64052a91
│       ├── data
│       │   └── example.pdf
│       ├── metadata
│       │   └── preservation
│       │       └── premis.xml
│       └── METS.xml
├── schemas
│   ├── csip.xsd
│   ├── ead3.xsd
│   ├── IP.xsd
│   ├── mets_1_11.xsd
│   ├── premis-v2-2.xsd
│   └── xlink.xsd
├── versions
│   ├── 0=ocfl_object_1.0
│   ├── inventory.json
│   ├── inventory.json.sha512
│   ├── v00000
│   │   └── metadata/descriptive/ead.xml
│   │       (all other files in AIP originally received from the SIP)
│   └── v00001
│       └── metadata/descriptive/ead.xml
│           (descriptive metadata updated on the digital preservation archive)
└── submissions
    └── 2024-04-10T11-57-00Z
        └── example.sip.001.zip
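The layout above keeps only the changed descriptive metadata in the second version directory. As a rough illustration of how an implementation might decide which files belong in a new version, here is a minimal sketch that compares content digests between two states of the AIP. The function name `files_for_new_version`, the in-memory dictionaries, and the file contents are all hypothetical and are not part of any E-ARK or OCFL tooling.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the hex SHA-512 digest of a byte string."""
    return hashlib.sha512(data).hexdigest()


def files_for_new_version(previous: dict[str, bytes], current: dict[str, bytes]) -> list[str]:
    """List paths that are new or whose content changed since the previous state."""
    changed = []
    for path, content in sorted(current.items()):
        old = previous.get(path)
        if old is None or sha512_hex(old) != sha512_hex(content):
            changed.append(path)
    return changed


# Hypothetical file contents standing in for the two AIP states in the layout.
v00000_files = {
    "METS.xml": b"<mets/>",
    "metadata/descriptive/ead.xml": b"<ead>original</ead>",
}
v00001_files = {
    "METS.xml": b"<mets/>",
    "metadata/descriptive/ead.xml": b"<ead>updated</ead>",
}

print(files_for_new_version(v00000_files, v00001_files))
# → ['metadata/descriptive/ead.xml']
```

Only the updated EAD file is reported, which matches the single file shown under v00001 in the layout.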
I don't have strong feelings, and I don't have skin in the game. I'm not quite able to make a like-for-like comparison because one or two points in @luis100's response aren't clear to me. I think you're suggesting no BagIt and no use of TAR for OCFL versions. I'm inclined to agree about BagIt; I'm not sure we gain much from its use, and much of the metadata is redundant. I agree that using TAR archives in versions is perhaps a bit messy and obscures which content or metadata changed.
Versions and Submissions are optional; their use will have a significant impact on storage and processing, and whether they are used should be defined by the implementation and easy to switch on and off.
Storage and processing impact IS an implementation detail to some degree. Institutional policy, budget, and choice will also be factors. Making them optional appears a sensible decision.
Note that the OCFL format does not belong to the AIP. This is just one possible way to store the original SIP (in this example as v00000) if you want to keep it, and the versioned AIPs are separate instances of AIPs. We moved away from integrating the versions into the AIP since E-ARK3.
Packaging as TAR/ZIP/etc. is a technical implementation detail that depends on storage system and requirements. It is adequate if the packages need to be transferred and may be a good approach if you have a tape system where the AIPs are stored for the long-term. However, if the AIPs are still being updated, the continuous re-packaging causes a lot of processing and redundancy.
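To make the re-packaging point concrete, here is a minimal sketch using Python's standard `tarfile` module. The directory name `example-aip`, the file contents, and the helper names `pack_aip` and `demo` are hypothetical. The sketch shows that changing a single metadata file still means writing the entire container again.

```python
import os
import tarfile
import tempfile


def pack_aip(aip_dir: str, tar_path: str) -> list[str]:
    """Write the whole AIP directory into an uncompressed TAR and return its member names."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(aip_dir, arcname=os.path.basename(aip_dir))
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()


def demo() -> list[str]:
    """Pack a tiny hypothetical AIP, update one file, and pack it again."""
    with tempfile.TemporaryDirectory() as work:
        aip_dir = os.path.join(work, "example-aip")
        os.makedirs(os.path.join(aip_dir, "metadata", "descriptive"))
        ead_path = os.path.join(aip_dir, "metadata", "descriptive", "ead.xml")
        with open(ead_path, "w", encoding="utf-8") as handle:
            handle.write("<ead>original</ead>")
        tar_path = os.path.join(work, "example-aip.tar")
        pack_aip(aip_dir, tar_path)

        # Only one file changes, yet the container is rewritten in full.
        with open(ead_path, "w", encoding="utf-8") as handle:
            handle.write("<ead>updated</ead>")
        return pack_aip(aip_dir, tar_path)


members = demo()
print(members)
```

For a package holding large representation files, that full rewrite is where the processing cost and redundancy mentioned above would come from.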
The question here was about the use of BagIt to wrap E-ARK AIPs. In E-ARK, the manifest is included in the METS, but BagIt has a simpler, non-XML format (payload manifest) for this purpose.
The AIP working group discussed the use of BagIt and recommends taking it out of the main recommendation for AIP packaging. Instead, it would be moved to an appendix explaining how to wrap E-ARK information packages using BagIt. As optional BagIt packaging is also relevant for the SIP, it should be added to the CSIP rather than to the AIP. This decision is independent of the use of OCFL, which will be dealt with in a separate issue.
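For readers unfamiliar with the payload manifest mentioned above, here is a minimal sketch of how its lines could be produced. A BagIt payload manifest (for example `manifest-sha512.txt`) lists one checksum and one path under `data/` per line. The function name `payload_manifest` and the placeholder file content are hypothetical; this is not the API of any BagIt library.

```python
import hashlib


def payload_manifest(payload: dict[str, bytes]) -> str:
    """Render one '<sha512-hex>  data/<path>' line per payload file."""
    lines = []
    for path, content in sorted(payload.items()):
        digest = hashlib.sha512(content).hexdigest()
        lines.append(f"{digest}  data/{path}")
    return "\n".join(lines) + "\n"


# Placeholder bytes stand in for the real container file.
manifest_text = payload_manifest({"example.sip.001.tar": b"placeholder SIP bytes"})
print(manifest_text, end="")
```

The same digests would typically also be recorded in the METS of the wrapped package, which is the redundancy the thread is weighing.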
The suggestion is:
Board members' acknowledgment of the issue: Tick the box in front of your name to indicate that you have looked at the suggestion.
Voting (Decision making will be carried out on the basis of majority voting by all eligible members of the Board. In the case of a tied vote, decisions will be made at the discretion of the Chair)
Tick the box in front of your name to say yes to the suggestion.
For the versioning of AIPs the plan is to recommend the use of OCFL.
Assuming the following structure for an original submission information package example.sip.001.tar stored as version v00000 and an AIP urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar stored as version v00001:
The inventory.json could look as follows:
Note that there is an overlap of fixity information which is provided in the METS already.
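As an illustration of the kind of inventory.json discussed here, the following is a minimal sketch that builds an OCFL-style inventory for the two container files. It is not the inventory from the original proposal. The version names are taken from this thread rather than checked against the OCFL specification, and the timestamps, placeholder file contents, the choice to list only one container file per version state, and the function name `build_inventory` are all assumptions.

```python
import hashlib
import json


def build_inventory(object_id: str, versions: list[tuple[str, str, bytes, str]]) -> dict:
    """Build an inventory from ordered (version, file name, content, created) tuples."""
    manifest: dict[str, list[str]] = {}
    version_blocks: dict[str, dict] = {}
    for version_name, file_name, content, created in versions:
        digest = hashlib.sha512(content).hexdigest()
        # Manifest maps each digest to where the file is stored in the object.
        manifest.setdefault(digest, []).append(f"{version_name}/content/{file_name}")
        # Assumption: each version's state lists only its own container file.
        version_blocks[version_name] = {
            "created": created,
            "state": {digest: [file_name]},
        }
    return {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": versions[-1][0],
        "manifest": manifest,
        "versions": version_blocks,
    }


aip_id = "urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a"
inventory = build_inventory(
    aip_id,
    [
        # Placeholder bytes and timestamps; real values come from the stored files.
        ("v00000", "example.sip.001.tar", b"placeholder SIP bytes", "2024-04-10T11:57:00Z"),
        ("v00001", f"{aip_id}.tar", b"placeholder AIP bytes", "2024-04-10T12:00:00Z"),
    ],
)
print(json.dumps(inventory, indent=2))
```

Each digest in the manifest repeats fixity information for a file whose contents are already covered by checksums in the package's METS, which is the overlap noted above.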
The question for voting is whether the container files example.sip.001.tar for the original SIP and urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar for the AIP should be wrapped in a BagIt container, for example:
Note that this way fixity information would possibly be provided in up to four layers:
To reduce complexity and redundancy, the proposal is to store the E-ARK information packages as TAR files instead of wrapping them as BagIt containers as shown in the example above.
The E-ARK AIP container file urn+uuid+81bd3aa2-7350-44f6-ad54-d8181858605a.tar would then have the following form, for example:
The suggestion is:
As part of the general AIP recommendations, the proposal is to store the E-ARK information packages as TAR files instead of wrapping them as BagIt containers.