DILCISBoard / E-ARK-AIP

E-ARK AIP Specification
https://earkaip.dilcis.eu/
Creative Commons Attribution 4.0 International

In a physical AIP, were representations ever considered as being the root? #77

Closed by kieranjol 1 year ago

kieranjol commented 1 year ago

Hi,

I think the E-ARK specs are really wonderful. One thing I'm trying to get my head around is that the root of the E-ARK AIP is the Intellectual Entity. I see why this makes sense from a logical point of view, and even from a physical one. However, I think there is a potentially valid reason for having a representation as the root. For example:

So I suppose my question is: was it ever considered that a representation could be the root, and were there reasons other than the ones I've listed that might have accounted for the physical AIP always having an IE at its root? The more I think about it, having the physical AIP reflect the logical AIP makes a lot of sense, and it maps to the PREMIS object types quite well. My main concern is that some archives may rely on basic tape storage, which doesn't allow packages to be updated easily, so E-ARK might be out of reach for them as a result.

Best,

Kieran O'Leary, National Library of Ireland

jmaferreira commented 1 year ago

This is somewhat related to the SIP/AIP UPDATE procedure described in #76.

kieranjol commented 1 year ago

I agree! Thanks for linking that, I hadn't seen it. The more I look at fig 4 here: https://earkaip.dilcis.eu/#fig4

the more it seems to reflect my use case, but I see that the descriptive metadata is most likely duplicated across both representations, and both copies would need to be updated together if required.

My question still stands as to whether representations were ever considered as a potential root for an E-ARK AIP, as I think it could lower the barrier to entry for adopting the spec. I will say that I increasingly think that having the IE as the root still makes the most sense, assuming one has the infrastructure to support this.

jmaferreira commented 1 year ago

Hi,

Your statement is only partially true. The descriptive metadata is expected to be placed at the root level (under the metadata/descriptive folder). This way, it doesn't matter whether you have 1 or 100 representations: their intellectual description will always be the same.

The use of descriptive metadata at the representation level is supposed to be exceptional. It is there to handle cases such as a particular representation including some additional descriptive elements that may help in finding that object.
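To make the layout concrete, here is a minimal sketch (an illustration, not text from the specification) that builds an empty package skeleton with descriptive metadata only at the root. The package name `example-aip`, the representation IDs, and the helper names are all hypothetical, and the sketch assumes the usual `representations/<id>/data` and `metadata/preservation` folders.

```python
import tempfile
from pathlib import Path


def build_aip_skeleton(root: Path, representation_ids: list[str]) -> None:
    """Create an empty folder skeleton (illustrative only)."""
    # Descriptive metadata for the intellectual entity is stored once, at the root.
    (root / "metadata" / "descriptive").mkdir(parents=True, exist_ok=True)
    (root / "metadata" / "preservation").mkdir(parents=True, exist_ok=True)
    for rep_id in representation_ids:
        rep = root / "representations" / rep_id
        (rep / "data").mkdir(parents=True, exist_ok=True)
        # Representation-level metadata: preservation only in this sketch;
        # no descriptive folder, since that case is meant to be exceptional.
        (rep / "metadata" / "preservation").mkdir(parents=True, exist_ok=True)


def descriptive_dirs(root: Path) -> list[str]:
    """List every 'descriptive' folder in the package, relative to the root."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("descriptive")
        if p.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    aip = Path(tmp) / "example-aip"  # hypothetical package name
    build_aip_skeleton(aip, ["rep1", "rep2"])
    print(descriptive_dirs(aip))  # → ['metadata/descriptive']
```

However many representations are added, the list contains a single entry, which is the point made above: the intellectual description does not multiply with the representations.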

kieranjol commented 1 year ago

This makes much more sense now, thank you! I'm happy to have this closed anyhow.

kieranjol commented 1 year ago

Actually, is your clarifying statement explicitly in the spec, or is it implicit? I can't find it here anyhow but I may be looking in the wrong place: https://earkaip.dilcis.eu/#compoundvs.dividedpackagestructure

Is it worth adding a clarifying statement to the spec? Edit: I couldn't find a reference in the Common Spec either.

jmaferreira commented 1 year ago

@shsdev @karinbredenberg Could you please check if the spec makes it clear to everyone how descriptive metadata at the representation level is expected to be used?

shsdev commented 1 year ago

Dear Kieran,

Indeed, it is not specified what type of metadata goes at the root or the representation level. The possibility is there to allow storing metadata specific to the representation. We haven't specified what that means either.

However, generally, the descriptive metadata in the root relates to the intellectual entity. I think it is wise to keep this metadata always up to date independent of other descriptive metadata at the representation level. The reason is that the descriptive metadata in the root is always the first thing to look into.

How metadata is maintained in the root depends on how the AIP is managed as a package, i.e., whether it is possible to update the root metadata. One option is to back up the descriptive metadata each time it is edited, or to use some other kind of versioning. In any case, as said, it should always be kept up to date.
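One way the backup-on-edit idea could look in practice is sketched below. This is only an assumption about a possible workflow: the specification does not prescribe where backups live or how they are named, and the `backups` folder, the `ead.xml` file name, and the `backup_then_write` helper are all hypothetical.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_then_write(metadata_file: Path, new_content: str,
                      backup_dir: Path) -> Optional[Path]:
    """Copy the current file to backup_dir (if it exists), then overwrite it.

    Returns the path of the backup copy, or None when there was nothing to back up.
    """
    backup = None
    if metadata_file.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        # UTC timestamp with microseconds keeps successive backups distinct.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        backup = backup_dir / f"{metadata_file.stem}.{stamp}{metadata_file.suffix}"
        shutil.copy2(metadata_file, backup)
    metadata_file.parent.mkdir(parents=True, exist_ok=True)
    metadata_file.write_text(new_content, encoding="utf-8")
    return backup


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name and backup location.
    ead = Path(tmp) / "metadata" / "descriptive" / "ead.xml"
    backups = Path(tmp) / "backups"
    backup_then_write(ead, "<ead>first version</ead>", backups)
    previous = backup_then_write(ead, "<ead>second version</ead>", backups)
    print(previous.name if previous else "no backup made")
```

The root copy always holds the latest description, while earlier states remain recoverable from the backup folder.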

The structure of the AIP is meant to receive updates in the form of new representations. However, the root METS has a pointer to each representation METS, so the root METS has to be updated anyway. Further, the PREMIS events should document why the new representation was created (you said that you may receive a new representation, but this could also be a decision made for preservation purposes). The PREMIS event metadata should also document which representation was used to derive the new one. So there are a few pieces of structural and preservation metadata that need to be changed anyway when a new representation is added.
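The two updates described above can be sketched roughly as follows. This is a simplified illustration and makes no claim to produce schema-valid output: the helper names, the `Representations/<id>` label, the `local` identifier type, the IDs, and the date are assumptions made for the example, while the element names follow METS and PREMIS 3.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
PREMIS = "http://www.loc.gov/premis/v3"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("premis", PREMIS)):
    ET.register_namespace(prefix, uri)


def add_representation_pointer(root_mets: ET.Element, rep_id: str) -> ET.Element:
    """Add a div with an mptr to the representation METS (illustrative helper)."""
    struct_div = root_mets.find(f"{{{METS}}}structMap/{{{METS}}}div")
    if struct_div is None:
        raise ValueError("root METS has no structMap/div to extend")
    div = ET.SubElement(struct_div, f"{{{METS}}}div",
                        {"LABEL": f"Representations/{rep_id}"})
    ET.SubElement(div, f"{{{METS}}}mptr", {
        "LOCTYPE": "URL",
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": f"representations/{rep_id}/METS.xml",
    })
    return div


def derivation_event(event_id: str, source_rep: str, new_rep: str,
                     reason: str, date_time: str) -> ET.Element:
    """Build a PREMIS event recording why and from what the new representation was made."""
    def q(tag: str) -> str:
        return f"{{{PREMIS}}}{tag}"

    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = "migration"
    ET.SubElement(event, q("eventDateTime")).text = date_time
    detail = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(detail, q("eventDetail")).text = reason
    # Link the source representation and the one derived from it.
    for rep, role in ((source_rep, "source"), (new_rep, "outcome")):
        link = ET.SubElement(event, q("linkingObjectIdentifier"))
        ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
        ET.SubElement(link, q("linkingObjectIdentifierValue")).text = rep
        ET.SubElement(link, q("linkingObjectRole")).text = role
    return event


root_mets = ET.fromstring(
    f'<mets:mets xmlns:mets="{METS}"><mets:structMap>'
    f'<mets:div LABEL="example-aip"/></mets:structMap></mets:mets>'
)
add_representation_pointer(root_mets, "rep2")
event = derivation_event("event-001", "rep1", "rep2",
                         "format migration for preservation",
                         "2024-01-01T00:00:00Z")
print(ET.tostring(root_mets, encoding="unicode"))
print(ET.tostring(event, encoding="unicode"))
```

Even in this reduced form, adding a representation touches both the root METS and the preservation metadata, which is why the root cannot be treated as write-once.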

Coming back to the question of what type of metadata should be stored at the representation level: I think it should not be descriptive metadata that relates to the intellectual entity as a whole, because this metadata would intuitively be expected at the root level. The folder is rather for technical and preservation metadata that relate to the representation.

Best wishes,

Sven

kieranjol commented 1 year ago

Apologies for the delay, this is super clear. I still think that more clarification is required in the specification regarding figure 4. Perhaps removing 'descriptive' from the metadata hierarchy in representations could be warranted, or a clarifying remark about how the key descriptive metadata needs to be at the root IE level, and any additional descriptive metadata for the representation should only go in the representation descriptive metadata folder. I'm trying to think of examples where extra descriptive metadata would be required at the representation level, as you've covered how so much contextual metadata around the creation/acquisition of the new representation could be covered by technical metadata or PREMIS events. If a representation requires extra descriptive metadata, is it even the same work at that point, and to use PREMIS terms, would it be deviating from the 1:1 principle where this representation is actually part of a different Intellectual Entity?

shsdev commented 1 year ago

You are right, I think it would be good to clarify this in the specification and also maybe remove the descriptive metadata at least from the example figure. I am not sure descriptive metadata would be needed at the representation level at all. And yes, I think it is important to be careful about changes which actually lead to a different intellectual entity. Representations are only for changes regarding the form of how the data is persisted and presented, such as storing a file in a different format. Changing metadata does not change the content files, so it would not be an issue in that regard. However, if the metadata is used for interpretation in a scientific context, this might lead to confusion and misunderstandings.

shsdev commented 1 year ago

Issue is addressed in pull request https://github.com/DILCISBoard/E-ARK-AIP/pulls, changes will go into AIP specification release v2.1.1.