We need a better definition for artefact.

phochste commented 2 years ago

Currently, in the Overview, the artefact is defined as:

A web resource such as a file or document that serves as the object of exchange between actors and therefore is the smallest divisable unit on the network.

While this text is clear in our colloquial usage of the term in our discussions, it leaves the exact meaning of the term open to interpretation in light of lifecycle events and the possibility of complex objects. Even in our internal communications, the artefact sometimes means the PDF file, sometimes the PDF + metadata file, and sometimes the landing page (which is assumed to carry the semantics, e.g. Signposting, that make clear what the composition of the complex object artefact is).

E.g. when archiving an artefact, it is clear what this means for a single File / Bitstream and, to a lesser extent, for the Representation of the artefact. But in an archival context this becomes a bit of a slippery slope when talking about complex objects.

E.g. in PREMIS the artefact under consideration is something other than the smallest divisible unit on the network. They are talking about an Intellectual Entity that

[sic] is a distinct intellectual or artistic creation that is considered relevant to a designated community in the context of digital preservation: for example, a particular book, map, photograph, database, or hardware or software. An Intellectual Entity can include other Intellectual Entities; for example, a web site can include a web page and a web page can include an image [my emphasis]. An Intellectual Entity may have one or more digital or non-digital Representations.

Ref: https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf

In the context of interaction events (e.g. annotation of artefacts), the object of interaction can be a fragment of what we have in mind as an indivisible artefact.

E.g. in Web Annotation the target (which would correspond to the artefact in our case):

In particular,

* The Target or Body resource may be a specific segment of the resource.
* The Target or Body resource may be styled in a specific way.
* The Target or Body resource may be a specific state of the resource.
* The Target or Body resource may be included in the Annotation to play a specific role.
* The Target or Body resource may be any combination of the above.

Ref: https://www.w3.org/TR/annotation-model/
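To make the "specific segment" case concrete, here is a minimal sketch of an annotation whose target is a fragment of a resource rather than the whole resource. The URLs and the helper `target_source` are hypothetical and only for illustration; the overall shape (a `SpecificResource` with a `source` and a `FragmentSelector`) follows the Web Annotation Data Model.

```python
# Hypothetical URLs, for illustration only.
ARTEFACT_URL = "https://pod.example.org/alice/paper.pdf"

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://pod.example.org/bob/annotations/1",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "This claim needs a citation."},
    "target": {
        # The target is not the artefact itself but a segment of it.
        "type": "SpecificResource",
        "source": ARTEFACT_URL,
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://tools.ietf.org/rfc/rfc3778",  # PDF fragments
            "value": "page=3",
        },
    },
}


def target_source(anno: dict) -> str:
    """Return the IRI of the whole resource behind an annotation target.

    Hypothetical helper: handles a plain IRI target and a SpecificResource.
    """
    target = anno["target"]
    if isinstance(target, str):
        return target
    return target.get("source", target.get("id"))
```

In this sketch the network would still see one identifiable resource (`source`), while the selector narrows the interaction down to a part of it.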

The question: is our definition deliberately made vague to accommodate all these use cases (and, as a corollary, is it vague enough in this regard), or do we really have a more formal understanding of what an artefact is (and what it is not)?

It is quite possible that what an artefact is depends on the use case. If you just get a reference, then it is the name of the artefact in some pod that you can dereference. By dereferencing it, one can learn more about it:

Is it a complex object
Is it a versioned object
Is it a fragment
Is it a particular representation of an object
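As a rough sketch of how such questions could be answered after dereferencing, the function below derives hints from the URL and from the link relation types found in the response (e.g. via Signposting). The function name and the mapping from relation types to hints are assumptions made for illustration, not something our specs define; `item` and the versioning relations (RFC 5829) are registered link relation types.

```python
from urllib.parse import urlparse

# Link relation types from RFC 5829 that suggest a versioned resource.
VERSION_RELS = {
    "latest-version",
    "predecessor-version",
    "successor-version",
    "version-history",
}


def describe_artefact(url: str, link_rels: set[str]) -> set[str]:
    """Return hints about what kind of thing a reference points to.

    Hypothetical helper: `link_rels` is the set of relation types already
    extracted from the dereferenced resource (e.g. from its Link headers).
    """
    hints = set()
    if urlparse(url).fragment:
        hints.add("fragment")
    if "item" in link_rels:
        # A landing page listing its content resources.
        hints.add("complex object")
    if link_rels & VERSION_RELS:
        hints.add("versioned object")
    if "alternate" in link_rels or "canonical" in link_rels:
        # Assumption: these relations signal one of several representations.
        hints.add("representation")
    return hints
```

A reference with no fragment and no recognised relations yields an empty set, which matches the idea that the reference alone tells us only that the artefact is identifiable.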

We are not going to solve the problem of dealing with complex objects, but we need to be a bit clearer about what artefact can mean in our specs.

phochste commented 2 years ago

I see that PREMIS is deliberately vague too, but it explains this vagueness:

Event contains the identifier of the Object involved. What is important is that this association is arbitrary and is not meant to imply that a particular implementation is required. The choice of semantic unit is down to individual implementations.

In some cases a semantic unit takes the form of a container that groups a set of related semantic units. For example, the semantic unit identifier groups the two semantic units identifierType and identifierValue. The grouped subunits are called semantic components of the container. Some containers are defined as extension containers, to allow the use of metadata encoded according to an external schema. This enables PREMIS to be extended with metadata elements that are more granular, non-core, or otherwise out of scope for the Data Dictionary.

mielvds commented 2 years ago

The question: is our definition deliberately made vague to accommodate all these use cases (and, as a corollary, is it vague enough in this regard), or do we really have a more formal understanding of what an artefact is (and what it is not)?

Yes, deliberately vague, and no, I don't think we need a formal understanding beyond 'it needs to be identifiable'. The reasoning is that our network should really be able to consider artefacts as black boxes. If not, a decent level of scalable interop will be hard to achieve. And since it doesn't matter to the network what the artefact is, you can use its components for a complex object, a file, a fragment, ... however...

It is quite possible that what an artefact is depends on the use case.

... the use case should probably be more specific about what the possible artefacts are.

If you just get a reference, then it is the name of the artefact in some pod that you can dereference. By dereferencing it, one can learn more about it:

* Is it a complex object

* Is it a versioned object

* Is it a fragment

* Is it a particular representation of an object

Yep! But then we are moving beyond the scope of this project, or at least this 'generic base'.

We are not going to solve the problem of dealing with complex objects, but we need to be a bit clearer about what artefact can mean in our specs.

That's definitely a good idea. We can give concrete examples for the use cases.

MellonScholarlyCommunication / spec-overview

We need a better definition for artefact. #12