archesproject / ARM_Working_Group

Arches Resource Model Working Group. A repo for community reviewed Arches branches, models, and packages

Class analysis: E31 vs E73 #10

Closed azaroth42 closed 4 years ago

azaroth42 commented 6 years ago

What benefits are there for E31 Document over E73 Information Object?

So, my estimation is that the distinction of "propositions about reality" as opposed to propositions about anything is not very useful. Maybe all of the propositions in a work are about reality ... maybe they're not. We likely have absolutely no way of knowing, a priori, from a database entry. There's no system that I know of that would change its behavior in seeing an E31 vs an E73. Meaning that for practical purposes, the distinction is useless.

E73, on the other hand, has the important feature of being both symbolic and propositional. We can't prune further back up the tree than that.

Habennin commented 6 years ago

E31 Document comes from museum practice and is meant to capture those information objects that have the intentional character to document something: pictures of objects opened and closed, the conservation report of Sally Watkins, a picture of the exhibit setup and so on. It is meant to pick out that subclass of information objects which were formulated in their original intent to give me some objective information about x.

At the Qatar Museums Authority we set up a documentation programme to capture images of the collections. A photography team was put together and they set about creating images for the objects. Some of those images were created according to a particular methodology with the intent of using the same system of recording each time according to different types of objects in order to have a record (or E31 Document in the CRM parlance) of the object. Other pictures were taken against beautiful velvety backgrounds to show the glory of the particular object and colour. The former were documentary in nature, the latter not. In the information system it was important to the curators and conservators to be able to access separately and independently the analytic photographs (E31 Documents) from the other information objects (which were all lumped into the same part of the database). So of course these photos had a tag to pick them out and then they were used for different purposes. That, I believe, is the use case of E31 Document.

My curators and conservators would have been very angry at me had I not provided them a way to sort documentary-style information objects, which pointed at particular objects using particular means, from other related miscellaneous multimedia dross. Some objects simply refer; others document.

azaroth42 commented 6 years ago

Issues with using E31 at all:

workergnome commented 6 years ago

I recognize the use case as a real one, but to me it feels like privileging a specific type of document and a specific relationship between referrer and referent.

Information Objects can have many types: Document, yes, but also Analysis, or Parody, or Dramatization, or Fiction, or Opinion, or Abstraction, or Argumentation, to name a few. Also, the relationship between them could be documentary, or analytical, or parodical, or dramatized, or fictionalized, or subjective, or abstracted.

There are multiple established patterns for dealing with this: the most common are sub-typing, sub-propertying, or P2 typing with a vocabulary. My preference, given the innumerable types and refinements of types that are possible when defining human intent, is to use controlled vocabularies for typing: Anything else will result in a request to boil the ocean, since there are valid use cases for all of the above types, and I can't think of a good reason why Document is appropriate for CIDOC, but Argumentation or Opinion is not.
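A minimal sketch of the P2-typing pattern, assuming a single E73 class refined by terms from a controlled vocabulary. The vocabulary terms, identifiers, and helper names below are hypothetical and are not taken from any Arches or linked.art model.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary of E55 Types (illustrative terms only).
VOCABULARY = {"document", "analysis", "parody", "opinion", "fiction"}


@dataclass
class InformationObject:
    """A single-class E73 Information Object, refined via P2 has type."""
    identifier: str
    crm_class: str = "E73_Information_Object"
    p2_has_type: set = field(default_factory=set)

    def add_type(self, term: str) -> None:
        # Reject terms outside the vocabulary so the typing stays controlled.
        if term not in VOCABULARY:
            raise ValueError(f"unknown vocabulary term: {term}")
        self.p2_has_type.add(term)


def select_by_type(objects, term):
    """Return the identifiers of the objects carrying the given type."""
    return [o.identifier for o in objects if term in o.p2_has_type]


report = InformationObject("conservation-report")
report.add_type("document")
satire = InformationObject("satirical-print-caption")
satire.add_type("parody")

# Both records share one CRM class; only the vocabulary term differs.
documents = select_by_type([report, satire], "document")
```

Adding a new kind of information object under this pattern means adding a vocabulary term rather than a new class.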

Conal-Tuohy commented 6 years ago

"E31 is a subclass of E73, and thus a sibling of E33 Linguistic Object. Thus Documents are not Linguistic Objects and cannot have language associated with them. This is a show-stopper for us, and should be for any organization."

This would be true if the classes were disjoint, but they are not, so it is false.

"E31 is also a sibling of E36 Visual Item. Meaning images cannot be used to document artwork."

Ditto.

Conal-Tuohy commented 6 years ago

@Habennin is correct to note that E31 Document is a museological term of art. Documentation is a central concept, and hence this distinction is a crucial one in the mind of practitioners in the field, and that's why it has been so privileged as to warrant its own property in the CRM (unlike the potentially infinite literary distinctions which literary scholars might want to make between different modes of reference in other literary genres, for which, yes, a taxonomy of E55 Types would be appropriate). Are there actual requirements for those other "genre" distinctions? Can we see some examples? I'm not saying there aren't any; it's just that it's easy to imagine possible use cases for anything, but often YAGNI.

Note also that E31 Document has a more specific subclass to represent authority files; again, this is because information objects of that specific type have an important and distinctive role in museum documentation (as they do in libraries, archives, etc).

I think it would be really helpful to discuss this in the context of some concrete and real examples of Information Object for which this distinction is problematic.

azaroth42 commented 6 years ago

"E31 is a subclass of E73, and thus a sibling of E33 Linguistic Object. Thus Documents are not Linguistic Objects and cannot have language associated with them. E31 is also a sibling of E36 Visual Item. Meaning images cannot be used to document artwork."

"This would be true if the classes were disjoint, but they are not, so it is false."

Except that in Arches an instance has exactly one class. So in the RDF framework it is false, but in Arches it is contextually true given that additional constraint.

And indeed in all the libraries I found that aren't pure graph implementations it was also true, and hence is true in linked.art as well.

Conal-Tuohy commented 6 years ago

Isn't the solution then (as elsewhere) simply to define new subclasses to unite the necessary CRM superclasses? Why fret about where to "prune" when you can just "graft"?
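As a rough illustration of the "graft" idea under a one-class-per-instance constraint, the sketch below defines a new subclass that unites two CRM superclasses. The class name `LinguisticDocument` and its fields are hypothetical; nothing here is part of the CRM or of Arches.

```python
class E73InformationObject:
    """Stand-in for E73 Information Object."""


class E31Document(E73InformationObject):
    """Stand-in for E31 Document."""


class E33LinguisticObject(E73InformationObject):
    """Stand-in for E33 Linguistic Object."""


# Hypothetical grafted class: one concrete class that inherits from both
# siblings, so an instance still has exactly one class.
class LinguisticDocument(E31Document, E33LinguisticObject):
    def __init__(self, language, documents):
        self.language = language    # available because it is a linguistic object
        self.documents = documents  # available because it is a document


report = LinguisticDocument(language="en", documents="object-123")
```

The instance has a single class, yet it satisfies checks against both E31 and E33, which is the effect multi-typing would give in plain RDF.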

Conal-Tuohy commented 6 years ago

"And indeed in all the libraries I found that aren't pure graph implementations it was also true, and hence is true in linked.art as well."

I'm not involved with Arches at all, so I won't comment further here, except to say that I do think it'd be a mistake to allow this bug or limitation in certain OO libraries to dictate a "pruning" from the data model of useful classes. This issue here has brought home to me the full scope of the problem that this bug induces (i.e. it's not just about where the CRM intersects with other ontologies, but also where core classes of the CRM intersect with each other). I now appreciate much more the force of @workergnome's comment on the other thread:

"This is a pretty significant limitation that we're imposing on ourselves. We're disallowing a core feature of RDF (multi-typing) due to a desire to have it match with the limitations of a different domain."

For those of us who are implementing linked art's model based on a platform which doesn't exhibit this limitation, any such pruning is a loss for absolutely zero gain.

It seems to me that a solution which doesn't impose this extraneous requirement on the model must either:

workergnome commented 6 years ago

From a quick check of the projects I'm working on, I am currently using Linguistic Objects to describe bibliographies, provenance, materials statements, dimension statements, footnotes, citations, captions, rights statements, source statements, descriptions, auction catalogs, abstracts, and inventory takings. We're also working with auction catalogs, inventories, transcriptions, archival materials, books, born-digital works, wall labels, scholarly notes, scientific data, conservation reports, staff notes, manuscripts, translations, and other forms of linguistic objects.

Some of these are historical references, some of these are scholarship, some are known to be false, some are authority documents, some are descriptive, some are argumentative, and some are documentary in nature.

Most or all of these are museological concepts, and I believe that they're just as relevant as documentation to the work of museums and cultural heritage at large. I would also argue that our colleagues in archives and libraries are equally part of cultural heritage, and much of the scholarship that they support is literary in nature. Their concerns and needs are just as much a part of the work that we're doing as the explicitly museum-based work.

Conal-Tuohy commented 6 years ago

OK, so the distinctive feature of an E31 Document with respect to the more generic E73 Information Object is that its mode of reference to another entity is that it documents it. So that would obviously include bibliographies (and other "ographies"), materials statements, descriptions etc., but not (in general) "archival materials", "books" etc. (except of course where those archival materials, books, or whatever are in fact known to be documentation for something).

The criterion for whether to use the E31 Document class to more narrowly classify an E73 Information Object is simply whether you can validly make use of the property P70 documents to say that the E73 Information Object documents some specific thing.

So if you don't know whether the E73 Information Object is specifically documentation for some specific thing, then you can't use that P70 documents predicate, and hence the E73 Information Object needn't be regarded as an E31 Document. If you don't actually know the particular subject(s) of the E73 Information Object, then you obviously don't need to regard it as an E31 Document. If you know that it refers to some other entity, but you don't know if it isn't merely some passing reference, then you don't need to regard it as an E31 Document.

Whereas on the contrary, if you have an E73 Information Object, and you do know that it documents some other entity or entities (in the technical sense of "documents" that we've been discussing), then you can use that more specific predicate P70 documents, and you can therefore classify the E73 Information Object as an E31 Document. The advantage of doing so is that client software which is aware of that distinction can, for instance, query for documentation about some particular entity of interest, and retrieve authoritative and "documentary" information about that entity.
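A minimal sketch of the kind of client query described here, assuming the data is available as plain subject/predicate/object triples. The identifiers (`photo-001`, `vase-42`) and function names are made up for illustration.

```python
# Hypothetical triples: one documentary photograph and one glamour shot.
triples = [
    ("photo-001", "rdf:type", "E31_Document"),
    ("photo-001", "P70_documents", "vase-42"),
    ("photo-002", "rdf:type", "E73_Information_Object"),
    ("photo-002", "P67_refers_to", "vase-42"),
]


def documentation_for(entity, triples):
    """Subjects that document the entity via the specific P70 predicate."""
    return sorted(s for s, p, o in triples
                  if p == "P70_documents" and o == entity)


def references_to(entity, triples):
    """Subjects that refer to the entity in any way.

    P70 documents is a subproperty of P67 refers to, so both predicates count.
    """
    referring = {"P67_refers_to", "P70_documents"}
    return sorted(s for s, p, o in triples if p in referring and o == entity)


docs = documentation_for("vase-42", triples)
refs = references_to("vase-42", triples)
```

Here `docs` holds only the documentary photograph, while `refs` holds both photographs, which mirrors the separation the curators and conservators asked for.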

The same (mutatis mutandis) applies to the even narrower class E32 Authority Document and to other sub-classes of E73 Information Object such as E29 Design or Procedure, E33 Linguistic Object, and E36 Visual Item, each of which has its own specific mode or modes of reference to other entities (listing them, making use of them, translating them, visually depicting them). So I don't think it's fair to regard the E31 Document as being particularly privileged in having its own mode of reference (P70 documents).

To come back to your point about using an external vocabulary to model different modes of reference, it's worth noting that, apart from those specific classes and their respective properties, the superclass E89 Propositional Object features a property for "referring to" another entity in a more general way (P67 refers to, which is a super-property of P70 documents, P138 represents, etc.), and you are supposed to be able to further classify these references by linking them to E55 Types in external vocabularies, exactly as you suggest. This does involve the use of the CRM's notorious "properties of properties", but as an aside, there have been some interesting and I think very positive developments on that front on the CRM mailing list.

The way I see it is: we have a generic "reference" property which can be sub-typed with a thesaurus (which is handy for dealing with an open vocabulary of more specific reference types such as literary scholars might want to use: "parody", "homage", "sampling" ... whatever), and we also have a bunch of specific RDFS subproperties of that general reference property, each corresponding to some well-known concept in cultural heritage information management, and which (because they are pre-defined as RDFS subproperties rather than just typed using an external vocabulary) have their own domains and ranges which provide some type-safety.
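The type-safety point can be sketched as a small domain check. The tables below are a simplified excerpt (a single parent per class, so E73's second superclass is omitted), and the helper names are hypothetical.

```python
# Simplified class hierarchy: each class maps to one parent only.
SUBCLASS_OF = {
    "E31_Document": "E73_Information_Object",
    "E73_Information_Object": "E89_Propositional_Object",
}

# Each property records its domain and, where it has one, its super-property.
PROPERTIES = {
    "P67_refers_to": {"domain": "E89_Propositional_Object", "super": None},
    "P70_documents": {"domain": "E31_Document", "super": "P67_refers_to"},
}


def is_a(cls, ancestor):
    """Walk up the hierarchy to test whether cls is ancestor or a subclass."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def in_domain(subject_class, prop):
    """True when subject_class falls within the property's declared domain."""
    return is_a(subject_class, PROPERTIES[prop]["domain"])


# The specific subproperty is rejected for a plain E73, which is the
# type-safety gained from a pre-defined domain.
plain_e73_can_document = in_domain("E73_Information_Object", "P70_documents")

# The generic property is allowed, and is refined with a vocabulary term.
typed_reference = {
    "subject_class": "E73_Information_Object",
    "property": "P67_refers_to",
    "has_type": "parody",  # hypothetical thesaurus term
}
generic_reference_ok = in_domain(
    typed_reference["subject_class"], typed_reference["property"]
)
```

The vocabulary term on the generic reference is not checked against any domain, whereas the subproperty carries its constraint with it.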

annabelleee commented 4 years ago

For the ARM WG: default to E73 unless there is a clear, exceptional case where E31 (and P70) are needed.