ICA-EGAD / RiC-O

ICA Records in Contexts-Ontology (ICA RiC-O) GitHub repository web pages
https://ica-egad.github.io/RiC-O/

Is there a need for a subclass of RecordResource (or similar) to refer to metadata? #56

Open williamsonrichard opened 1 year ago

williamsonrichard commented 1 year ago

Something that we need is to be able to keep track of metadata history. For example, we may wish to take a piece of metadata that one of our internal systems provides as free text and, when creating our ontology individuals, replace it (manually or automatically) with something structured. The original free text may, for instance, ultimately come from a form filled in by the creator of the archive, and if we make changes, it is important that we can trace them back to the original.

As far as I understand from the very helpful examples in part 6, 'Documenting description', of RiC-CM, the intention is that metadata itself is to be understood as a record set (or some other sub-class of Record Resource). I like this idea, but it is not entirely straightforward to implement in practice; or at least, there is more than one way to do it.

Let us suppose, for example, that I wish to document a change in the agent assigned to some record set by means of rico:hasCreator, from 'Agent A' to 'Agent B', say. One way to proceed might be to use rico:Activity, perhaps along the following lines.

ObjectProperty: changedFrom
  SubPropertyOf: rico:affectsOrAffected

ObjectProperty: changedTo
  SubPropertyOf: rico:affectsOrAffected

Class: ChangeInMetadata
  SubClassOf: rico:Activity,
    changedFrom some rico:Thing,
    changedTo some rico:Thing
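To see what such a change activity does and does not record, here is a minimal Python sketch that treats triples as plain (subject, predicate, object) tuples. The IRIs `ex:R`, `ex:AgentA`, `ex:AgentB` and `ex:change1`, and the helper `change_creator`, are hypothetical; `changedFrom`, `changedTo` and `ChangeInMetadata` are the local names proposed above, not RiC-O terms.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# All "ex:" names and change_creator are hypothetical illustrations.
Triple = tuple[str, str, str]


def change_creator(graph: set[Triple], record: str, old: str, new: str,
                   activity: str) -> set[Triple]:
    """Swap the creator of `record` and document the swap as a ChangeInMetadata."""
    updated = set(graph)
    updated.discard((record, "rico:hasCreator", old))
    updated.add((record, "rico:hasCreator", new))
    updated |= {
        (activity, "rdf:type", "ChangeInMetadata"),
        (activity, "changedFrom", old),  # points at the agent, not at a triple
        (activity, "changedTo", new),
    }
    return updated


before = {("ex:R", "rico:hasCreator", "ex:AgentA")}
after = change_creator(before, "ex:R", "ex:AgentA", "ex:AgentB", "ex:change1")

# Everything the activity says about itself: its type, the old and the new agent.
activity_facts = {(p, o) for (s, p, o) in after if s == "ex:change1"}
print(sorted(activity_facts))
```

Since the targets of `changedFrom`/`changedTo` are just rico:Thing individuals, the activity in this sketch records which agents were involved, but not which record or which property the change concerned.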

But rico:Thing is very general, and can refer to things outside of RiC-O proper. I would like to be able to refer specifically to RiC-O proper, and I think this is closely related to the ideas in 'Documenting description' that I referred to: we would like to say that any metadata about a record resource which we express in RiC is itself a record resource. Therefore I think I would like something like

Class: Metadata
  SubClassOf: rico:RecordResource,
    rico:ObjectProperty some rico:Class

but we lack rico:Class and rico:ObjectProperty, that is to say, a class to which every class in RiC-O belongs, and similarly for object properties, etc.

What do people think? Is there some way to achieve this in RiC-O as it stands that I am overlooking?

As a small side-remark, it would be great if 'History' could be given a class/object property, and not only be a datatype property.

florenceclavaud commented 1 year ago

Hi @williamsonrichard, my first cents:

florenceclavaud commented 1 year ago

A few more words about this: "As a small side-remark, it would be great if 'History' could be given a class/object property, and not only be a datatype property."

You can in fact use rico:Event and replace (when possible) the history datatype property with a series of Events, using the rico:isOrWasAffectedBy object property to connect any Thing (thus of course a Record Resource, an Agent or a Place) to the instances of Event.
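A minimal Python sketch of this suggestion, keeping the rico:history text alongside the events. Triples are plain tuples (IRIs and literals are both just strings here); the `ex:` IRIs, the helper `history_to_events`, the event naming scheme and the sample history text are hypothetical. Besides rico:isOrWasAffectedBy and rico:history, it uses rico:Event, rico:Date and rico:isAssociatedWithDate; these names are worth checking against the RiC-O version in use.

```python
# Sketch: keep the free-text history and add structured rico:Event individuals.
# All "ex:" IRIs, the naming scheme and the sample data are hypothetical.
Triple = tuple[str, str, str]


def history_to_events(subject: str, history_text: str,
                      entries: list[tuple[str, str]]) -> set[Triple]:
    """entries: (date as written, short description) pairs extracted from the text."""
    triples: set[Triple] = {(subject, "rico:history", history_text)}  # original kept
    for i, (date_label, description) in enumerate(entries, start=1):
        event = f"{subject}-event-{i}"
        date = f"{event}-date"
        triples |= {
            (event, "rdf:type", "rico:Event"),
            (event, "rdfs:label", description),
            (subject, "rico:isOrWasAffectedBy", event),
            (event, "rico:isAssociatedWithDate", date),
            (date, "rdf:type", "rico:Date"),
            (date, "rdfs:label", date_label),
        }
    return triples


text = "Transferred to the archives in 1923; rearranged in 1951."
graph = history_to_events("ex:R", text, [("1923", "Transfer to the archives"),
                                         ("1951", "Rearrangement")])
events = sorted(o for (s, p, o) in graph if p == "rico:isOrWasAffectedBy")
print(events)
```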

williamsonrichard commented 1 year ago

Hi @florenceclavaud, thank you very much for your thoughts! Lots to think about! Just to reply to the simpler matter first:

> A few more words about this: "As a small side-remark, it would be great if 'History' could be given a class/object property, and not only be a datatype property."
>
> You can in fact use rico:Event and replace (when possible) the history datatype property with a series of Events, using the rico:isOrWasAffectedBy object property to connect any Thing (thus of course a Record Resource, an Agent or a Place) to the instances of Event.

I completely agree that it is good to use rico:Event in this way, but rico:affectsOrAffected is very general, and I think there is definitely room in RiC-O for something which corresponds a little more closely to History in RiC-CM (if not, I think one could probably argue that there is not much point in having History as a separate attribute in RiC-CM at all). It might be enough to just introduce a sub-property of rico:affectsOrAffected, e.g. something like the following (probably with a better name)?

ObjectProperty: isEventInHistoryOf
  SubPropertyOf: rico:affectsOrAffected

Possibly one might wish to make the domain not just rico:Event but rico:Event, rico:isAssociatedWithDate some rico:Date if one wishes to formally say that a historical event entails a date.
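One caveat worth spelling out: under OWL's open-world semantics, such a domain axiom does not flag an event that lacks a date; a reasoner simply infers that some rico:Date exists. If the practical aim is to catch undated historical events, a closed-world check (the kind of thing a SHACL shape would express) is the usual complement. A minimal Python sketch, where `isEventInHistoryOf` is the hypothetical sub-property proposed above and the `ex:` IRIs are made up:

```python
# Closed-world check: report events used with isEventInHistoryOf that have no
# rico:isAssociatedWithDate link to an individual typed rico:Date.
Triple = tuple[str, str, str]


def undated_history_events(graph: set[Triple]) -> set[str]:
    events = {s for (s, p, _) in graph if p == "isEventInHistoryOf"}
    dated = {
        s for (s, p, o) in graph
        if p == "rico:isAssociatedWithDate" and (o, "rdf:type", "rico:Date") in graph
    }
    return events - dated


g = {
    ("ex:e1", "isEventInHistoryOf", "ex:R"),
    ("ex:e1", "rico:isAssociatedWithDate", "ex:d1"),
    ("ex:d1", "rdf:type", "rico:Date"),
    ("ex:e2", "isEventInHistoryOf", "ex:R"),  # no date recorded
}
print(undated_history_events(g))  # {'ex:e2'}
```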

williamsonrichard commented 1 year ago

To come back to the main aspect of this issue: your suggestions are very useful @florenceclavaud, but I do not think they quite address the core matter. I may well not have explained it clearly, so I will try a different way here. To put it very succinctly, it has to do with 'reification': I would like any triple that I express in RiC-O with a certain record set R as its subject to be 'reifiable' into a record resource (or some more specific sub-class of it) with a rico:describesOrDescribed relation to R. I can then use rico:Event to describe changes in that reification.

I will elaborate here. To be precise, let me define a RiC-triple as an RDF triple whose subject is an individual of a class defined in RiC-O, whose object is either such an individual or an rdfs:Literal, and whose property is an object property or datatype property defined in RiC-O.
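This definition can be written down as a small predicate. In the following Python sketch, "defined in RiC-O" is approximated by namespace, assuming the RiC-O namespace is `https://www.ica.org/standards/RiC/ontology#`; class membership comes from an explicit `types` map rather than from reasoning, and the `Literal` wrapper, the helper names and the `ex:` IRIs are hypothetical.

```python
from dataclasses import dataclass

# Assumed RiC-O namespace; "defined in RiC-O" is approximated by this prefix.
RICO = "https://www.ica.org/standards/RiC/ontology#"


@dataclass(frozen=True)
class Literal:
    """Stands in for an rdfs:Literal object."""
    value: str


def is_ric_individual(node: object, types: dict[str, set[str]]) -> bool:
    # An individual counts if at least one of its asserted classes is in RiC-O.
    return isinstance(node, str) and any(c.startswith(RICO) for c in types.get(node, set()))


def is_ric_triple(s: object, p: str, o: object, types: dict[str, set[str]]) -> bool:
    return (is_ric_individual(s, types)
            and p.startswith(RICO)
            and (isinstance(o, Literal) or is_ric_individual(o, types)))


types = {
    "ex:R": {RICO + "RecordSet"},
    "ex:A": {RICO + "Agent"},
    "ex:P": {"http://xmlns.com/foaf/0.1/Person"},  # not a RiC-O class
}
print(is_ric_triple("ex:R", RICO + "hasCreator", "ex:A", types))            # True
print(is_ric_triple("ex:R", RICO + "history", Literal("Some text"), types))  # True
print(is_ric_triple("ex:R", RICO + "hasCreator", "ex:P", types))            # False
```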

As I described in my original comment, the problem I am trying to address is how to describe changes in such RiC-triples: corrections, additions, etc. To be able to do so, I must reify those triples somehow. There are various ways to do this, which do not really have anything specifically to do with RiC: one possibility, for instance, which I will not follow here, is to use rdf:Statement. To some extent, RiC-O indeed already has some classes which are essentially reifications of object properties, such as rico:CreationRelation as a reification of rico:isCreatorOf. But more or less any RiC-triple can be corrected, added to, etc., so we would like something quite general.

As a first step, let us suppose that we have introduced a class rico:Class of which all classes in RiC-O except rico:Thing are sub-classes, an object property rico:ObjectProperty of which all object properties in RiC-O are sub-properties, and a datatype property rico:DatatypeProperty similarly.

Then I think we can express reification in something like the following way, including the idea that a RiC-Triple itself is a record. I omit a few definitions where these are obvious.

Class: RiC-Triple
  SubClassOf: rico:Record

ObjectProperty: reifiesRiC-TripleFor
  Domain: RiC-Triple
  Range: rico:ObjectProperty some rico:Class

ObjectProperty: hasSubject
  Domain: RiC-Triple
  Range: rico:Class
  InverseOf: isSubjectOfRiC-Triple
  SubPropertyChain: isObjectOfRiC-Triple o reifiesRiC-TripleFor
  SubPropertyChain: isSubjectOfRiC-Triple o rico:describesOrDescribed

ObjectProperty: hasObject
  Domain: RiC-Triple
  Range: rico:Class 
  InverseOf: isObjectOfRiC-Triple

ObjectProperty: reifiesHasCreatorTripleFor
  SubPropertyOf: reifiesRiC-TripleFor
  Range: rico:hasCreator some rico:Agent

Individual: A
  Types: rico:Agent

Individual: R
  Types: rico:RecordSet
  Facts:
    rico:hasCreator A

Individual: R-hasCreator-A-Triple
  Types: RiC-Triple
  Facts:
    hasSubject R,
    hasObject A,
    reifiesHasCreatorTripleFor R

This is rather abstract, and I am not sure yet whether I am happy with it (the use of property chains also takes it out of OWL DL, which I would prefer to avoid if possible, though this is already the case for RiC-O), but if I'm not missing something it does do what I'm looking for. If one wished to change the creator of R from A to B, one could now use a rico:Event which points to R-hasCreator-A-Triple and a similar individual R-hasCreator-B-Triple to express the change.
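The reification individuals in this example follow a fixed pattern, so they can be generated mechanically. A minimal Python sketch of that, and of the change event just described, reusing the local names from the example (RiC-Triple, hasSubject, hasObject, reifiesHasCreatorTripleFor) together with the changedFrom/changedTo properties proposed in the original post to say which reified triple is the old one and which the new; the helper names and the event name `change1` are hypothetical.

```python
# Triples as (subject, predicate, object) tuples; names follow the example above.
Triple = tuple[str, str, str]


def reify_has_creator(record: str, agent: str) -> tuple[str, set[Triple]]:
    """Return the reification individual for (record, rico:hasCreator, agent)."""
    node = f"{record}-hasCreator-{agent}-Triple"
    return node, {
        (node, "rdf:type", "RiC-Triple"),
        (node, "hasSubject", record),
        (node, "hasObject", agent),
        (node, "reifiesHasCreatorTripleFor", record),
    }


def creator_change(record: str, old: str, new: str, event: str) -> set[Triple]:
    """Reify both the old and the new hasCreator triple and link them by an event."""
    old_node, old_triples = reify_has_creator(record, old)
    new_node, new_triples = reify_has_creator(record, new)
    return old_triples | new_triples | {
        (event, "rdf:type", "rico:Event"),
        (event, "changedFrom", old_node),  # sub-property of rico:affectsOrAffected
        (event, "changedTo", new_node),
    }


g = creator_change("R", "A", "B", "change1")
print(sorted(o for (s, p, o) in g if s == "change1" and p != "rdf:type"))
```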

What I referred to as a class 'Metadata' in my original post is here 'RiC-Triple'.

There may well be better ways of doing this, but I hope that this makes my question clearer: how should triples be reified as records in RiC, as chapter 6 of RiC-CM seems to suggest is the right way to think (an idea which I too like)?

williamsonrichard commented 1 year ago

The following is an alternative possible formulation. It is simpler and more elegant than the one in my previous comment, and also remains within OWL DL, and I am not sure that one really loses anything significant compared with that one.

Class: RiC-Triple
  SubClassOf: rico:Record

ObjectProperty: reifiesHasCreatorTripleFor
  Domain: RiC-Triple
  Range: rico:hasCreator some rico:Agent and rico:isOrWasDescribedBy some RiC-Triple

Individual: A
  Types: rico:Agent

Individual: R
  Types: rico:RecordSet
  Facts:
    rico:hasCreator A

Individual: hasCreatorTriple
  Types: RiC-Triple
  Facts:
    reifiesHasCreatorTripleFor R,
    rico:describesOrDescribed R
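A minimal Python sketch of the triples this second formulation adds per reified statement, using the names from the example (`hasCreatorTriple-A` and `hasCreatorTriple-B` are hypothetical individual names). It also makes one property of this formulation visible: no triple records the object (the agent), so reifications of two different rico:hasCreator statements about R differ only in their individual names.

```python
Triple = tuple[str, str, str]


def reify_second_formulation(record: str, node: str) -> set[Triple]:
    """Triples for one reified rico:hasCreator statement about `record`."""
    return {
        (node, "rdf:type", "RiC-Triple"),
        (node, "reifiesHasCreatorTripleFor", record),
        (node, "rico:describesOrDescribed", record),
    }


def shape(triples: set[Triple], node: str) -> set[Triple]:
    """Replace the reification individual's name by a placeholder."""
    return {("?node" if s == node else s, p, o) for (s, p, o) in triples}


for_a = reify_second_formulation("R", "hasCreatorTriple-A")  # meant for creator A
for_b = reify_second_formulation("R", "hasCreatorTriple-B")  # meant for creator B
print(shape(for_a, "hasCreatorTriple-A") == shape(for_b, "hasCreatorTriple-B"))  # True
```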

florenceclavaud commented 1 year ago

> Hi @florenceclavaud, thank you very much for your thoughts! Lots to think about! Just to reply to the simpler matter first:
>
>> A few more words about this: "As a small side-remark, it would be great if 'History' could be given a class/object property, and not only be a datatype property." You can in fact use rico:Event and replace (when possible) the history datatype property with a series of Events, using the rico:isOrWasAffectedBy object property to connect any Thing (thus of course a Record Resource, an Agent or a Place) to the instances of Event.
>
> I completely agree that it is good to use rico:Event in this way, but rico:affectsOrAffected is very general, and I think there is definitely room in RiC-O for something which corresponds a little more closely to History in RiC-CM (if not, I think one could probably argue that there is not much point in having History as a separate attribute in RiC-CM at all). It might be enough to just introduce a sub-property of rico:affectsOrAffected, e.g. something like the following (probably with a better name)?
>
> ObjectProperty: isEventInHistoryOf
>   SubPropertyOf: rico:affectsOrAffected
>
> Possibly one might wish to make the domain not just rico:Event but rico:Event, rico:isAssociatedWithDate some rico:Date if one wishes to formally say that a historical event entails a date.

Hi @williamsonrichard, at first glance creating such a subproperty is a good idea IMHO. Also, conceptually speaking, it would make it possible to distinguish past events from current activities.

However, I think that we cannot get rid of the rico:history datatype property. We have a huge quantity of archival metadata containing textual historical discourse about the entities it concerns, e.g. biographies or histories of agents, custodial histories of records, and the history of their management (appraisal, arrangement, etc.). We need to enable implementers to move to RiC, and thus to convert this metadata to RiC-O, without losing such textual information, which may often be of significant importance for users: often they will not find such discourse elsewhere. This is true for record resources, and also for our authority records on agents, which are far less numerous and less indexed than the authority records managed by librarians, but often have much more textual content.

In such projects, the staff involved may have no means, at least for a while, to structure this textual discourse into a series of events with dates, descriptions, agents and places; in fact this may involve NLP and AI. Let me also say that, for now, I am not fully convinced that such long, complex, nuanced historical discourse on an entity (in fact on several entities related to each other) can be fully represented by a series of assertions, even reified ones (in RDF), or documented by any other method ;-). To be more precise, I would be happy to work on such a project and see what a well-trained AI (thus trained on a good corpus) and appropriate (thus huge and very rich) models and knowledge bases could do. Supposing that this exists or can be developed soon (which would be great!), and that the team involved has the time, human resources and budget to apply such methods systematically, I would be happy to keep the old discourse, just as evidence of the previous form of the history. So for now, I would say that we need both methods (rico:history and rico:Event).

Not to mention that they can be used in combination, for example if you decide that, as a first step, you will rather try to annotate, classify and identify the named entities (agents, places, etc.) mentioned in such a discourse. And why not, in the other direction, think of AI generating a historical discourse from a series of events. Sorry to be chatty!

florenceclavaud commented 1 year ago


Hi again @williamsonrichard.

Just a quick answer for now.

First of all, I agree with you that an assertion (a triple in RDF) can be considered a rico:Record. This is about managing metadata on records, and metadata on records are records (or record parts possibly).

The relation classes in RiC-O are rather n-ary relations than reified assertions; also they describe facts (i.e. relations, that could also be considered events or activities), and have not been created to store metadata about the description of the facts (who asserted this, when, etc.), though it would be possible to extend the model to do so (they only have for now a rico:certainty and rico:source property - I must check this.)

I like your second proposal above; as you said, it is simpler and more elegant. I have a question (related rather to your first proposal): why not use rdf:Statement? You could simply define the RiC-Triple class as also being a subclass of rdf:Statement. This would make it possible to use the properties of this class instead of defining new RiC-O properties for storing the subject and object of the triple.
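For comparison, standard RDF reification describes a statement with four triples using rdf:Statement, rdf:subject, rdf:predicate and rdf:object. A minimal Python sketch, where the extra `rdf:type RiC-Triple` follows the suggestion above and the statement node name `st1` is hypothetical. Note that in RDF the reification triples do not themselves assert the original statement, so (R, rico:hasCreator, A) still has to be stated separately.

```python
Triple = tuple[str, str, str]


def reify(statement: Triple, node: str,
          extra_types: tuple[str, ...] = ("RiC-Triple",)) -> set[Triple]:
    """Standard RDF reification of `statement`, plus any additional rdf:type."""
    s, p, o = statement
    triples = {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }
    triples |= {(node, "rdf:type", t) for t in extra_types}
    return triples


asserted = ("R", "rico:hasCreator", "A")
described = reify(asserted, "st1")
print(len(described))  # 5
```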

The main problem, among others, with reification - I am sure you know it of course! - is that you get a huge amount of supplementary triples that you have to store and manage.

Did you consider using RDF Star for this? It is really smart, concise and elegant. It is now implemented in several frameworks and tools (see https://w3c.github.io/rdf-star/implementations.html) such as GraphDB, though for now there is not, as far as I know, any official recommendation for RDFS Star or OWL Star.

As concerns RiC-O, we can discuss, within the RiC-O development team, the idea of adding a RiC-assertion or RiC-triple class, subclass of Record and of rdf:Statement. If we do so, I am not sure that we will go further than that in the 1.0 version.

williamsonrichard commented 1 year ago


Hi @florenceclavaud, thank you for the nice reply! No problem about the chattiness, I found what you wrote very interesting! I completely agree about keeping rico:history and did not mean to suggest removing it; just if we can have the sub-property in addition that would be great :-).

I like very much what you wrote about complex historical discourse; in general I think the extraction of structure from free-text is a most interesting (if challenging!) project, and one which has the potential to elevate archival metadata to a new level of usefulness. I too think that AI may well have a role to play, and I agree completely that we would in any case wish to keep the original. I would just remark that making use of rico:Event does not prevent this: one can perfectly well have a datatype property with domain a rico:Event where this free-text is kept; but having an individual rather than a string literal allows one to do other things in addition.

williamsonrichard commented 1 year ago

I will reply inline, below, to your other comment @florenceclavaud, thank you very much for your reply!

> Hi again @williamsonrichard.
>
> Just a quick answer for now.
>
> First of all, I agree with you that an assertion (a triple in RDF) can be considered a rico:Record. This is about managing metadata on records, and metadata on records are records (or record parts possibly).

Great!

> The relation classes in RiC-O are rather n-ary relations than reified assertions; also they describe facts (i.e. relations, that could also be considered events or activities), and have not been created to store metadata about the description of the facts (who asserted this, when, etc.), though it would be possible to extend the model to do so (they only have for now a rico:certainty and rico:source property - I must check this.)

I agree completely with everything you write after the semi-colon; regarding what is before the semi-colon, I agree that the relation classes are not strictly speaking reifications in the RDF sense, but I do think this is a form of reification in the general semantic sense, i.e. lifting relations up to classes/individuals. This is just a matter of nomenclature, though :-) (see before 'Use case 1' here for a discussion of this exact terminological matter!).

> I like your second proposal above; as you said, it is simpler and more elegant.

Great! Unfortunately, we noticed shortly after I posted this that it does not quite model triples: it omits the object of the triple. In some cases, though, this might be a feature rather than a bug.

> I have a question (related rather to your first proposal): why not use rdf:Statement? You could simply define the RiC-Triple class as also being a subclass of rdf:Statement. This would make it possible to use the properties of this class instead of defining new RiC-O properties for storing the subject and object of the triple.

Thank you for the question and the suggestion! The main reason is that this takes us out of OWL DL, so we cannot use a reasoner, etc. There are some cases in which I think OWL DL is not permissive enough (for example, in my opinion it should definitely be allowed to compose properties where the final property is a datatype property, the result of the composition being a datatype property), but in this case I do agree with the judgement of those who created OWL DL: it is not clear what the semantics of rdf:Statement are, for instance.

> The main problem, among others, with reification - I am sure you know it of course! - is that you get a huge amount of supplementary triples that you have to store and manage.

This is an interesting matter! Though we are still discussing this in our team here, I am personally of the opinion that it actually does not make all that much difference. OWL/RDF is very verbose anyhow; if one has a large number of record sets and makes use of a significant part of RiC-O, one will already run into challenges with the number of triples with regard to memory, etc., regardless of reification. Indeed, I think that one has to expect to distribute RDF triples horizontally in memory so that one can scale essentially infinitely; I doubt very much whether monolithic infrastructure is the way to do it. And if one can distribute horizontally, I don't think reifying makes a significant difference: I estimate that reifying might increase the total number of triples by a factor of three or four; and whilst 100 GB, say, is somewhat different from 25 GB, I don't think it fundamentally changes the design challenges. The same goes for the human aspect: I think the reification mechanics themselves can be generated programmatically; only the meta-meta-data will be entered by a human, and that would be needed anyhow.
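For a rough sense of the numbers, a back-of-the-envelope Python sketch. It assumes rdf:Statement-style reification (four extra triples per reified statement: rdf:type, rdf:subject, rdf:predicate and rdf:object) and ignores the provenance triples that would then be attached to each statement; the function name is hypothetical.

```python
def growth_factor(fraction_reified: float, extra_per_statement: int = 4) -> float:
    """Total triples divided by base triples when `fraction_reified` of the base
    triples are reified, each reification adding `extra_per_statement` triples."""
    return 1 + fraction_reified * extra_per_statement


for fraction in (0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.0%} reified -> x{growth_factor(fraction):.1f}")
```

Under these assumptions, a factor of three to four corresponds to reifying roughly half to three quarters of all triples; reifying every triple gives a factor of five before any provenance is added.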

> Did you consider using RDF Star for this? It is really smart, concise and elegant. It is now implemented in several frameworks and tools (see https://w3c.github.io/rdf-star/implementations.html) such as GraphDB, though for now there is not, as far as I know, any official recommendation for RDFS Star or OWL Star.

Thank you very much for this! I looked closely into it after your comment. I definitely like RDF-Star, but I see it as 'syntactic sugar' rather than as addressing the fundamental problem: as far as I understand, one could not use it to say that a RiC-triple is itself a rico:Record, for instance. It is also at a slightly too early stage of development for us: we would need it to be supported by major libraries such as rdflib, and although that seems to be coming, it is not in place yet.

> As concerns RiC-O, we can discuss, within the RiC-O development team, the idea of adding a RiC-assertion or RiC-triple class, subclass of Record and of rdf:Statement. If we do so, I am not sure that we will go further than that in the 1.0 version.

Great! I definitely agree that it sounds like it would be too much to try to do much about this in the 1.0 version. It is a complicated matter on which I think more experience is needed before there could be a consensus as to what is best. We are ourselves still discussing internally what to do; we have come a bit further than in my earlier comments, but I'll not go into details here to avoid clogging up the thread. I'd be happy to be involved in any future discussion of this, though. If anything at all is included in this direction in version 1.0, perhaps it could be something very minimal marked as 'experimental' (might be changed or removed in future versions): e.g. one might introduce a RiC-Triple or RiC-Property class which is a sub-class of rico:Record (or similar), and indicate that it is meant to be used for reification, without going into detail as to how to actually achieve that reification. This might at least help users who need to reify to share some sort of common language. But I understand too if nothing at all is done in this direction in version 1.0 :-).