Semantic-Data-for-Humanities / SDHSS-modelling-best-practices

This repository is dedicated to sharing modelling issues, discussions and solutions, and to defining best practices.
Creative Commons Attribution Share Alike 4.0 International

Modelisation of Source/Document #2

Open stephenhart8 opened 9 months ago

stephenhart8 commented 9 months ago

Meeting notes where this issue was formulated: https://kleiolab.atlassian.net/wiki/spaces/EC/pages/3514564624/Discussion+du+27.09.2023

The management of sources in the Toolbox needs improvement.

- Change the label “Sources” to “Documents” in the Toolbox.
- Because FRBR is no longer maintained and was replaced by LRMoo, we need to adapt the ontology. There are three options:
  1. stop using FRBR and use SDHSS classes instead;
  2. incorporate FRBR into SDHSS;
  3. use LRMoo.
- Should sources also document entities other than bibliographical ones, such as artworks or cast sculptures, that fit the FRBR logic?
- Build a vocabulary for bibliographical references.

stephenhart8 commented 9 months ago

This was discussed with @valamercery on 12 October 2023: https://kleiolab.atlassian.net/wiki/spaces/DT/pages/3473539073/Data+Team+Meeting+Notes

In brief:
- proposal to migrate to LRMoo;
- continue the discussion on digital documents: Manifestations and/or Web Request?

stephenhart8 commented 9 months ago

The discussion continued on 27 October 2023. Here is a summary written by FB (also available here: https://kleiolab.atlassian.net/wiki/spaces/EC/pages/3159228441/Transformation+de+la+gestion+de+la+bibliographie):

The exchanges of the past months, and the evolution of the Geovistory project, show that a revision of this domain is urgent, and that it presupposes a prior semantic and foundational analysis.

It also calls for a more in-depth reflection on the existing reference models.

The central questions to consider are:

- the exact nature of the information we want to handle:
  - which objects are we talking about? Manuscript texts, books, bibliographic references, coins, paintings, carved statues, cast statues, stone inscriptions, handwritten votive inscriptions on objects, musical notation, paintings such as the Mona Lisa, photos, websites, …
- which reference models need to be taken into account:
  - CIDOC CRM
  - FRBR/FRBRoo
  - LRM/LRMoo
  - RiC-O
  - …
- how to integrate the conceptualizations stemming from these different reference models, starting from their respective domains of discourse?
  - are they compatible?
  - what foundational analysis should we propose, and with which tools?
  - should we propose equivalent classes and properties in the SDHSS ecosystem and map them?
  - should we choose one reference model, e.g. LRMoo, and integrate it into SDHSS as we do with the CRM, while slightly modifying the semantics of certain classes, notably lrm:F3, through the mechanism of double inheritance, and explaining this in additional notes?
- how to evolve the profiles and the implementation in Geovistory so that users can find their way around and entering documentation becomes easier?
  - in particular, how to avoid parallel conceptualizations of the same objects?
  - how to distinguish the sectors: documentation (bibliography and websites); unique handwritten sources, inscriptions, coins, etc.; artistic objects: statues, paintings, photos, …
  - how to articulate serial production, unique production, work, and linguistic objects/expressions?

stephenhart8 commented 9 months ago

A few thoughts from the past few days (in no particular order):

As the question is urgent (the need is already being felt), we must agree on a solution that is as unconstraining and as open as possible.

In my view, this solution is to develop our own classes in SDHSS and link them to external ontologies with OWL equivalence axioms (owl:equivalentClass, since owl:sameAs relates individuals rather than classes). This also lets us position ourselves in our own domain, which straddles library science, archives, history, etc. (some SDHSS classes would thus be equivalent to several classes in LRM and RiC-O, for example). The structure of FRBR/LRM (the distinction between Work, Expression, Manifestation and Item) suits Geovistory's needs, so we would adopt that structure.

The issue of parallel conceptualizations of the same objects stems from the different domains that Geovistory covers. The same object (a papyrus) can be considered a crm:E24 Physical Human-Made Thing if it is documented in an excavation context, but an lrm:F5 Item in LRM or an frbroo:F4 Manifestation Singleton in FRBRoo (and probably some other class in RiC-O).

In my opinion, these objects should all be identified with one and the same class (hence the highest one, crm:E24 Physical Human-Made Thing, perhaps with Kinds if necessary). This also affects the relationship between the object (support) and the text. Should it simply be documented with the property crm:P128 carries? Objects bearing inscriptions would then be documented in the same way as printed works or singletons: in each case, they are physical objects carrying information objects. The question now is whether to rely only on existing classes (E89, E73, etc.) combined with Kinds, or to create subclasses.
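To make the single-class proposal concrete, here is a minimal sketch that stores RDF-style triples as plain Python tuples. Identifiers prefixed `ex:` are hypothetical, and using `crm:P2_has_type` to record a Kind is an assumption (one possible choice, not an agreed SDHSS pattern). The papyrus is typed once as E24 and linked to its text through P128, whatever the context in which it was recorded.

```python
# Triples as (subject, predicate, object) tuples.
# "ex:" identifiers are hypothetical; CRM local names are written as illustrative CURIEs.
RDF_TYPE = "rdf:type"

triples = {
    # One physical object, typed with a single CRM class in every context
    ("ex:papyrus_1", RDF_TYPE, "crm:E24_Physical_Human-Made_Thing"),
    # Optional refinement through a Kind; P2 is an assumed choice of property
    ("ex:papyrus_1", "crm:P2_has_type", "ex:kind_papyrus"),
    # The text on the papyrus, as an immaterial information object
    ("ex:text_1", RDF_TYPE, "crm:E73_Information_Object"),
    # Support-to-text link
    ("ex:papyrus_1", "crm:P128_carries", "ex:text_1"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def classes_of(node):
    """Return the asserted classes of a node."""
    return objects(node, RDF_TYPE)


# An excavation record and a source catalogue both query the same node and class.
print(classes_of("ex:papyrus_1"))                     # {'crm:E24_Physical_Human-Made_Thing'}
print(objects("ex:papyrus_1", "crm:P128_carries"))    # {'ex:text_1'}
```

The same pattern would cover an inscribed stone or a printed book: only the Kind changes, not the class.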

If we want to establish equivalences, I think we need to create subclasses. I was opposed to this for the sake of interoperability: if these classes are truly equivalent, we might as well use the standards rather than create new classes. But if we establish equivalences with several ontologies (LRM, RiC-O, etc.), doesn't interoperability actually increase? By way of conclusion, here are my thoughts in the form of a diagram (I think better with images), still a work in progress. It is therefore better to follow this link: https://drive.google.com/file/d/1bfUQ3U3YluYqgV1hexpLYu-0uM5bEwtX/view?usp=sharing, but here is also a snapshot of my current thinking:

[Diagram: GV_Sources (draw.io)]

stephenhart8 commented 9 months ago

Food for thought: now that RiC-O 1.0 has been published, this ontology should be included in the discussion: https://github.com/ICA-EGAD/RiC-O/tree/master/ontology/current-version

Staying within SDHSS would allow connections to both FRBR/LRM and RiC-O.

stephenhart8 commented 9 months ago

My two cents on the RiC ontology, and how things influence our discussions on the topic.

First, because in GV we document archives as well as books in Sources, an ontological analysis of RiC-O is as important as one of LRM.

Here is an overview of the RiC model:

[Screenshot: overview of the RiC model, taken 2023-11-20]

Two elements are essential for our discussions:

The scope note of RiC-E04 Record is as follows:

Discrete information content formed and inscribed, at least once, by any method on any carrier in any persistent, recoverable form by an agent in the course of life or work activity.

This class makes me think that it could be related to the CRM E73 Information Object, if we look at its scope note:

This class comprises identifiable immaterial items, such as poems, jokes, data sets, images, texts, multimedia objects, procedural prescriptions, computer program code, algorithm or mathematical formulae, that have an objectively recognizable structure and are documented as single units.

But then what about the class RiC-E06 Instantiation:

The inscription of information made by an agent on a carrier in any persistent, recoverable form as a means of communicating information through time and space.

This class seems to differ from E73 Information Object, since the inscription of the information is essential to it. Consider one of the RiC-O examples:

Wax seal carrying an impression of the 3rd Great Seal of King Charles I [en] [analogue instantiation of record part]

It seems that RiC-E06 Instantiation is not just the carrier, but the information on a specific carrier. Could that be mapped to my proposal above? I don't think so.

Am I mistaken?

stephenhart8 commented 9 months ago

Some additions to my thoughts

One of my major issues concerns the FRBR/LRMoo class Item (and I admit I'm a bit biased by my archaeological background, because this is not a central entity of the ontology). My concern is that a user could document a book both as an instance of E24 Physical Human-Made Thing (it was decided in GV that all objects would be documented with this class rather than a subclass such as E22 Human-Made Object) and as an instance of frbr:F5 Item.

I think this is an issue.

My worries mostly concern specific cases such as fragments of papyri, coins and other objects that some researchers may consider instances of Item in the context of documentation, but that may also be instances of E24 Physical Human-Made Thing when documented in other contexts, such as archaeology. If my concerns are valid, no new class equivalent to frbr:F5 Item should be created, and we should use only E24 Physical Human-Made Thing instead. A new sdh:CX Manifestation class would then be exemplified by instances of E24 Physical Human-Made Thing, which means that any object can exemplify a manifestation. This solves my issue, but does it create other problems? Can a vase, a sculpture or a tool be an exemplar of a manifestation?

My reasoning is, I think, also in line with the decision of limiting the number of subclasses of E24 Physical Human-Made Thing (so not creating subclasses such as "Ship", "Vase", etc.)

But is it really an issue?

The strength of RDF(S) and its class hierarchies is that any instance of a class is also an instance of all its superclasses. An instance of sdh:CX Item would therefore also be an instance of E24 Physical Human-Made Thing. Semantically and ontologically, there is no difference.
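The inheritance argument can be illustrated with a small sketch of RDFS subclass entailment: if C is a subclass of D and x is an instance of C, then x is also an instance of D (rule rdfs9), and subClassOf is transitive (rdfs11). `sdh:Item` is a hypothetical placeholder for the proposed Item class; only the CRM link E24 ⊑ E18 is taken from the CRM hierarchy, which is otherwise truncated here.

```python
# Direct rdfs:subClassOf assertions.
# "sdh:Item" is a hypothetical placeholder for a proposed subclass of E24.
SUBCLASS_OF = {
    "sdh:Item": {"crm:E24_Physical_Human-Made_Thing"},
    "crm:E24_Physical_Human-Made_Thing": {"crm:E18_Physical_Thing"},
}


def superclasses(cls):
    """All superclasses of cls, following subClassOf transitively (rdfs11)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted):
    """Asserted types plus every type entailed by rdfs9."""
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls)
    return result


print(sorted(inferred_types({"sdh:Item"})))
# ['crm:E18_Physical_Thing', 'crm:E24_Physical_Human-Made_Thing', 'sdh:Item']
```

A query for all E24 instances would therefore also return anything typed only as `sdh:Item`, provided the querying system applies RDFS entailment.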

The problem arises when creating profiles. Because OntoMe and Geovistory do not handle property inheritance in profiles, the profile describing the production of an E24 Physical Human-Made Thing cannot be used to describe the production of an Item, so an additional profile would have to be created.
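A hypothetical sketch of why this matters: if profiles are looked up by exact class, a profile attached to E24 is invisible for instances typed with a subclass, whereas a lookup that walks the superclass chain finds it. The registry structure, the profile names and `sdh:Item` are assumptions for illustration only; this is not how OntoMe or Geovistory store profiles internally.

```python
# Hypothetical profile registry: class -> names of profiles usable for it.
PROFILES = {
    "crm:E24_Physical_Human-Made_Thing": ["production", "location"],
}

# Direct subClassOf links; "sdh:Item" is a hypothetical placeholder.
SUBCLASS_OF = {
    "sdh:Item": ["crm:E24_Physical_Human-Made_Thing"],
}


def profiles_exact(cls):
    """Lookup without inheritance: only profiles attached to cls itself."""
    return list(PROFILES.get(cls, []))


def profiles_inherited(cls):
    """Lookup with inheritance: profiles of cls and of all its superclasses."""
    found, stack, seen = [], [cls], {cls}
    while stack:
        current = stack.pop()
        for name in PROFILES.get(current, []):
            if name not in found:
                found.append(name)
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return found


print(profiles_exact("sdh:Item"))      # [] : a separate profile would be needed
print(profiles_inherited("sdh:Item"))  # ['production', 'location']
```

With exact-class lookup, every new subclass of E24 multiplies the profiles to maintain, which is the cost described above.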

stephenhart8 commented 6 months ago

Based on the various discussions on the subject, I propose to create the following classes, as shown in this final diagram:

[Diagram: GV_Sources, final version (draw.io)]

I suggest that, for the moment, we do not create a Serial Item class, to avoid having similar objects documented with different classes. If we decide to create this Serial Item class (which would also require new profiles to document those serial items, such as production, location, etc.), the property sdh:PX belongs to serial product set (contains serial product item) should have Serial Item as its domain.

I will propose scope notes in the following messages.

stephenhart8 commented 6 months ago

sdh-int:CX Work

This class, equivalent to the LRM-E2 Work class, comprises the intellectual or artistic content of a distinct creation.

SH comment: Should we expand the scope note, or simply rely on the LRM description?

sdh-int:CX Expression

This class, equivalent to the LRM-E3 Expression class, comprises a combination of signs that is the intellectual or artistic realization of works in the form of identifiable immaterial objects.

SH comment: Should we expand the scope note, or simply rely on the LRM description?

sdh-int:CX Manifestation

This class, equivalent to the LRM-E4 Manifestation class, comprises the set of all physical human-made things (SH comment: or Serial Product Item if we create the class) that carry the same content of one or multiple expressions and are produced according to processes based on an identifiable model or matrix, thus sharing the same physical characteristics.

Expressions reproduced manually on multiple carriers, such as medieval manuscripts or copies of paintings, share the same work but not the same expression. Therefore, handwritten copies of a manuscript do not belong to the same manifestation.
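A rough sketch of the grouping rule in this scope note, assuming each carrier records the identifier of the model or matrix used when it was produced serially, and nothing when it was copied by hand. The `Carrier` record, its fields and the sample data are hypothetical. Carriers sharing a matrix fall into one manifestation; each handwritten copy ends up in its own group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Carrier:
    label: str
    matrix_id: Optional[str]  # model or matrix used for serial production, if any


def group_into_manifestations(carriers):
    """Group carriers into manifestations keyed by a shared matrix.

    Carriers without a matrix (manual copies) each form their own group.
    """
    groups = {}
    for c in carriers:
        key = ("matrix", c.matrix_id) if c.matrix_id is not None else ("single", c.label)
        groups.setdefault(key, []).append(c.label)
    return sorted(groups.values())


carriers = [
    Carrier("print_copy_A", "edition_1550_forme"),  # hypothetical sample data
    Carrier("print_copy_B", "edition_1550_forme"),
    Carrier("manuscript_copy_1", None),
    Carrier("manuscript_copy_2", None),
]
print(group_into_manifestations(carriers))
# [['manuscript_copy_1'], ['manuscript_copy_2'], ['print_copy_A', 'print_copy_B']]
```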

sdh-int:CX Manifestation Type

This class comprises types that stand as the models for instances of sdh-int:CX Manifestation produced by production activities using plans exact enough to result in one or more series of uniform, functionally and aesthetically identical and interchangeable items of the same expression. The manifestation type is the intended ideal form of the manufacturing process.

sdh:CX Serial Product Set

This class comprises the set of all physical human-made things that have been serially produced according to processes based on an identifiable model or matrix. This includes, among others, equipment produced by machines according to a plan, struck coins, printed books and cast sculptures. Each item is therefore identical to the others in the set, even though specific items can sometimes be distinguished, for instance a signed copy of a printed book.

Items that have been manually produced, even through serial processes, cannot be documented with the class sdh:CX Serial Product Set: even if the items share very similar traits, differences can be identified that, over long time spans, lead to typological differences. Typologies of such manually produced objects, ceramics for example, rely not on a model or matrix but on the epistemic classification of objects by researchers.

The model of a serial product set is documented with the class crm:E99 Product Type.
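A minimal sketch of the membership constraint described in this scope note: an item may join a Serial Product Set only if it was produced serially from the set's model, recorded here as an identifier standing for a crm:E99 Product Type. The `Item` and `SerialProductSet` classes, the `production_mode` values and the sample identifiers are hypothetical. In the CRM itself, a production event (E12) is linked to its product type through P186 produced thing of product type, which this toy check mirrors only loosely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Item:
    label: str
    production_mode: str          # hypothetical values: "serial" or "manual"
    product_type: Optional[str]   # E99 Product Type used as model or matrix, if any


@dataclass
class SerialProductSet:
    product_type: str                            # identifier of the crm:E99 Product Type
    members: list = field(default_factory=list)

    def add(self, item):
        """Admit only items serially produced from this set's product type."""
        if item.production_mode != "serial":
            raise ValueError(f"{item.label}: manually produced items are excluded")
        if item.product_type != self.product_type:
            raise ValueError(f"{item.label}: produced from a different model")
        self.members.append(item.label)


coins = SerialProductSet("ex:die_pair_7")
coins.add(Item("coin_1", "serial", "ex:die_pair_7"))
coins.add(Item("coin_2", "serial", "ex:die_pair_7"))
try:
    coins.add(Item("pot_sherd_3", "manual", None))
except ValueError as err:
    print(err)  # pot_sherd_3: manually produced items are excluded
print(coins.members)  # ['coin_1', 'coin_2']
```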

stephenhart8 commented 3 months ago

After further thought and discussion, here is the updated diagram of the proposal:

[Diagram: GV_Sources, page 3 (draw.io)]

Classes and properties in bold need to be created. Comments in red are general comments. Comments in blue mark previous FRBRoo classes and properties to be changed.

stephenhart8 commented 3 months ago

Here are the updated and proposed scope notes:

sdh-int:CX Document Content

This class comprises a combination of signs (of any form or nature) that is the full content expressed on a physical thing or a manifestation and intended to convey intellectual or artistic content. The term “sign” is intended here in the meaning used in semiotics. The Document Content is an identifiable immaterial object, distinguished from the material support or the set of supports that it is carried on, but that could not exist without it. A document content can be expressed on multiple carriers (such as the case of printed materials), but any change in form (e.g. a manuscript copy that includes small errors, from alpha-numeric notation to spoken words, a poem created in capitals and rendered in lower case) is a new instance of Document Content.
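One way to read this identity criterion, sketched under the simplifying assumption that a Document Content can be represented by its exact sign sequence as a string: identical sequences on several printed carriers map to one instance, while a copy with a change of case (or a small copying error) maps to a new one. `DocumentContentRegistry` and its methods are hypothetical, and real content includes non-textual signs that this toy ignores.

```python
class DocumentContentRegistry:
    """Hypothetical registry assigning one identifier per distinct sign sequence."""

    def __init__(self):
        self._ids = {}        # sign sequence -> Document Content id
        self._carriers = {}   # Document Content id -> carriers expressing it

    def register(self, signs, carrier):
        """Record that carrier bears signs; return the Document Content id."""
        # Exact comparison: no normalisation of case, spacing or spelling,
        # since any change in form yields a new Document Content.
        content_id = self._ids.setdefault(signs, f"dc_{len(self._ids) + 1}")
        self._carriers.setdefault(content_id, []).append(carrier)
        return content_id

    def carriers_of(self, content_id):
        return list(self._carriers.get(content_id, []))


reg = DocumentContentRegistry()
a = reg.register("ARMA VIRUMQUE CANO", "printed_copy_1")
b = reg.register("ARMA VIRUMQUE CANO", "printed_copy_2")
c = reg.register("arma virumque cano", "lowercase_reprint")
print(a, b, c)             # dc_1 dc_1 dc_2
print(reg.carriers_of(a))  # ['printed_copy_1', 'printed_copy_2']
```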

Question: if the identity of a Document Content is defined by what is on a physical object or a manifestation, what happens if part of the physical object is cut off, such as a single page of a book? Aren't we in a similar position to the head of a sculpture, which is a feature until it becomes a new object?

Question: similarly, if the identity of a Document Content is defined by what is on a physical object or a manifestation, what about a letter that consists of several separate pages?

sdh-int:CX Content Section

This class comprises a combination of signs (of any form or nature) intended to convey intellectual or artistic content that is only a small portion of an instance of Document Content that is contained on a physical thing or a manifestation. The term “sign” is intended here in the meaning used in semiotics. An instance of Content Section cannot exist without the instance of Document Content in this parthood relationship. The identity of the section is provided by the definition adopted by the user in order to cut it out.

If a Content Section is split across different carriers (e.g. the first part of the copy of a letter is found at the end of volume I and the second part at the beginning of volume II), the portion is related to both volumes as carriers of the one relevant Content Section. In this case, the Content Section is associated as part of two different Document Content instances.
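The split-section case can be sketched as plain parthood links. All identifiers (volumes, contents, sections) are hypothetical, and the dictionaries stand in for relations whose SDHSS property labels are not defined here. The one Content Section is recorded as part of the Document Content of each volume, its carriers are derived through those contents, and a check flags any section lacking a Document Content, which the scope note rules out.

```python
# Document Content -> carrier it is expressed on (hypothetical ids)
CARRIED_BY = {
    "dc_volume_I": "volume_I",
    "dc_volume_II": "volume_II",
}

# Content Section -> Document Contents it is part of (placeholder relation)
PART_OF = {
    "section_letter_copy": ["dc_volume_I", "dc_volume_II"],
    "section_preface": ["dc_volume_I"],
}


def carriers_of_section(section):
    """Carriers of a section, derived through the contents it is part of."""
    return sorted({CARRIED_BY[dc] for dc in PART_OF.get(section, [])})


def orphan_sections():
    """Sections violating the rule that a section needs a Document Content."""
    return [s for s, wholes in PART_OF.items() if not wholes]


print(carriers_of_section("section_letter_copy"))  # ['volume_I', 'volume_II']
print(carriers_of_section("section_preface"))      # ['volume_I']
print(orphan_sections())                           # []
```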

stephenhart8 commented 3 months ago

Following the discussions with FB:

Answers to the questions above:

Discussion notes