DINA-Web / dina-model-concepts

Repository containing information to define data model boundaries
MIT License

Track organism when preparation split one #25

Open cgendreau opened 5 years ago

cgendreau commented 5 years ago

When creating new catalogued objects by applying a preparation, we need a way to record that 2 (or more) catalogued objects are from the same organism. The main reason is that Identification is done on the Catalogued object, so we need a mechanism to find all the other catalogued objects of the same organism. They will be linked to the same material sample, but 1 material sample could have multiple organisms.

dshorthouse commented 3 years ago

There's an organismID in Darwin Core that can be used for this purpose. Tricky issue from user perspective is the (usually) complete absence of such an identifier in a parent MaterialSample until after a child MaterialSample is created; there may be no prior knowledge that there was more than one organism prior to a split.

dshorthouse commented 3 years ago

Every MaterialSample should have 1:many organisms, each with an optional organismID, and each organism should have 1:many identifications. In the case of a single whole organism, having identifications nested within an organism is somewhat unwieldy, but it is absolutely essential when there is more than one identified organism on/in a catalogued MaterialSample, so that when/if a MaterialSample is sampled, the child retains the same organismID. In the absence of such an identifier, there is otherwise no way to tell which of the potentially many known organisms in the parent MaterialSample the identifications on the child MaterialSample correspond to. Having such an organismID would also be suitable for instances where the same tree is sampled time & again, or where a MaterialSample is composed of many parts (= types) that all refer to the same organism (e.g. a wolf with pelts, bones, organs all physically stored in different rooms, or a plant on several different sheets).
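As a rough illustration of the nesting described above, here is a minimal Python sketch. It is not part of the DINA model: the class names, field names, the `split` helper and the `org-wolf-1` identifier are all hypothetical, chosen only to show a child sample retaining the parent's organismID.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Identification:
    taxon: str


@dataclass
class Organism:
    # organismID is optional, as suggested above (hypothetical field name)
    organism_id: Optional[str] = None
    identifications: List[Identification] = field(default_factory=list)


@dataclass
class MaterialSample:
    name: str
    organisms: List[Organism] = field(default_factory=list)
    parent: Optional["MaterialSample"] = None


def split(parent: MaterialSample, name: str, organism_ids: Set[str]) -> MaterialSample:
    """Create a child sample that keeps the selected organisms of its parent."""
    kept = [o for o in parent.organisms if o.organism_id in organism_ids]
    return MaterialSample(name=name, organisms=kept, parent=parent)


def samples_with_organism(samples: List[MaterialSample], organism_id: str) -> List[MaterialSample]:
    """Find every sample that carries the given organismID."""
    return [s for s in samples
            if any(o.organism_id == organism_id for o in s.organisms)]


# The wolf example: parts stored separately, all referring to one organism.
wolf = Organism("org-wolf-1", [Identification("Canis lupus")])
parent = MaterialSample("whole wolf", [wolf])
pelt = split(parent, "pelt", {"org-wolf-1"})
skull = split(parent, "skull", {"org-wolf-1"})

names = [s.name for s in samples_with_organism([parent, pelt, skull], "org-wolf-1")]
print(names)
```

Because the child shares the same `Organism` object, an identification added to the organism later is visible from every sample that carries it. Whether a real implementation should share or copy that record is a design decision this sketch does not settle.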

cboelling commented 3 years ago

These use cases can be covered in the conceptual model (and mapped subsequently to a database logical model) by introducing the following relations. I use the dina: namespace to distinguish items from representational primitives with similar or identical labels in other representational schemes (the question of whether these are co-extensional or even co-intensional is one that should be revisited).

x1 instanceOf dina:MaterialSample
x1 dina:hasPartDerivedFrom x2
x2 instanceOf dina:Organism
x2 dina:isIdentifiedAs x3
x3 instanceOf dina:Taxon

Here x1, x2, x3 stand for instances of the respective classes, which can be characterized by the identifier and additional labels and properties of your choice. instanceOf stands for the foundational relation between a particular and the class it is assigned to.

This looks more complicated than it actually is (just having to represent an entity-relation-graph in markdown). The key is to connect instances of dina:MaterialSample with instances of dina:Organism through the property dina:hasPartDerivedFrom, the semantics of which is, roughly, that some part (whole or proper) of the material sample is derived from some particular organism, which in turn can be asserted to be an instance of this or that taxon. The DerivedFrom part of the relation acknowledges the fact that the organism and the corresponding part of the material sample may be connected only through a succession of preparation processes which might have substantially altered the original physical composition of the organism or the part of it which has been preserved in the material sample. This representation is consistent with a more detailed representation of the processing steps.
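To make the graph pattern a bit more concrete, the relations can be written as plain (subject, predicate, object) triples and queried. The following is only a sketch in Python: the identifiers (`sample-1`, `organism-1`, `taxon-A`, ...) and the helper functions are made up for illustration, and a real implementation would more likely use an RDF store or a relational mapping.

```python
# Each triple is (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = {
    ("sample-1", "instanceOf", "dina:MaterialSample"),
    ("sample-1", "dina:hasPartDerivedFrom", "organism-1"),
    ("sample-1", "dina:hasPartDerivedFrom", "organism-2"),
    ("organism-1", "instanceOf", "dina:Organism"),
    ("organism-1", "dina:isIdentifiedAs", "taxon-A"),
    ("organism-2", "instanceOf", "dina:Organism"),
    ("organism-2", "dina:isIdentifiedAs", "taxon-B"),
    ("taxon-A", "instanceOf", "dina:Taxon"),
    ("taxon-B", "instanceOf", "dina:Taxon"),
}


def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def taxa_by_organism(triples, sample):
    """Map each organism linked to the sample to the taxa it is identified as."""
    return {
        organism: objects(triples, organism, "dina:isIdentifiedAs")
        for organism in objects(triples, sample, "dina:hasPartDerivedFrom")
    }


result = taxa_by_organism(TRIPLES, "sample-1")
print(result)
```

Here one material sample is linked to two organisms, and each identification stays attached to its own organism rather than to the sample as a whole.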

In cases where a single organism is not readily identifiable, it should also be possible to link instances of dina:MaterialSample with taxa directly, e.g. by a relation dina:hasPartIdentifiedAs. This is consistent with all of the above (leaving out the organism-man-in-the-middle; in fact, the above design pattern would give rise to a corresponding relation between the material sample and the taxon). Of course, one could always mint artificial organism identifiers which function solely as connections (but this might not always be desirable).
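One way to read "would give rise to a corresponding relation" is that dina:hasPartIdentifiedAs can either be asserted directly or be derived by following dina:hasPartDerivedFrom and then dina:isIdentifiedAs. A small Python sketch of that reading, with made-up identifiers and a hypothetical helper function, might look like this:

```python
# Hypothetical example data: sample-1 goes through an organism,
# sample-2 is linked to a taxon directly.
TRIPLES = {
    ("sample-1", "dina:hasPartDerivedFrom", "organism-1"),
    ("organism-1", "dina:isIdentifiedAs", "taxon-A"),
    ("sample-2", "dina:hasPartIdentifiedAs", "taxon-B"),
}


def has_part_identified_as(triples, sample):
    """Taxa linked to a sample, either directly or via an organism."""
    direct = {
        o for s, p, o in triples
        if s == sample and p == "dina:hasPartIdentifiedAs"
    }
    derived = {
        taxon
        for s, p, organism in triples
        if s == sample and p == "dina:hasPartDerivedFrom"
        for s2, p2, taxon in triples
        if s2 == organism and p2 == "dina:isIdentifiedAs"
    }
    return direct | derived


via_organism = has_part_identified_as(TRIPLES, "sample-1")
direct_only = has_part_identified_as(TRIPLES, "sample-2")
print(via_organism, direct_only)
```

Both samples answer the same question ("which taxa are present in this sample?") without sample-2 needing an artificial organism identifier.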

I'd add this to the list of things to consider for consolidating the model (including proper definitions etc).

dshorthouse commented 3 years ago

> The DerivedFrom part of the relation acknowledges the fact that the organism and the corresponding part of the material sample may be connected only through a succession of preparation processes which might have substantially altered the original physical composition of the organism or the part of it which has been preserved in the material sample.

Would that mean then that a CollectingEvent would be considered a preparation process? What I'm getting at is the situation where a progenitor MaterialSample already has more than a single Organism, such as a piece of tree bark with 2+ identifiable lichen species or a bee with a Strepsiptera in its tergites, both of which were collected/observed/recorded as such prior to any preparation process that (may have) later altered the MaterialSample in any material way.

cboelling commented 3 years ago

I would be comfortable with a view that regards any interaction between agent and object as a dina:PreparationProcess, even if that interaction just consists of observing the object unaided (e.g., with your eyes) and producing a record of the agent's perceptions/conclusions (in a notebook, a database or in some other form). (Maybe the label dina:intervention would be an alternative for this category then?)

This approach avoids inventing any rigid distinctions between preparation processes and other interventions in what I think is actually a continuum of more or less invasive interventions changing the physical properties of the object more or less severely.

This does not prevent us from defining specific sub-classes of dina:PreparationProcess if that is needed for some use case. dina:Observation could be such a class, defined (off the top of my head) as a dina:PreparationProcess which upon completion finds the object of inquiry in (principally) the same condition (state, place) as at the beginning of the process (e.g. bark observed and left in peace, bee released again unharmed together with its parasites).

A dina:CollectingProcess could, conceptually, be thought of as a dina:PreparationProcess that is, by virtue of the object of inquiry, not related to any temporally prior dina:PreparationProcess.
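The two tentative definitions above can be expressed as predicates over a generic process record rather than as rigid subclasses. This Python sketch is only one possible reading: the field names (`object_id`, `order`, `state_before`, `state_after`) are hypothetical, `order` stands in for a real timestamp, and "same condition" is reduced to simple equality of a state label.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PreparationProcess:
    object_id: str     # the object of inquiry (hypothetical identifier)
    order: int         # position in time; a stand-in for a timestamp
    state_before: str  # condition (state, place) at the start
    state_after: str   # condition (state, place) at the end


def is_observation(process: PreparationProcess) -> bool:
    """The object is in the same condition after the process as before it."""
    return process.state_before == process.state_after


def is_collecting_process(process: PreparationProcess,
                          history: List[PreparationProcess]) -> bool:
    """No temporally prior process is recorded for the same object."""
    return not any(
        other.object_id == process.object_id and other.order < process.order
        for other in history
    )


collected = PreparationProcess("bark-1", 1, "on tree", "in collection")
examined = PreparationProcess("bark-1", 2, "in collection", "in collection")
released = PreparationProcess("bee-1", 1, "in the field", "in the field")
history = [collected, examined, released]

print(is_collecting_process(collected, history), is_observation(collected))
print(is_collecting_process(examined, history), is_observation(examined))
print(is_collecting_process(released, history), is_observation(released))
```

Note that under these two definitions the categories are not mutually exclusive: the bee that is observed and released is both the first recorded process for that object and an observation.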

Similarly, I would be comfortable with a view where the DerivedFrom bit of a dina:hasPartDerivedFrom relation is interpreted in the broadest sense, i.e. this thing now is derived from this thing then (even if nothing else happened: the null derivate). As above, this could be helpful in circumventing rigid distinctions, and all the edge-case problems that come with them, where there is actually a continuum, notwithstanding the creation of defined classes of derivates if a use case benefits from those.