arnikz closed this issue 3 years ago.
See this table on pg. 3:
There are three comma-separated values (CSV) associated with the Latin text on each line. According to the documentation, the verbatim text and the language should be left out; currently, `dcterms:language` is therefore set to `iso:und`. I think it should be `iso:lat`, though, if the `rdf:value` refers to a (non-empty) string or markdown.
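
A minimal sketch of that rule in Python, assuming the body text is Latin whenever it is non-empty. The function name `language_for` is hypothetical and not part of the SFB-Annotator code:

```python
from typing import Optional


def language_for(value: Optional[str]) -> str:
    """Return the dcterms:language object for an oa:TextualBody.

    Hypothetical rule: the field-note text is assumed to be Latin, so a
    non-empty rdf:value gets iso:lat; otherwise the language stays iso:und.
    """
    if value is not None and value.strip():
        return "iso:lat"
    return "iso:und"


print(language_for("Corporis 0, 2, 9"))  # iso:lat
print(language_for(""))                  # iso:und
```
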
For example, see the annotated bodies, incl. `dwc:MeasurementOrFact` and `oa:TextualBody`. Add `dcmitype:Dataset` to reference a list or a table. That way, one could re-type the list/table in markdown (as `rdf:value`) and set `dcterms:format` to the `text/(x-)markdown` MIME type for `oa:TextualBody`. Moreover, change `nc:humanObservation` to the `dwc:HumanObservation` class. For the time being, instances of this class could be BNodes instead of IRIs. @lisestork: What do you think?
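
A rough sketch of this proposal in Python, using the rows of the table transcribed further down in this thread. It renders the rows as a Markdown table and emits Turtle for a blank-node body. The helper names, the assumption that the `oa:`, `dwc:`, `dcmitype:`, `dcterms:`, `dsw:` and `iso:` prefixes are declared elsewhere, and the `dsw:derivedFrom` link to a blank-node `dwc:HumanObservation` are all illustrative assumptions. `text/markdown` is the registered media type (RFC 7763); `text/x-markdown` is the older, unregistered variant.

```python
from typing import List, Sequence

# Rows as transcribed in this thread (the first row ended up as the header).
HEADER = ["Longitud", "To(?)a", "- - -", "1, 0, 3"]
ROWS = [
    ["Corporis", "- - -", "0, 2, 9"],
    ["To(?)a", "- - -", "0, 2, 9"],
    ["Capitis", "- - -", "0, 1, 1"],
    ["Caudae", "- - -", "0, 0, 1"],
    ["Digit. Medii", "- - -", "0, 3, 0"],
]


def to_markdown_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render rows as a GitHub-flavoured Markdown table."""
    lines: List[str] = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    for row in rows:
        # Pad short rows with empty cells so every row matches the header width.
        cells = list(row) + [""] * (len(header) - len(row))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


def textual_body_turtle(markdown: str) -> str:
    """Turtle for a blank-node body holding the table as Markdown.

    Prefix declarations are assumed to exist elsewhere in the graph.
    """
    # Escape backslashes and double quotes for a Turtle long string literal.
    escaped = markdown.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "[] a oa:TextualBody, dwc:MeasurementOrFact, dcmitype:Dataset ;\n"
        f'    rdf:value """{escaped}""" ;\n'
        '    dcterms:format "text/markdown" ;\n'
        "    dcterms:language iso:lat ;\n"
        "    dsw:derivedFrom [ a dwc:HumanObservation ] .\n"
    )


print(textual_body_turtle(to_markdown_table(HEADER, ROWS)))
```

Keeping the whole table in one `rdf:value` literal means a single body stands for the dataset, while the Markdown stays readable to humans and renderable on GitHub.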
Agreed. I think it would be a good idea to add the class `dcmitype:Dataset` to the measurement instance (in case a table is annotated) to allow tables to be transformed into markdown tables.
Re-reading the Darwin Core documentation for `dwc:HumanObservation`, I am thinking that instead of `nc:humanObservation`, we could add the URI of the image from which the table is derived, typed as `dwc:HumanObservation` and `dsw:Token`. What do you think? In any case, I agree that (at the moment) no new URI has to be minted for the instance. For the annotation of a property or attribute (example-5_2), I think the measurement-or-fact instance serves as a connecting instance and can be a blank node.
> I am thinking that instead of `nc:humanObservation`, we could add the URI of the image from which the table is derived with type `dwc:HumanObservation` and `dsw:Token`. What do you think?

I assume you are referring to the `nc:measurementOrFact1` instance of `nc:humanObservation` or `dwc:MeasurementOrFact`. Do you mean the latter should be replaced with `dsw:Token` (i.e., a form of evidence derived from a `dwc:Organism`) and/or its subclass `dwc:HumanObservation`?
I think adding the image IRI would be redundant, as it is already an instance of `oa:Source`. Moreover, it seems inconsistent with the Web Annotation Data Model (i.e., it would be connected via `oa:hasBody` instead of the `oa:hasSource`/`oa:hasSelector` properties).
The use of the `dsw:hasDerivative` predicate in this example is still unclear to me. According to the definition, it "links a subject Organism or Token instance to an object Token instance".
As I read it, the instances of the class `dsw:Token` should be evidence about the organism (derived from the organism) that led to the identification, such as the URI of the physical specimen (`dwc:Specimen`) or an observation such as a digitised field note (`dwc:HumanObservation`). In our case, this is a digitised field note, which (in my opinion) could be an instance of both classes (`dsw:Token` and `dwc:HumanObservation`).
More fine-grained evidence about the organism identification could be derived from the field note, such as the information contained in a table (e.g., containing dental measures) or a paragraph (containing a description of a part of the organism's anatomy). Within a field note, such a table or paragraph would be an instance of `dwc:MeasurementOrFact`. The measurement or fact is therefore connected to the `dwc:HumanObservation` instance with the property `dsw:derivedFrom`, of which `dsw:hasDerivative` is the inverse.
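
The inverse relation can be illustrated with plain Python tuples standing in for RDF triples (no RDF library is used, and the blank-node labels `_:fieldNote` and `_:table` are hypothetical). In a triple store, the same effect would more likely come from a SPARQL `CONSTRUCT` query or an `owl:inverseOf` axiom plus a reasoner:

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# dsw:hasDerivative is the inverse of dsw:derivedFrom (and vice versa).
INVERSES = {
    "dsw:derivedFrom": "dsw:hasDerivative",
    "dsw:hasDerivative": "dsw:derivedFrom",
}

# Hypothetical graph: a table (measurement or fact) derived from a field note.
GRAPH: Set[Triple] = {
    ("_:fieldNote", "rdf:type", "dsw:Token"),
    ("_:fieldNote", "rdf:type", "dwc:HumanObservation"),
    ("_:table", "rdf:type", "dwc:MeasurementOrFact"),
    ("_:table", "dsw:derivedFrom", "_:fieldNote"),
}


def add_inverses(graph: Set[Triple]) -> Set[Triple]:
    """Return a copy of the graph with the inverse of every DSW link added."""
    result = set(graph)
    for s, p, o in graph:
        inverse = INVERSES.get(p)
        if inverse is not None:
            result.add((o, inverse, s))
    return result


for triple in sorted(add_inverses(GRAPH)):
    print(*triple)
```
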
I agree that adding the image ID as an instance of the classes `dsw:Token` and `dwc:HumanObservation` would be redundant, as we already refer to the bounding boxes in the image with the annotation model, but it might be good to discuss the advantages and disadvantages of doing so.
> I think, adding the image IRI would be redundant - it's also an instance of `oa:Source`. Moreover, it seems inconsistent with the Web Annotation Data Model (i.e., connected to `oa:hasBody` vs. `oa:hasSource`/`oa:hasSelector` properties).

I'm not so sure about this, as `oa:hasBody` would not be connected to the `dwc:HumanObservation` instance, since it is not a manual annotation. In my opinion, the conceptual entities (human observation, organism, occurrence, identification) are not connected to instances of the annotation class, but maybe this is something we should revisit. One problem would be the annotation of these conceptual entities with Darwin Core properties such as `dwc:organismRemarks`, as it would not be possible to track the provenance of these annotations.
> In our case this is a digitised field note, which (in my opinion) could be an instance of both classes (`dsw:Token` and `dwc:HumanObservation`).

- `dsw:Token`: A form of evidence derived from a `dwc:Organism`.
- `dwc:HumanObservation`: An output of a human observation process. Evidence of an Occurrence taken from field notes or literature.

I agree; however, the latter instance shouldn't then be connected via `oa:hasBody` to the `oa:Annotation` instance, but rather to the image IRI (e.g., `img:MMNAT01_AF_NNM001001033_003.jpg a dcmitype:StillImage`) via `rdf:type`. In addition, the `dwc:MeasurementOrFact` instance, which is `dsw:derivedFrom` the `dwc:HumanObservation` instance, seems equivalent to the `oa:TextualBody` instance (BNode).
About `dsw:Token`: DSW recognizes organism-related evidence by explicitly defining the class `dsw:Token`. A Token is essentially a physical, digital, or conceptual voucher that provides some kind of evidence about an Organism. A Token may be part of the Organism or the Organism itself in living or preserved form (i.e., as a specimen). It may also be an image, sound, sample, DNA sequence, or human or machine observation.
Both the `dsw:Token` and `dwc:HumanObservation` classes should also be added to the image IRIs (field notes) of the other examples (TYPE annotations).
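
A small sketch of typing each field-note image IRI with all three classes in one Turtle statement. Only `img:MMNAT01_AF_NNM001001033_003.jpg` comes from this thread; the helper names are hypothetical, and the `img:`, `dcmitype:`, `dsw:` and `dwc:` prefixes are assumed to be declared elsewhere:

```python
from typing import Iterable, List

# Classes proposed in this thread for every field-note image.
IMAGE_CLASSES = ("dcmitype:StillImage", "dsw:Token", "dwc:HumanObservation")


def typing_statement(image_iri: str) -> str:
    """One Turtle statement giving an image IRI all three rdf:type values."""
    return f"{image_iri} a {', '.join(IMAGE_CLASSES)} ."


def typing_statements(image_iris: Iterable[str]) -> List[str]:
    """Apply the same typing to the images of every example."""
    return [typing_statement(iri) for iri in image_iris]


# Only this IRI is taken from the thread; other examples' images would be appended.
for line in typing_statements(["img:MMNAT01_AF_NNM001001033_003.jpg"]):
    print(line)
```
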
In example 5_1, the `dwc:measurementType` corresponds to `ncit:C37927` (Color); however, one can't include its `dwc:measurementValue` (e.g., white) using the current input form. A value would therefore require another bounding box, followed by a link to the type annotation. @lisestork: What do you think?
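
One way to model this, sketched in Python under the assumption that each bounding-box annotation carries a single body term and an optional link to another annotation. `BoxAnnotation`, `measurement_or_fact` and the `anno:` identifiers are hypothetical and do not describe the input form's actual schema:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BoxAnnotation:
    """Hypothetical record for one bounding-box annotation on a field note."""
    id: str
    body: str                                   # term or literal attached to the box
    links_to: Optional["BoxAnnotation"] = None  # e.g. a value box pointing at its type box


def measurement_or_fact(value_box: BoxAnnotation) -> Dict[str, str]:
    """Build dwc:MeasurementOrFact terms from a value box linked to a type box."""
    if value_box.links_to is None:
        raise ValueError(f"{value_box.id} is not linked to a type annotation")
    return {
        "dwc:measurementType": value_box.links_to.body,
        "dwc:measurementValue": value_box.body,
    }


colour = BoxAnnotation("anno:type-1", "ncit:C37927")  # Color
white = BoxAnnotation("anno:value-1", "white", links_to=colour)
print(measurement_or_fact(white))
```

Making the link explicit on the value box means a value without a type is rejected rather than silently stored.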
In example 7_1, the `dwc:verbatimEventDate` refers to `10 April 1821`, and as such it doesn't follow any formatting standard. Perhaps there should be an additional input field (e.g., using `dcterms:W3CDTF` or `dwc:eventDate`) to address this. @lisestork: Moreover, it would make example 7_2 obsolete. Do you agree?
@lisestork: Currently, we use the `dwc:verbatim*` predicates to map the verbatim input field(s). Using the predicates indicated here would require additional fields in the input form, and I'm not so sure about the value of this addition given the examples.
dwc:eventRemarks
Output: https://github.com/LINNAE-project/SFB-Annotator/blob/670a89549c1600876e6f3f3f771476df92e8acb0/data/rdf/local/example_7_1.ttl#L22 https://github.com/LINNAE-project/SFB-Annotator/blob/670a89549c1600876e6f3f3f771476df92e8acb0/data/rdf/local/example_7_1.ttl#L45
dwc:locationRemarks
Input: https://github.com/LINNAE-project/SFB-Annotator/blob/670a89549c1600876e6f3f3f771476df92e8acb0/data/json/local/example_3_1.json#L21 https://github.com/LINNAE-project/SFB-Annotator/blob/670a89549c1600876e6f3f3f771476df92e8acb0/data/json/local/example_3_1.json#L25
> In example 5_1 the `dwc:measurementType` corresponds to `ncit:C37927` (Color), however, one can't include its `dwc:measurementValue` (e.g., white) using the current input form. As such, a value requires another bounding box, followed by a link to the type annotation. @lisestork: What do you think?

@arnikz I agree. I haven't looked into the annotation of values yet, however, as these are mostly in free text (see example below). On a similar note, we could maybe use the Phenotype and Trait Ontology (PATO) for both measurement types and measurement values. For instance, it has qualities (such as color) and values for those qualities as subclasses (such as brown).
@arnikz I'm not sure how these would be replaced by the `dwc:verbatim` property, as I think the latter should be used for the annotation of a bounding box, whereas, e.g., `dwc:eventRemarks` is used to annotate a linking (conceptual) entity that is not used during the annotation of a bounding box. I think these predicates should be used to include free text about an organism, occurrence, identification, or location; for the location, for instance, the presence of certain flora or fauna, or other natural processes at the location that are discussed in free text in the field note. As you suggested earlier, this could be a verbatim description which could be translated on the fly. A downside, however, is that automated methods (such as HTR) require a link between a transcription and the pixels.
> I'm not sure how these would be replaced by the `dwc:verbatim` property, as I think the latter should be used for the annotation of a bounding box,

Actually, the `dwc:verbatim*` properties are already used in the RDF graphs (see above), so nothing needs to be replaced.
> whereas, e.g., `dwc:eventRemarks` is used to annotate a linking (conceptual) entity that is not used during an annotation of a bounding box.

The question was whether adding input fields plus `dwc:*Remarks` properties is that useful in addition to the verbatim field plus `dwc:verbatim*` predicates.
> As you suggested earlier, this could be a verbatim description which could be translated on the fly.

Yes, assuming you are referring to `dwc:verbatimEventDate`, there is no need to add an extra field if the process is automated by parsing the date(time).
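
Assuming verbatim dates follow the "day month-name year" pattern of example 7_1, the conversion to an ISO 8601 value for `dwc:eventDate` (which also conforms to W3CDTF) could be automated along these lines. This is a sketch, not the annotator's code; `to_event_date` and the list of formats are assumptions, and anything that cannot be parsed returns `None` so that only the verbatim value is kept:

```python
from datetime import datetime
from typing import Optional

# Patterns tried in order; "10 April 1821" matches the first one.
FORMATS = ("%d %B %Y", "%d %b %Y")


def to_event_date(verbatim: str) -> Optional[str]:
    """Convert a dwc:verbatimEventDate string to an ISO 8601 date, or None.

    %B/%b match English month names under Python's default C locale.
    """
    for fmt in FORMATS:
        try:
            return datetime.strptime(verbatim.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(to_event_date("10 April 1821"))  # 1821-04-10
```
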
> @arnikz I agree. I haven't looked into the annotation of values yet however, as these are mostly in free text (see example below). On a similar note: we could maybe use the Phenotype and Trait Ontology (PATO) for both measurement types and measurement values. It for instance has the qualities (such as color) and values (subclasses) for the qualities (such as brown).

Indeed, PATO could also be used for both `dwc:measurementType` and `dwc:measurementValue`. Could you indicate where in the screenshot the latter (brown) can be found?
> See this table on pg. 3:
>
> There are three comma-separated values (CSV) associated with the Latin text on each line. According to the documentation, the verbatim text and the language should be left out; currently, `dcterms:language` is therefore set to `iso:und`. I think it should be `iso:lat`, though, if the `rdf:value` refers to a (non-empty) string or markdown.

Longitud | To(?)a | - - - | 1, 0, 3 |
---|---|---|---|
Corporis | - - - | 0, 2, 9 | |
To(?)a | - - - | 0, 2, 9 | |
Capitis | - - - | 0, 1, 1 | |
Caudae | - - - | 0, 0, 1 | |
Digit. Medii | - - - | 0, 3, 0 |

Added DCMI classes in addition to `foaf:Person`, `dwc:Location` and `dwc:Event`.