Open ManonGros opened 1 month ago
@ManonGros,
The SYNTHESYS+ report (https://doi.org/10.3897/rio.10.e118244) you mentioned was the foundation of our data modelling work. However, our schema has evolved a bit since then. I recommend looking at the schema page, particularly the Digital Specimen JSON schema.
We've changed the structure by adding three main categories:
topicOrigin, topicDomain, topicDiscipline
In the JSON structure, we've added "enum" (enumeration) to serve as a predefined list of acceptable values. Until we have a proper vocabulary server, this approach helps us maintain consistency in how data is categorised.
I think the "other" category is still needed to capture the rest. We are calling these Other Biodiversity and Other Geodiversity.
"ods:topicOrigin": {
"type": "string",
"description": "Highest-level terms identifying the fundamentals of the activities, in which context the objects in the collection were collected",
"enum": [
"Natural",
"Human-made",
"Mixed origin",
"Unclassified"
],
"examples": [
"Natural"
]
},
"ods:topicDomain": {
"type": "string",
"description": "High-level terms providing general domain information with which the objects are associated",
"enum": [
"Life",
"Environment",
"Earth System",
"Extraterrestrial",
"Cultural Artefacts",
"Archive Material",
"Unclassified"
],
"examples": [
"Life"
]
},
"ods:topicDiscipline": {
"type": "string",
"description": "Overarching classification of the scientific discipline to which the objects within the collection belong or are related",
"enum": [
"Anthropology",
"Botany",
"Astrogeology",
"Geology",
"Microbiology",
"Palaeontology",
"Zoology",
"Ecology",
"Other Biodiversity",
"Other Geodiversity",
"Unclassified"
],
"examples": [
"Botany"
]
},
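To illustrate how these enums constrain values, here is a minimal Python sketch that checks a record against the three lists above. The enum values are copied from the schema fragment; the function name `invalid_topics` and the sample record are hypothetical and not part of openDS, and a real setup would more likely use a full JSON Schema validator.

```python
# Enum lists copied from the schema fragment above.
TOPIC_ENUMS = {
    "ods:topicOrigin": ["Natural", "Human-made", "Mixed origin", "Unclassified"],
    "ods:topicDomain": [
        "Life", "Environment", "Earth System", "Extraterrestrial",
        "Cultural Artefacts", "Archive Material", "Unclassified",
    ],
    "ods:topicDiscipline": [
        "Anthropology", "Botany", "Astrogeology", "Geology", "Microbiology",
        "Palaeontology", "Zoology", "Ecology", "Other Biodiversity",
        "Other Geodiversity", "Unclassified",
    ],
}


def invalid_topics(record: dict) -> dict:
    """Return {field: value} for each topic field whose value is outside its enum.

    Hypothetical helper for illustration; fields missing from the record are skipped.
    """
    return {
        field: record[field]
        for field, allowed in TOPIC_ENUMS.items()
        if field in record and record[field] not in allowed
    }


# Hypothetical record: "Mycology" is not in the topicDiscipline enum.
record = {
    "ods:topicOrigin": "Natural",
    "ods:topicDomain": "Life",
    "ods:topicDiscipline": "Mycology",
}
print(invalid_topics(record))  # {'ods:topicDiscipline': 'Mycology'}
```

A value such as "Mycology" would have to be recorded as "Other Biodiversity" or "Unclassified" under the current lists, which is why the "other" categories matter.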
Thanks @sharifX for letting me know, I wasn't aware of that.
Does it mean that categories like Algae, Bryophytes, Fungi/Lichens (including Myxomycetes), Pteridophytes and Seed plants are no longer part of any controlled vocabulary?
It is the type of classification I have seen before in several institutions, and I think it would be great to make these categories searchable (things like Algae collections cannot be searched easily otherwise). Will you use some other controlled value to work with these cases?
(I am trying to make these traditional collections more easily discoverable and I am not sure how best to proceed).
@ManonGros yes, good point. We have it now inside our FDO profile (json) under topicCategory. We will update the list in the digital specimen json schema so the profile and the object data schemas are aligned.
Thanks @sharifX. I think the topicCategory content is what I would like to integrate into objectClassificationName. It would be quite helpful to have these values to help normalise the (sub-)collection content. Does that make sense?
@ManonGros Yes, that makes sense.
It would perhaps be good to have the historical context documented as well. Here is a condensed version of that background.
The development of objectClassificationName builds on work from the SYNTHESYS+ project, which initially explored collection classification schemes in natural sciences, proposing structured categories such as Discipline (e.g., Botany) and Category (e.g., Algae, Bryophytes). See https://doi.org/10.3897/rio.10.e118244.
Latimer Core formalised these concepts into objectClassificationName (http://rs.tdwg.org/ltc/terms/objectClassificationName). However, Latimer Core does not enforce a controlled vocabulary, allowing flexibility in naming.
The DiSSCo initiative retained the "Discipline" concept in its openDS schema (https://schemas.dissco.tech/), creating terms such as topicOrigin, topicDomain and topicDiscipline.
To enhance consistency, DiSSCo plans to enforce values through a JSON enum (a controlled vocabulary constraint in the JSON schema) and leverage vocabularies from resources like GBIF and Catalogue of Life. DiSSCo will also establish a vocabulary server to manage terms specific to openDS use cases.
I would like to have a controlled vocabulary for interpreting the Latimer Core field objectClassificationName: https://ltc.tdwg.org/quick-reference/#ObjectClassification.objectClassificationName.
The Latimer Core term objectClassificationName is very convenient for describing subsets of collections that do not necessarily have other ways of being grouped. This is helpful, for example, for groups of non-monophyletic taxa such as Algae.
Currently we don't have any vocabulary, but it would make sense to integrate the categories of the DiSSCo discipline vocabulary, which is described here: https://doi.org/10.3897/rio.10.e118244
Human Biology
Archaeology
Other
Algae
Bryophytes
Fungi/Lichens (including Myxomycetes)
Pteridophytes
Seed plants
Collected on Earth
Collected in space
Other
Mineralogy
Petrology
Loose sediment
Other
Bacteria and Archaea
Phages
Plasmids
Protozoa
Virus - animal / human
Virus - plant
Yeast and fungi
Other
Botany & Mycology
Invertebrates
Vertebrates
Trace fossils
Microfossils
Other
Arthropods - insects (Lepidoptera, Diptera, Hymenoptera, Coleoptera)
Arthropods - other insects
Arthropods - arachnids
Arthropods - crustaceans & myriapods
Porifera (sponges)
Mollusca (bivalves, gastropods, cephalopods)
Other
Fishes
Amphibians
Reptiles
Birds
Mammals
Other
Other biological or geological objects which fit into none of the other defined categories
Note that there is some overlap with the GRSciColl discipline vocabulary for institutions (https://registry.gbif.org/vocabulary/Discipline) and the GRSciColl collection content type vocabulary (https://registry.gbif.org/vocabulary/CollectionContentType/concepts). However, the DiSSCo list of proposed values seems quite practical to me and reflects a lot of the sub-collection divisions I have encountered.
I am not necessarily suggesting that the DiSSCo vocabulary be the final one used for objectClassificationName, but that it be integrated into the vocabulary used for interpreting the field. Perhaps we could remove the "other" categories there?
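As a rough illustration of what interpreting the field against such a vocabulary could look like, here is a minimal Python sketch. It uses only the botany categories from the list above; the names `CATEGORIES` and `interpret` are hypothetical, and matching on case and whitespace alone is an assumption for illustration, not a description of how GBIF or GRSciColl interpret values.

```python
from typing import Optional

# Botany categories taken from the list above (a subset, for illustration only).
CATEGORIES = [
    "Algae",
    "Bryophytes",
    "Fungi/Lichens (including Myxomycetes)",
    "Pteridophytes",
    "Seed plants",
]


def _key(text: str) -> str:
    """Collapse runs of whitespace and ignore case so near-identical spellings match."""
    return " ".join(text.split()).casefold()


# Lookup from normalised spelling to the controlled label.
_LOOKUP = {_key(label): label for label in CATEGORIES}


def interpret(raw_value: str) -> Optional[str]:
    """Return the controlled label for a free-text objectClassificationName, or None.

    Hypothetical helper: unmatched values are left uninterpreted rather than guessed.
    """
    return _LOOKUP.get(_key(raw_value))


print(interpret("  algae "))       # Algae
print(interpret("seed   plants"))  # Seed plants
print(interpret("Diatoms"))        # None
```

Returning None for unmatched values keeps the original free text usable, which fits the fact that Latimer Core itself does not enforce a controlled vocabulary for this term.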