gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

GRSciColl - collection descriptors - vocabulary for objectClassificationName #157

Open ManonGros opened 1 month ago

ManonGros commented 1 month ago

I would like to have a controlled vocabulary for interpreting the Latimer Core field objectClassificationName: https://ltc.tdwg.org/quick-reference/#ObjectClassification.objectClassificationName.

The Latimer Core term objectClassificationName is very convenient for describing subsets of collections that do not necessarily have other ways of being grouped. For example, it is helpful for groups of non-monophyletic taxa such as Algae.

Currently we don't have any vocabulary, but it would make sense to integrate the categories of the DiSSCo discipline vocabulary, which is described here: https://doi.org/10.3897/rio.10.e118244

Discipline: Categories
Anthropology: Human Biology; Archaeology; Other
Botany: Algae; Bryophytes; Fungi/Lichens (including Myxomycetes); Pteridophytes; Seed plants
Extraterrestrial: Collected on Earth; Collected in space; Other
Geology: Mineralogy; Petrology; Loose sediment; Other
Microorganisms: Bacteria and Archaea; Phages; Plasmids; Protozoa; Virus - animal / human; Virus - plant; Yeast and fungi; Other
Palaeontology: Botany & Mycology; Invertebrates; Vertebrates; Trace fossils; Microfossils; Other
Zoology Invertebrates: Arthropods - insects (Lepidoptera, Diptera, Hymenoptera, Coleoptera); Arthropods - other insects; Arthropods - arachnids; Arthropods - crustaceans & myriapods; Porifera (sponges); Mollusca (bivalves, gastropods, cephalopods); Other
Zoology Vertebrates: Fishes; Amphibians; Reptiles; Birds; Mammals; Other
Other Geo/Biodiversity: Other biological or geological objects which fit into none of the other defined categories

Note that there is some overlap with the GRSciColl discipline vocabulary for institutions (https://registry.gbif.org/vocabulary/Discipline) and the GRSciColl collection content type vocabulary (https://registry.gbif.org/vocabulary/CollectionContentType/concepts). However, I think the DiSSCo list of proposed values is quite practical and reflects a lot of the sub-collection divisions I have encountered.

I am not necessarily suggesting that the DiSSCo vocabulary be the final one used for objectClassificationName, but that it be integrated into the vocabulary used for interpreting the field. Perhaps we could remove the "other" categories there?
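
To illustrate what interpreting the field against such a list could look like, here is a minimal sketch in Python. It is only an assumption about how a lookup might work, not how GBIF or GRSciColl actually interprets objectClassificationName. The `CATEGORIES` dictionary holds a small subset of the list quoted above, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of the discipline/category list quoted above.
CATEGORIES = {
    "Botany": [
        "Algae",
        "Bryophytes",
        "Fungi/Lichens (including Myxomycetes)",
        "Pteridophytes",
        "Seed plants",
    ],
    "Geology": ["Mineralogy", "Petrology", "Loose sediment", "Other"],
    "Zoology Vertebrates": [
        "Fishes", "Amphibians", "Reptiles", "Birds", "Mammals", "Other",
    ],
}


def build_index(categories):
    """Map each case-folded category name to (discipline, category).

    Names that occur under more than one discipline (such as "Other")
    are left out, because the name alone cannot identify the discipline.
    """
    seen = defaultdict(list)
    for discipline, names in categories.items():
        for name in names:
            seen[name.casefold()].append((discipline, name))
    return {key: hits[0] for key, hits in seen.items() if len(hits) == 1}


INDEX = build_index(CATEGORIES)


def interpret(value):
    """Return (discipline, category) for a raw value, or None if unmatched."""
    return INDEX.get(value.strip().casefold())


print(interpret("  algae "))  # ('Botany', 'Algae')
print(interpret("Other"))     # None: listed under several disciplines
```

In this sketch, "Other" cannot be resolved from the name alone because it is repeated under several disciplines, which is one practical argument for leaving those categories out of a flat lookup.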

sharifX commented 1 month ago

@ManonGros,

The SYNTHESYS+ report (https://doi.org/10.3897/rio.10.e118244) you mentioned was the foundation of the work we're doing around our data modelling. However, our schema has evolved a bit since then. I recommend looking at the schema page, particularly the Digital Specimen JSON schema.

We've changed the structure by adding three main categories:

topicOrigin, topicDomain, topicDiscipline

In the JSON structure, we've added "enum" (enumeration) to use as a predefined list of acceptable values. Until we have a proper vocabulary server, this approach helps us maintain consistency in how data is categorised.

I think the "other" category is still needed to capture the rest. We are calling this Other Biodiversity and Other Geodiversity.

    "ods:topicOrigin": {
      "type": "string",
      "description": "Highest-level terms identifying the fundamentals of the activities, in which context the objects in the collection were collected",
      "enum": [
        "Natural",
        "Human-made",
        "Mixed origin",
        "Unclassified"
      ],
      "examples": [
        "Natural"
      ]
    },
    "ods:topicDomain": {
      "type": "string",
      "description": "High-level terms providing general domain information with which the objects are associated",
      "enum": [
        "Life",
        "Environment",
        "Earth System",
        "Extraterrestrial",
        "Cultural Artefacts",
        "Archive Material",
        "Unclassified"
      ],
      "examples": [
        "Life"
      ]
    },
    "ods:topicDiscipline": {
      "type": "string",
      "description": "Overarching classification of the scientific discipline to which the objects within the collection belong or are related",
      "enum": [
        "Anthropology",
        "Botany",
        "Astrogeology",
        "Geology",
        "Microbiology",
        "Palaeontology",
        "Zoology",
        "Ecology",
        "Other Biodiversity",
        "Other Geodiversity",
        "Unclassified"
      ],
      "examples": [
        "Botany"
      ]
    },
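
As a rough illustration of how these enum lists constrain values, the following Python sketch checks a record by hand. It is not DiSSCo's validation code, and a real setup would more likely run a JSON Schema validator against the full schema. The enum values are copied from the fragment above, while the `invalid_topics` helper and the sample record are hypothetical.

```python
# Enum lists copied from the schema fragment above.
TOPIC_ENUMS = {
    "ods:topicOrigin": ["Natural", "Human-made", "Mixed origin", "Unclassified"],
    "ods:topicDomain": [
        "Life", "Environment", "Earth System", "Extraterrestrial",
        "Cultural Artefacts", "Archive Material", "Unclassified",
    ],
    "ods:topicDiscipline": [
        "Anthropology", "Botany", "Astrogeology", "Geology", "Microbiology",
        "Palaeontology", "Zoology", "Ecology", "Other Biodiversity",
        "Other Geodiversity", "Unclassified",
    ],
}


def invalid_topics(record):
    """Return {field: value} for topic fields whose value is not in the enum.

    Fields missing from the record are ignored; only present values are checked.
    """
    return {
        field: record[field]
        for field, allowed in TOPIC_ENUMS.items()
        if field in record and record[field] not in allowed
    }


# Hypothetical record: "Phycology" is not one of the listed disciplines.
record = {
    "ods:topicOrigin": "Natural",
    "ods:topicDomain": "Life",
    "ods:topicDiscipline": "Phycology",
}
print(invalid_topics(record))  # {'ods:topicDiscipline': 'Phycology'}
```

The check is exact and case-sensitive, which matches how an enum constraint behaves: a value such as "botany" would be rejected even though "Botany" is allowed.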
ManonGros commented 4 weeks ago

Thanks @sharifX for letting me know, I wasn't aware of that.

Does it mean that the categories like Algae, Bryophytes, Fungi/Lichens (including Myxomycetes), Pteridophytes and Seed plants are no longer part of any controlled vocabulary?

It is the type of classification I have seen before in several institutions, and I think it would be great to make these categories searchable (things like Algae collections cannot be searched easily otherwise). Will you use some other controlled value to work with these cases?

(I am trying to make these traditional collections more easily discoverable and I am not sure how best to proceed).

sharifX commented 3 weeks ago

@ManonGros yes, good point. We have it now inside our FDO profile (json) under topicCategory. We will update the list in the digital specimen json schema so the profile and the object data schemas are aligned.

ManonGros commented 3 weeks ago

Thanks @sharifX. I think the topicCategory content is what I would like to integrate into objectClassificationName. It would be quite helpful to have these values to normalise the (sub-)collection content. Does that make sense?

sharifX commented 2 weeks ago

@ManonGros Yes, that makes sense.

It would perhaps be good to have the historical context documented as well. Here is a condensed version of that background.

The development of objectClassificationName builds on work from the SYNTHESYS+ project, which initially explored collection classification schemes in natural sciences, proposing structured categories such as Discipline (e.g., Botany) and Category (e.g., Algae, Bryophytes). See https://doi.org/10.3897/rio.10.e118244.

Latimer Core formalised these concepts into objectClassificationName (http://rs.tdwg.org/ltc/terms/objectClassificationName). However, Latimer Core does not enforce a controlled vocabulary, allowing flexibility in naming.

The DiSSCo initiative retained the "Discipline" concept in its openDS schema (https://schemas.dissco.tech/), creating terms such as:

topicOrigin, topicDomain, topicDiscipline

To enhance consistency, DiSSCo plans to enforce values through a JSON enum (a controlled vocabulary constraint in the JSON schema) and leverage vocabularies from resources like GBIF and Catalogue of Life. DiSSCo will also establish a vocabulary server to manage terms specific to openDS use cases.