gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

dwc:GeologicalContext: Chronostratigraphy vocabulary - curation before uploading first vocabulary version #121

Open CecSve opened 1 year ago

CecSve commented 1 year ago
          dwc:GeologicalContext terms: https://github.com/gbif/pipelines/issues/400

Originally posted by @CecSve in https://github.com/gbif/vocabulary/issues/120#issuecomment-1404677164

A Chronostratigraphy vocabulary would cover concepts across multiple terms in the dwc:GeologicalContext category (https://github.com/gbif/pipelines/issues/400#issuecomment-1404326899):

earliestEonOrLowestEonothem
latestEonOrHighestEonothem
earliestEraOrLowestErathem
latestEraOrHighestErathem
earliestPeriodOrLowestSystem
latestPeriodOrHighestSystem
earliestEpochOrLowestSeries
latestEpochOrHighestSeries
earliestAgeOrLowestStage
latestAgeOrHighestStage

The vocabulary follows the vocabulary published by the CGI Geoscience Terminology Working Group hosted by the International Commission on Stratigraphy (ICS) (https://vocabs.ardc.edu.au/viewById/196, https://github.com/gbif/pipelines/issues/400#issuecomment-1404797379, https://github.com/CSIRO-enviro-informatics/interactive-geological-timescale/blob/master/src/assets/timeline_data.json, https://stratigraphy.org/timescale/).

Here is a file to edit: https://docs.google.com/spreadsheets/d/1k3YpAeRT3HxR9DBnkh0jkZZl12jimkHU3_H_pCPOUHc/edit?usp=sharing

https://docs.google.com/spreadsheets/d/1aHqhhtO93nooQ0o4AAVcSBVpyb-IGUXu9dZVTiN77TY/edit#gid=694447980 (updated version that supports numerical ranges for the time scales - version to be implemented)

It contains:

Please check instructions here: https://github.com/gbif/vocabulary/issues/70

CecSve commented 1 year ago

I will set up all the verbatim field tabs tomorrow and let you know when they are ready.

CecSve commented 1 year ago

I have now set up 10 tabs, one for each field related to chronostratigraphy. Duplicate/identical values have been removed, but the same value may still appear wrapped in quotes (""), parentheses (()) or similar. Please map these variants to concepts as well, even though they look like duplicates.

If a value does not belong to any of the concepts, please leave it unmapped.
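To make the near-duplicate variants easier to map together, a mapper could group the verbatim values of a tab by a normalized form. The sketch below is hypothetical (the helper names and the wrapper list are assumptions, not part of the GBIF tooling). It only groups values for review; every original string is kept and still gets its own mapping.

```python
from collections import defaultdict

# Hypothetical helper: wrapper pairs that often surround otherwise identical values.
_WRAPPERS = [('"', '"'), ("'", "'"), ("(", ")"), ("[", "]")]


def normalize_verbatim(value: str) -> str:
    """Strip surrounding quotes/brackets and whitespace, then casefold."""
    text = value.strip()
    changed = True
    while changed:
        changed = False
        for left, right in _WRAPPERS:
            if len(text) >= 2 and text[0] == left and text[-1] == right:
                inner = text[1:-1]
                # Only unwrap when the pair encloses the whole value,
                # so "(Upper) Jurassic (Lower)" is left alone.
                if left not in inner and right not in inner:
                    text = inner.strip()
                    changed = True
    # Collapse internal runs of whitespace, then compare case-insensitively.
    return " ".join(text.split()).casefold()


def group_variants(values):
    """Group verbatim values by normalized form; all original strings are kept."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize_verbatim(value)].append(value)
    return dict(groups)
```

For example, `group_variants(['Jurassic', '"Jurassic"', '(jurassic)'])` puts all three strings under the single key `'jurassic'`, so they can be mapped to the same concept in one pass.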

You may also want to take a look at the suggested definitions tab where you can fill out definitions and descriptions for the concepts (including time period) according to authoritative sources.

@ekrimmel - I have heard you also have a Slack channel assigned for this work. Feel free to add me if you find it useful for me to be part of it.

CecSve commented 6 months ago

Following meetings with the Paleo Working Group in CPH this week, we have decided that we want one search term for chronostratigraphy (all 10 dwc fields), one search term for lithostratigraphy (combining 4 dwc fields) and one search term for biostratigraphy (combining 2 dwc fields).

So we will reduce 16 dwc fields to 3 in searches - see this issue: https://github.com/gbif/gbif-web/issues/497.
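A minimal sketch of the 16-to-3 reduction, assuming the four lithostratigraphy fields are `group`, `formation`, `member` and `bed`, and the two biostratigraphy fields are `lowestBiostratigraphicZone` and `highestBiostratigraphicZone` (the comment above gives only the counts). The search-term keys are hypothetical names, not the ones used in gbif-web.

```python
# Chronostratigraphy terms are the ten listed in this issue; the lithostratigraphy
# and biostratigraphy term lists are assumptions. Group keys are hypothetical.
SEARCH_GROUPS = {
    "chronostratigraphy": [
        "earliestEonOrLowestEonothem", "latestEonOrHighestEonothem",
        "earliestEraOrLowestErathem", "latestEraOrHighestErathem",
        "earliestPeriodOrLowestSystem", "latestPeriodOrHighestSystem",
        "earliestEpochOrLowestSeries", "latestEpochOrHighestSeries",
        "earliestAgeOrLowestStage", "latestAgeOrHighestStage",
    ],
    "lithostratigraphy": ["group", "formation", "member", "bed"],
    "biostratigraphy": ["lowestBiostratigraphicZone", "highestBiostratigraphicZone"],
}


def search_values(record: dict) -> dict:
    """Collect the non-empty values of each group's fields from one occurrence record."""
    return {
        term: [record[field] for field in fields if record.get(field)]
        for term, fields in SEARCH_GROUPS.items()
    }
```

A record with `formation="Morrison"` and `earliestPeriodOrLowestSystem="Jurassic"` would then be findable under both the lithostratigraphy and the chronostratigraphy search term, while its empty fields are simply skipped.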

Now, how should I set up the vocabular(y/ies) on the vocabulary server for this?

  1. Concept
  2. Rank
  3. Range, with the hidden value mappings kept somewhere else? As far as I understand, we would use the rank and range to assign the correct concept during interpretation, since the dwc fields are rank-specific.
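One way to read the proposal: each concept carries a rank and a numeric range, and during interpretation the rank implied by the dwc field is checked against the rank of the mapped concept. The sketch below assumes that reading. `Concept`, `MAPPING`, `interpret` and the issue names `UNMAPPED` and `RANK_MISMATCH` are hypothetical, and the boundary ages are illustrative only (they should be taken from the current ICS chart).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    name: str
    rank: str        # "eon", "era", "period", "epoch" or "age"
    start_ma: float  # older boundary, millions of years ago
    end_ma: float    # younger boundary, millions of years ago


# Rank implied by each rank-specific Darwin Core field (earliest*/latest* pairs).
FIELD_RANK = {}
for rank, unit in [("Eon", "Eonothem"), ("Era", "Erathem"), ("Period", "System"),
                   ("Epoch", "Series"), ("Age", "Stage")]:
    FIELD_RANK[f"earliest{rank}OrLowest{unit}"] = rank.lower()
    FIELD_RANK[f"latest{rank}OrHighest{unit}"] = rank.lower()

# Hypothetical hidden-value mapping; illustrative entries and ages only.
MAPPING = {
    "jurassic": Concept("Jurassic", "period", 201.4, 145.0),
    "mesozoic": Concept("Mesozoic", "era", 251.902, 66.0),
}


def interpret(field: str, verbatim: str,
              mapping=MAPPING) -> Tuple[Optional[Concept], Optional[str]]:
    """Map one verbatim value and check the concept's rank against the field's rank."""
    concept = mapping.get(verbatim.strip().casefold())
    if concept is None:
        return None, "UNMAPPED"          # hypothetical issue name
    if concept.rank != FIELD_RANK.get(field):
        return concept, "RANK_MISMATCH"  # hypothetical issue name
    return concept, None
```

With this design a value such as "Jurassic" in `earliestEraOrLowestErathem` is still mapped, but flagged, rather than silently dropped.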
RogerBurkhalter commented 6 months ago

Again, I strongly support this. Question, does "Range" refer to text or numeric values? Numeric values are more precise, but a moving target. If using IUGS values, use only the ratified values and not numbers (or text) harvested from issues of "Episodes" where values are not finalized. I've seen some wild ones recently.

CecSve commented 6 months ago


The plan is to use the numerical ages from the most recent ICS source: https://stratigraphy.org/ICSchart/ChronostratChart2023-09.pdf. I do not see any mention of IUGS values, but I do see this specification:

Numerical ages are subject to revision and do not define units in the Phanerozoic and the Ediacaran; only GSSPs do. For boundaries in the Phanerozoic without ratified GSSPs or without constrained numerical ages, an approximate numerical age (~) is provided.

Would you then advise GBIF not to use the uncertain ages (~)? @ekrimmel and others, we did not discuss this, but you may want to chime in?

Just to be clear - the numerical ages would be used to structure data in the back end to enable more dynamic searches on paleo data. What users would see and search for would most likely be the concepts themselves.
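As an illustration of how numerical ages could support more dynamic searches, the sketch below turns a record's earliest and latest units into one time span and tests it against a query span. It is a hypothetical back-end helper, not GBIF's implementation; spans are `(older, younger)` pairs in Ma.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (older boundary, younger boundary) in Ma


def record_interval(earliest: Interval, latest: Interval) -> Interval:
    """Span covered by a record, from its earliest/lowest to its latest/highest unit."""
    return max(earliest[0], latest[0]), min(earliest[1], latest[1])


def overlaps(span: Interval, query: Interval) -> bool:
    """True if the two spans share more than a single boundary."""
    return span[0] > query[1] and query[0] > span[1]
```

With illustrative ages, a record whose earliest unit is the Jurassic `(201.4, 145.0)` and latest unit the Cretaceous `(145.0, 66.0)` spans `(201.4, 66.0)` and matches a query for `(150.0, 140.0)`. The strict comparisons are a design choice: a Jurassic-only record does not match a Cretaceous search just because the two share the 145.0 boundary.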

CecSve commented 5 months ago

The vocabulary concepts are now uploaded to UAT and PROD.

@MortenHofft this was what you needed for the hosted portal, right?

Now we just need to add the hidden value mappings when they are ready.

ekrimmel commented 5 months ago

We are working on this again! Sorry for the long delays between action :)

CecSve commented 5 months ago

No worries - thank you for dealing with the mappings and let me know if you have any questions for the rest of them.

CecSve commented 4 months ago

The tags were missing from the previous upload, so the vocabulary has now been re-uploaded to UAT and PROD. The age period of each concept now shows in its tags (uncertain age periods are not included).
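A minimal sketch of the tag rule described above, assuming a concept's age period counts as uncertain when either boundary carries the ICS "~" prefix. The tag format `"<start>-<end> Ma"` and both function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_age(text: str) -> Tuple[float, bool]:
    """Parse a boundary age such as '201.4' or '~145.0' into (Ma, is_approximate)."""
    text = text.strip()
    approximate = text.startswith("~")
    return float(text.lstrip("~").strip()), approximate


def age_tag(start: str, end: str) -> Optional[str]:
    """Build a hypothetical age-period tag, or None if either boundary is approximate."""
    start_ma, start_approx = parse_age(start)
    end_ma, end_approx = parse_age(end)
    if start_approx or end_approx:
        return None  # uncertain age periods get no tag
    return f"{start_ma:g}-{end_ma:g} Ma"
```

Under this rule a concept with boundaries `"251.902"` and `"201.4"` would be tagged, while one with a `"~145.0"` boundary would not.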

CecSve commented 2 months ago

We now have the potential flags and issues included. They still require proper documentation.