gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

Preparations vocabulary #97

Open timrobertson100 opened 3 years ago

timrobertson100 commented 3 years ago

We should consider whether it makes sense to offer standardized search for preparations.

To get a sense of the content, a quick export shows that 2628 verbatim values are used in >1000 records or in >3 datasets. The export was created using the following query:

select v_preparations as preparation, count(*) as c, count(distinct datasetKey) as d 
from prod_h.occurrence where v_preparations is not null 
group by v_preparations having count(*)>1000 or count(distinct datasetKey)>3; 

The result of this is ~preparations.csv (replaced by later version)~ preparations-20220602.csv

The thresholds of 1000 and 3 were chosen arbitrarily, only to get a quick sense of the content; more fine-grained exploration is certainly needed.

tucotuco commented 3 years ago

Several observations:
1) This is not a term that has a recommendation for a controlled vocabulary in Darwin Core.
2) Content of this term has a mix of at least three distinct concepts (parts, preservation methods, and preparation methods).
3) Content in this term supports lists of the above combinations.
4) Despite the above observations, the participants in the NAOC vocabulary workshop in February of this year began a process of concept definitions and mappings of GBIF preparations values to body parts and preservation methods. The task is still incomplete, but an immense amount of work has gone into it by more than 40 people, and it should be considered for assessment.
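Because a single value can hold a list, any mapping would probably need to split the verbatim string first. A minimal sketch, assuming " | " (the separator Darwin Core recommends for list values) and ";" as delimiters; the semicolon and the function name `split_preparations` are assumptions, and real data will likely use other separators too:

```python
import re

# Assumed delimiters: "|" follows the Darwin Core recommendation for lists;
# ";" is a guess at a common alternative and may not match real data.
_DELIMITERS = re.compile(r"\s*[|;]\s*")


def split_preparations(verbatim):
    """Split a verbatim preparations string into its individual items."""
    if verbatim is None:
        return []
    # Drop empty items left by leading, trailing, or doubled delimiters.
    return [item for item in _DELIMITERS.split(verbatim.strip()) if item]
```

Each resulting item would still need to be classified as a part, a preservation method, or a preparation method, which is the harder problem.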

timrobertson100 commented 3 years ago

Thanks, @tucotuco, this obviously needs more detailed investigation and thought. If there are outputs from the NAOC work, can we please add them in links as comments?

The motivation for recording this here was to mirror some of the ideas within the ALA basisOfRecord thread, with the associated work to bring in a vocabulary. All of this reinforces the need for the proposed DwC task group.

timrobertson100 commented 2 years ago

@tucotuco

The new member of the data team at GBIF, @CecSve, is doing a push to get all our stalled vocabularies into the server, so that we have a good basis on which to evolve and improve on many of the old "dictionary CSV" files we still use.

Can you please update us on any outputs from the NAOC vocabulary workshop you described above?

CecSve commented 1 year ago
  • Content of this term has a mix of at least three distinct concepts (parts, preservation methods, and preparation methods).

  • Content in this term supports lists of the above combinations

We currently have the following enum and vocabulary for preservations:
https://api.gbif.org/v1/enumeration/basic/PreservationMethodType / http://rs.gbif.org/vocabulary/gbif/preservation_method.xml
https://api.gbif.org/v1/enumeration/basic/PreservationType

@tucotuco it would be interesting to see how all the participants mapped the preparations. Is that accessible somewhere?

We do not have any defined preparations concepts for all these values, but perhaps the old vocabulary and enum for preservations can be of use - although the SAMPLE* and STORAGE* use in the enum reads as mixing vocabularies together...

tucotuco commented 1 year ago

I think the conclusion of the NAOC group was that preparations alone is not really satisfactory. Nevertheless, the workshop produced a LOT of work after dividing the problem into a body part vocabulary, referring to the UBERON ontology, and a vocabulary for preservation method. All of the work to date can be found in the NAOC Preparations Mapping spreadsheet.

CecSve commented 1 year ago

Investigate GGBN preparations and preservations vocabularies for this vocabulary as well.

https://rs.gbif.org/extension/ggbn/preparation.xml https://rs.gbif.org/extension/ggbn/preservation.xml

CecSve commented 1 year ago

As values in this field relate to multiple potential vocabularies, I have tried to make a list of relevant issues, work and proposed definitions/vocabularies etc. in the TDWG community:

CecSve commented 6 days ago

@ManonGros the two listed above may be relevant for the GRSciColl vocabulary: https://github.com/tdwg/cd/issues/321 https://github.com/tdwg/cd/issues/65