gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

Geodetic Datum - curation before uploading first vocabulary version #71

Closed ManonGros closed 8 months ago

ManonGros commented 3 years ago

Here is a file to edit: https://drive.google.com/file/d/1dyMU5H1HHNHpgVaa8kmPkGi5esIae7_4/view?usp=sharing

It contains:

NB: this particular version of the mapping is recent as I incorporated the work of @tucotuco. So I don't think there is a lot of work to be done.

Please check the instructions here: https://github.com/gbif/vocabulary/issues/70

tucotuco commented 3 years ago

I would like to continue to work on this one. I have the data ready for the labels in the "Concepts" sheet, I also have one correction in the "Hidden" sheet. In addition, I have the labels and mappings for every concept and hidden label in the GBIF snapshot from 2021-01-12, which added 232 distinct values of geodeticDatum since 2020-04-09. I would like to include them all rather than the arbitrary cut-off of popularity. The reason is that all of them are needed to do proper coordinate transformations to WGS84 in the pipeline. It seems silly to leave them out if I already have them. I have requested edit access to the sheet.

tucotuco commented 3 years ago

I just added the English labels for the Concepts. These all come from EPSG.io. I also changed the one hidden mapping of "ETRS89 (~WGS84)" to 4258. I see the thumbs up from @ManonGros on my previous comment, but will await an explicit confirmation before adding the other Concepts and hidden labels.

ManonGros commented 3 years ago

Thanks @tucotuco! You can go ahead with the rest.

tucotuco commented 3 years ago

I have finished updating the "Concepts" and "Hidden" sheets. I added a few more concepts and a lot more hidden alternatives. I removed the Hidden labels that mapped to concepts with the same concept id and English label, as there shouldn't be any duplication between those sets. I refrained from adding all of the unmappable nonsense (coordinates, integers out of range for EPSG codes, etc.) that would have tripled the size of the table.
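
The de-duplication step described above can be sketched as follows. This is only an illustration, not the procedure actually used on the sheets: the function name, the data layout (a dict of concept id to English label, and a list of hidden label/concept id pairs), and the sample concept labels are all assumptions made for the example. Only the "ETRS89 (~WGS84)" to 4258 mapping comes from this thread.

```python
def drop_redundant_hidden(concepts, hidden):
    """Keep only hidden labels that do not simply repeat the English
    label of the concept they map to.

    concepts: dict of concept id -> English label (hypothetical layout)
    hidden:   list of (hidden label, concept id) pairs (hypothetical layout)
    """
    concept_pairs = {(cid, label) for cid, label in concepts.items()}
    return [(label, cid) for label, cid in hidden
            if (cid, label) not in concept_pairs]


# Illustrative sample data, not copied from the actual sheets.
concepts = {"4326": "WGS84", "4258": "ETRS89"}
hidden = [
    ("WGS84", "4326"),            # same id and label as the concept: dropped
    ("wgs 84", "4326"),           # a genuine alternative spelling: kept
    ("ETRS89 (~WGS84)", "4258"),  # hidden mapping mentioned in this thread: kept
]

kept = drop_redundant_hidden(concepts, hidden)
```

The comparison here is exact (case- and whitespace-sensitive); whether the sheets should be compared that strictly is a separate decision.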

tucotuco commented 3 years ago

    • [x] Add English labels to the concepts.
  1. Check the already mapped values in the Hidden sheet/tab:
    • [x] Correct any errors.
    • [ ] When needed, move the mapped values from the Hidden sheet to the Concept sheet as an alternative label but do not add any new concept.
    • [x] Map as many verbatim values as possible and add them to the Concepts and Hidden sheet/tabs:
    • [ ] Incorporate the ALA mapping to the Concepts and Hidden sheet/tabs if possible.

CecSve commented 2 years ago

I will begin to finalize this vocabulary.

This vocabulary is only used for internal processing and inference of verbatim values to EPSG: standards - the UI/interpreted field will always show WGS 84 (as it does now), so it is a support for the re-projecting step.
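
As a rough illustration of what "inference of verbatim values to EPSG codes" could look like, here is a minimal lookup sketch. It is not the pipeline's implementation: the normalization rule (trim, lowercase, collapse whitespace), the function names, and the tiny sample table are assumptions for the example. The NAD27 and "ETRS89 (~WGS84)" codes are the ones mentioned in this thread.

```python
def normalize(value):
    """Trim, lowercase, and collapse internal whitespace (assumed rule)."""
    return " ".join(value.strip().lower().split())


def build_lookup(mappings):
    """Build a dict from normalized verbatim value to EPSG code."""
    return {normalize(verbatim): code for verbatim, code in mappings}


def infer_epsg(verbatim, lookup):
    """Return the EPSG code for a verbatim geodeticDatum, or None if unmapped."""
    if verbatim is None:
        return None
    return lookup.get(normalize(verbatim))


# Illustrative sample only; the real mapping is far larger.
SAMPLE_MAPPINGS = [
    ("WGS84", 4326),
    ("ETRS89 (~WGS84)", 4258),
    ("NAD27", 4267),
]

lookup = build_lookup(SAMPLE_MAPPINGS)
```

With this sample, `infer_epsg("  wgs84 ", lookup)` resolves to 4326, while an unmappable value such as a coordinate string returns `None`, leaving it to the caller to decide how to treat it.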

tucotuco commented 2 years ago

Hi @CecSve. I have a complete mapping of 82707 distinct v_geodeticdatum values to EPSG codes as of the 2022-07-14 snapshot. The mappings can be found here. Let me know if that is sufficient, and if not, I can fill out the sheets the way vocabularies have normally been done.

CecSve commented 2 years ago

Perfect, thanks @tucotuco. I think that would be sufficient. Should I just go ahead and add them to the hidden values and then remove any duplicates? I have added some unmapped verbatim values from the verbatim values sheet to the hidden values sheet - we might want to check if they are covered in the snapshots (I would think they would be?).

tucotuco commented 2 years ago

The snapshot contains all distinct values that have arisen in snapshots taken since January 2021, plus verbatim values encountered during VertNet data provider migrations (preparation for publishing). Thus, it should be comprehensive and more than is found in GBIF as of that latest snapshot date. I would replace everything in the spreadsheet entirely with what is in that mapping file I shared.

CecSve commented 2 years ago

Ok, sounds good. I will replace it and prepare it for upload.

CecSve commented 1 year ago

From our portal feedback system.

tucotuco commented 1 year ago

Where does the copy of this vocabulary used by the pipeline live? The TDWG Biodiversity Data Quality Interest Group is interested in being able to access the vocabulary for lookups to EPSG codes.

marcos-lg commented 1 year ago

This vocabulary hasn't been created in live yet and therefore it's still not used.

tucotuco commented 1 year ago

@marcos-lg But is there a publicly accessible copy of it that people can refer to? It is immensely useful.

marcos-lg commented 1 year ago

Only the google sheet I believe https://drive.google.com/file/d/1dyMU5H1HHNHpgVaa8kmPkGi5esIae7_4/view?usp=sharing

CecSve commented 8 months ago

PNG94 is part of the vocabulary in UAT: https://registry.gbif-uat.org/vocabulary/GeodeticDatum. Should we move this vocabulary to prod?

CecSve commented 8 months ago

@tucotuco I think I need your help. I am trying to get the vocabulary uploaded and have some issues with the concepts and labels - the labels have to be unique but some appear in two concepts:

Can it somehow be specified how they differ, if it is not a mistake that they appear twice?
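
A quick way to list the labels that collide across concepts is sketched below. The data layout (concept id mapped to a list of labels) and the function name are assumptions for the example, and the check is case-sensitive, which may or may not match the registry's actual uniqueness rule. The sample uses the SAD69 case discussed in this thread.

```python
from collections import defaultdict


def find_shared_labels(concept_labels):
    """Return labels used by more than one concept, with the concept ids.

    concept_labels: dict of concept id -> list of labels (hypothetical layout)
    """
    owners = defaultdict(set)
    for concept_id, labels in concept_labels.items():
        for label in labels:
            owners[label].add(concept_id)
    return {label: sorted(ids) for label, ids in owners.items() if len(ids) > 1}


# Illustrative sample, not the full vocabulary.
sample = {
    "4291": ["SAD69"],
    "4618": ["SAD69"],
    "4326": ["WGS84"],
}

shared = find_shared_labels(sample)
```

Each entry in `shared` then needs either a corrected mapping or a distinguishing label before upload.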

tucotuco commented 8 months ago

@CecSve epsg:4291 is a deprecated version of 4618. Both are for SAD69 in Brazil.

epsg:4938 is a valid coordinate reference system for decimal degrees, epsg:4283 is not. All mappings to 4283 should actually be to 4938 instead.

NAD27 should be epsg:4267 as epsg:4367 is a deprecated coordinate reference system for REGVEN (3D).

I hope that helps.

CecSve commented 8 months ago

Thank you @tucotuco.

For the latter two, I will.

For the first, I propose to change the Label_en to SAD69 - Brazil for 4618 so the labels are unique. Would that work for you?

tucotuco commented 8 months ago

Yes, that will all work.

CecSve commented 8 months ago

The vocabulary has now been uploaded to PROD: https://registry.gbif.org/vocabulary/GeodeticDatum.