Closed ManonGros closed 8 months ago
I would like to continue to work on this one. I have the data ready for the labels in the "Concepts" sheet, I also have one correction in the "Hidden" sheet. In addition, I have the labels and mappings for every concept and hidden label in the GBIF snapshot from 2021-01-12, which added 232 distinct values of geodeticDatum since 2020-04-09. I would like to include them all rather than the arbitrary cut-off of popularity. The reason is that all of them are needed to do proper coordinate transformations to WGS84 in the pipeline. It seems silly to leave them out if I already have them. I have requested edit access to the sheet.
I just put in the English labels for the Concepts. These all come from EPSG.io. I also changed the one hidden mapping of "ETRS89 (~WGS84)" to 4258. I see the thumbs up from @ManonGros on my previous comment, but will await an explicit confirmation before adding the other Concepts and hidden labels.
Thanks @tucotuco! you can go ahead with the rest.
I have finished updating the "Concepts" and "Hidden" sheets. I added a few more concepts and a lot more hidden alternatives. I removed the Hidden labels that mapped to concepts with the same concept id and English label, as there shouldn't be any duplication between those sets. I refrained from adding all of the unmappable nonsense (coordinates, integers out of range for EPSG codes, etc.) that would have tripled the size of the table.
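The de-duplication described above could be sketched roughly as follows. This is only an illustration: the function name, the data layout (a dict of concept id to English label, and a list of hidden label/concept id pairs), and the sample rows are assumptions, not the actual structure of the sheets.

```python
def drop_redundant_hidden(concepts, hidden):
    """Remove hidden labels that repeat the English label of their own concept.

    concepts: {concept_id: english_label}   (assumed layout)
    hidden:   [(hidden_label, concept_id)]  (assumed layout)
    """
    return [
        (label, concept_id)
        for label, concept_id in hidden
        if concepts.get(concept_id) != label
    ]


# Illustrative rows only, not the real sheet contents.
concepts = {"EPSG:4326": "WGS 84", "EPSG:4258": "ETRS89"}
hidden = [
    ("WGS 84", "EPSG:4326"),           # duplicates the concept label, dropped
    ("WGS84", "EPSG:4326"),            # distinct spelling, kept
    ("ETRS89 (~WGS84)", "EPSG:4258"),  # hidden mapping mentioned in this thread
]
cleaned = drop_redundant_hidden(concepts, hidden)
```

Only exact matches on both concept id and label are removed here; spelling variants stay in the hidden set, since those are what the hidden labels are for.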
Check the already mapped values in the Hidden sheet/tab:
I will begin to finalize this vocabulary.
This vocabulary is only used for internal processing and inference of verbatim values to EPSG: standards - the UI/interpreted field will always show WGS 84 (as it does now), so it is a support for the re-projecting step.
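As a rough illustration of that kind of lookup (not the pipeline's actual code), a verbatim value could be resolved against concept labels first and hidden labels second. The matching rule (case-insensitive, whitespace-collapsed), the function names, and the tiny sample tables are all assumptions made for this sketch.

```python
# Illustrative subset only; the real vocabulary is far larger.
CONCEPTS = {
    "EPSG:4326": "WGS 84",
    "EPSG:4258": "ETRS89",
    "EPSG:4267": "NAD27",
}
HIDDEN = {
    "ETRS89 (~WGS84)": "EPSG:4258",  # hidden mapping mentioned in this thread
    "WGS84": "EPSG:4326",
}


def normalize(value):
    """Collapse whitespace and ignore case (an assumed matching rule)."""
    return " ".join(value.split()).casefold()


def build_index(concepts, hidden):
    """Map normalized labels to EPSG codes; concept labels take precedence."""
    index = {normalize(label): code for code, label in concepts.items()}
    for label, code in hidden.items():
        index.setdefault(normalize(label), code)
    return index


INDEX = build_index(CONCEPTS, HIDDEN)


def lookup_epsg(verbatim):
    """Return the EPSG code for a verbatim geodeticDatum, or None if unmapped."""
    if verbatim is None:
        return None
    return INDEX.get(normalize(verbatim))
```

Returning `None` for unmapped values leaves the decision about a fallback to the caller rather than guessing a datum.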
Hi @CecSve . I have a complete mapping of 82707 distinct v_geodeticdatum to epsg codes as of the 2022-07-14 snapshot. The mappings can be found here. Let me know if that is sufficient, and if not, I can fill out the sheets the way vocabularies have normally been done.
Perfect, thanks @tucotuco. I think that would be sufficient. Should I just go ahead and add them to the hidden values and remove any duplicates then? I have added some unmapped verbatim values from the verbatim values sheet to the hidden values sheet - we might want to check if they are covered in the snapshots (I would think they would be?).
The snapshot contains all distinct values that have arisen in snapshots taken since January 2021, plus verbatim values encountered during VertNet data provider migrations (preparation for publishing). Thus, it should be comprehensive and more than is found in GBIF as of that latest snapshot date. I would replace everything in the spreadsheet entirely with what is in that mapping file I shared.
Ok, sounds good. I will replace it and prepare it for upload.
From our portal feedback system.
Where does the copy of this vocabulary used by the pipeline live? There is interest by the TDWG Biodiversity Data Quality Interest Group to actually be able to access the vocabulary for lookups to EPSG codes.
This vocabulary hasn't been created in live yet and therefore it's still not used.
@marcos-lg But is there a publicly accessible copy of it that people can refer to? It is immensely useful.
Only the google sheet I believe https://drive.google.com/file/d/1dyMU5H1HHNHpgVaa8kmPkGi5esIae7_4/view?usp=sharing
PNG94 is part of the vocabulary in UAT: https://registry.gbif-uat.org/vocabulary/GeodeticDatum. Should we move this vocabulary to prod?
@tucotuco I think I need your help. I am trying to get the vocabulary uploaded and have some issues with the concepts and labels - the labels have to be unique but some appear in two concepts:
Can it somehow be specified how they differ, if it is not a mistake that they appear twice?
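A check like the following could surface such clashes before an upload. It is a minimal sketch: the function name and the input layout (concept id mapped to a list of labels) are assumptions, and the sample data only mirrors the SAD69 case discussed in this thread.

```python
from collections import defaultdict


def find_shared_labels(concepts):
    """Return {label: [concept ids]} for labels used by more than one concept.

    concepts: {concept_id: [labels]}  (assumed layout)
    Labels are compared case-insensitively after trimming whitespace.
    """
    by_label = defaultdict(list)
    for concept_id, labels in concepts.items():
        # De-duplicate within a concept so one concept is never counted twice.
        for label in {lab.strip().casefold() for lab in labels}:
            by_label[label].append(concept_id)
    return {
        label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1
    }


# Illustrative data only.
example = {
    "EPSG:4291": ["SAD69"],
    "EPSG:4618": ["SAD69"],
    "EPSG:4326": ["WGS 84"],
}
clashes = find_shared_labels(example)
```

Each reported label then needs either a more specific name on one of the concepts or a corrected mapping.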
@CecSve epsg:4291 is a deprecated version of 4618. Both are for SAD69 in Brazil.
epsg:4938 is a valid coordinate reference system for decimal degrees, epsg:4283 is not. All mappings to 4283 should actually be to 4938 instead.
NAD27 should be epsg:4267 as epsg:4367 is a deprecated coordinate reference system for REGVEN (3D).
I hope that helps.
Thank you @tucotuco.
For the latter two I will;
For the first, I propose to change the Label_en to "SAD69 - Brazil" for 4618 so the labels are unique. Would that work for you?
Yes, that will all work.
The vocabulary has now been uploaded to PROD: https://registry.gbif.org/vocabulary/GeodeticDatum.
Here is a file to edit: https://drive.google.com/file/d/1dyMU5H1HHNHpgVaa8kmPkGi5esIae7_4/view?usp=sharing
It contains:
NB: this particular version of the mapping is recent as I incorporated the work of @tucotuco. So I don't think there is a lot of work to be done.
Please check the instructions here: https://github.com/gbif/vocabulary/issues/70