Open dhobern opened 1 year ago
Dear @bart-v (@gdower) could you please have a look at this?
The ColDP distributions should consist of an identifier areaID and a controlled gazetteer value to know the context and optionally a human label area. See discussion at https://github.com/CatalogueOfLife/coldp/issues/40. The ColDP docs don't seem to line up, I'll update them.
For example in the case above one record with:
gazetteer=mrgid
areaID=48142
The currently provided values from WoRMS are not that far off, they just use URLs for areaID, which appears to be the problem here. The URL is interpreted as a concatenation of multiple values:
https://www.checklistbank.org/dataset/1130/verbatim/308476
col:area = Ireland
col:areaID = http://marineregions.org/mrgid/48213
col:taxonID = urn:lsid:marinespecies.org:taxname:875546
col:gazetteer = mrgid
@bart-v the distribution looks fine to me, I'll make sure we interpret the URL value as a single value. Showing the area name is a bit more difficult though - we will need to track the entire MRGID enumeration in the backend like we do for TDWG, ISO and other codes.
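To illustrate what "interpret the URL value as a single value" could look like, here is a minimal Python sketch, not the actual backend code. The function name `normalize_mrgid` and the accepted URL pattern are assumptions based only on the example record above; it accepts either a bare number or a full MRGID URL and returns the numeric identifier.

```python
import re

# Assumed URL shape, modelled on the example value
# http://marineregions.org/mrgid/48213 from the record above.
MRGID_URL = re.compile(r"^https?://(?:www\.)?marineregions\.org/mrgid/([0-9]+)/?$")


def normalize_mrgid(area_id: str) -> int:
    """Return the numeric MRGID from a bare number or a full MRGID URL."""
    value = area_id.strip()
    # Plain identifier such as "48142".
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    # Full URL: treat it as one value and take only the trailing number.
    match = MRGID_URL.match(value)
    if match is None:
        raise ValueError(f"not a recognisable MRGID: {area_id!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(normalize_mrgid("http://marineregions.org/mrgid/48213"))
    print(normalize_mrgid("48142"))
```

The point of the sketch is that the whole URL is matched as one token rather than being split on `:` or `/`, which is presumably what produced the concatenated values.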
@bart-v looking at MRGID it appears it actually refers to other standards like TDWG in this case: https://marineregions.org/gazetteer.php?p=details&id=48213
Relations:
Part of Ireland (TDWG - level 3)
Has preferred alternative Ireland (Nation) [view hierarchy]
Is MRGID not a standard on its own but rather a managed collection of other standards as linked data for placenames? I am confused that there is a preferred alternative given.
Reopening as the issue is addressed in the backend code, but not in the data. Potentially all WoRMS sources should be reimported and resynced now.
This issue has been there for quite a while, but I always assumed what we send is just OK. Great to see that confirmed & thanks for looking into this now.
MarineRegions (MRGID) is indeed a mixture of multiple existing standards, plus a multitude of entries that are not part of any other standard at all: obviously and especially marine place names.
The "preferred alternative" is just there to indicate the preferred MRGID within the standard, not a link to an external standard.
Thus, I think COL should consider this as a separate standard, especially since we assign proper PIDs to it.
Thanks @bart-v, could you explain a little more how MarineRegions works? If I understand correctly it is assembled from all these sources here: https://marineregions.org/sources.php
When/how often does it change? Are there distinct releases with versions? I only seem to be able to download individual sources, but not the entire MarineRegions. Is it available somewhere, e.g. to look up a region name from an MRGID?
MarineRegions is updated constantly, just like WoRMS. What is mentioned under the sources page is just a subset of the entries, i.e. the bulk, but not all of them.
For machines, we have multiple ways to access the data, e.g. a Linked Data Event Streams (LDES) feed: https://www.marineregions.org/gazetteer.php?p=webservices
Great. To seed a system before using LDES, would you use REST, or does LDES provide a simple way to access a "snapshot"? I haven't used LDES before, looks useful.
LDES will sync anything that is new for the client. If there is nothing, it will sync everything. So no REST needed.
See the distributions here: https://www.catalogueoflife.org/data/taxon/5W2WT
mrgid:marineregions.org , mrgid:mrgid , mrgid:48182 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48140 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48142 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48151 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48214 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48213 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48115 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48122 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48457 , mrgid:marineregions.org , mrgid:mrgid , mrgid:48366
This seems to be a set of triples with subjects, predicates and objects all concatenated as a single comma-separated string. We should work with WoRMS to get these flowing through in a better format, and probably just as the objects (country names or ISO codes should be plausible in this case, but at the very least URIs like http://marineregions.org/mrgid/48182, etc.).
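As a stopgap illustration, the numeric objects can be recovered from the concatenated string and turned back into URIs. This is a rough Python sketch under the assumption that every real identifier appears as a `mrgid:<digits>` token, as in the sample above; the function name `extract_mrgid_uris` is hypothetical.

```python
import re
from typing import List

# Assumed URI prefix, taken from the example URI in this thread.
MRGID_BASE = "http://marineregions.org/mrgid/"


def extract_mrgid_uris(concatenated: str) -> List[str]:
    """Keep only the numeric mrgid tokens and rebuild one URI per area."""
    uris = []
    for token in concatenated.split(","):
        # Tokens such as "mrgid:marineregions.org" and "mrgid:mrgid"
        # do not match and are skipped.
        match = re.fullmatch(r"mrgid:([0-9]+)", token.strip())
        if match:
            uris.append(MRGID_BASE + match.group(1))
    return uris


if __name__ == "__main__":
    sample = ("mrgid:marineregions.org , mrgid:mrgid , mrgid:48182 , "
              "mrgid:marineregions.org , mrgid:mrgid , mrgid:48140")
    for uri in extract_mrgid_uris(sample):
        print(uri)
```

This only repairs the display of already-imported values; getting clean areaID values through at import time remains the proper fix.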