gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Double check all institutionCodes and collectionCodes on our collection datasets #28

Open rukayaj opened 3 years ago

dagendresen commented 3 years ago

Against GrSciColl ... and maybe possibly add the ROR ID (maybe possibly the Wikidata QID) as institutionID (?)

rukayaj commented 3 years ago

We have this spreadsheet which was generated using https://github.com/gbif-norway/data-prep-scripts/blob/master/scripts/get-all-collection-codes/script.py to use as a starting point. So far, we have checked all NHM datasets but have not added ROR IDs/QIDs for institutionID.
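
For anyone repeating this check, a spreadsheet like that boils down to a tally of distinct institutionCode/collectionCode pairs. The snippet below is only a minimal sketch of that idea, not the linked script: the function name `count_code_pairs` and the sample records are hypothetical, and it assumes each record is a dict keyed by Darwin Core term names.

```python
from collections import Counter


def count_code_pairs(records):
    """Tally (institutionCode, collectionCode) pairs across occurrence records.

    Missing values are counted under an empty string so they stand out
    in the resulting table.
    """
    return Counter(
        (r.get("institutionCode", ""), r.get("collectionCode", ""))
        for r in records
    )


# Hypothetical sample records, for illustration only.
sample = [
    {"institutionCode": "INST-A", "collectionCode": "COLL-1"},
    {"institutionCode": "INST-A", "collectionCode": "COLL-1"},
    {"institutionCode": "INST-A"},  # collectionCode missing
]

for (inst, coll), n in count_code_pairs(sample).most_common():
    print(inst or "<missing>", coll or "<missing>", n)
```

Pairs with a missing or unexpected code are then the ones to compare against GrSciColl by hand.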

rukayaj commented 3 years ago

Discussion point in the weekly meeting: Perhaps we should be stricter about using our actual official institution's IDs in the institutionID codes - e.g. University of Oslo's ROR/GRID as the institutionID, not the Natural History Museum. Dag has done this for Tromsø, see https://github.com/gbif-norway/helpdesk/issues/21. Also related to https://github.com/gbif-norway/helpdesk/issues/5.

Further edit: We've consolidated all institutions so they are at the "official" institution level; i.e. they have or are eligible for ROR/GRID ids. See also https://github.com/gbif-norway/documentation/wiki/Data-publishing#data-publishers--organisations

rukayaj commented 2 years ago

Added UiO ROR https://ror.org/01xtthb56 to:
https://ipt.gbif.no/manage/resource.do?r=birds
https://ipt.gbif.no/manage/resource.do?r=o_mammals
https://ipt.gbif.no/manage/resource.do?r=o_dna_fish_herptiles
https://ipt.gbif.no/manage/resource.do?r=o_dna_plants
https://ipt.gbif.no/manage/resource.do?r=o_dna_other
https://ipt.gbif.no/manage/resource.do?r=o_dna_fungi_lichens
https://ipt.gbif.no/manage/resource.do?r=o_dna_arthropods
https://ipt.gbif.no/manage/resource.do?r=o_vascular
https://ipt.gbif.no/manage/resource.do?r=o_lichens
https://ipt.gbif.no/manage/resource.do?r=o_fungi
https://ipt.gbif.no/manage/resource.do?r=o_fish
https://ipt.gbif.no/manage/resource.do?r=o_lepidoptera
https://ipt.gbif.no/manage/resource.do?r=o_dip
https://ipt.gbif.no/manage/resource.do?r=o_div
https://ipt.gbif.no/manage/resource.do?r=o_bryophytes
https://ipt.gbif.no/manage/resource.do?r=o_col
https://ipt.gbif.no/manage/resource.do?r=algae_o

rukayaj commented 2 years ago

University of Bergen's ROR https://ror.org/03zga2b32 added (Katrine Kongshavn was asking about it for her dataset) https://www.gbif.org/grscicoll/institution/bdaa9eb1-e0d6-47e5-a2e0-c34d2d2ce631

rukayaj commented 2 years ago

By the way just so we know, it looks like the 'fuzzy institution match' flag can stick around for up to a week after adding a ROR id to GRSciColl (https://github.com/gbif/portal-feedback/issues/3869#issuecomment-1017204372).

kkongshavn commented 10 months ago

Updating to remind myself I am working on this (!); have asked all our collection managers to confirm/update information for their collections in a spreadsheet based on one I got from Dag. Then I will go in and update in GRSciColl where needed.

I did get a question from one of the CMs regarding the bird skeleton collection that has recently been added to GBIF (https://www.gbif.org/dataset/4f059695-36d9-4313-a27e-38e58ed0660c): is the GRSciColl identifier something that is generated once the collection is added to GRSciColl? @dagendresen ?

dagendresen commented 10 months ago

If I understand your question correctly - then yes, the "new" collection would be added to GRSciColl completely separately from adding the dataset from the collection on the IPT. These are not automatically linked operations in any way.

The declared identifiers specified in GRSciColl are unfortunately added manually. However, I have used the UUID generated automatically by GRSciColl to mint a urn:uuid:UUID identifier, which I then declared manually. I then added this collection identifier to the dataset on the IPT, under the collection identifier attribute in the EML.

(I would much have preferred for GRSciColl (or something) to mint a real DOI for the collection. We could also discuss minting an identifier in Wikidata (... but I do believe that a Wikidata QID is an even lesser proxy option than using the UUID from GRSciColl, because Wikidata declares itself as NOT a primary end-point).)

See eg. https://www.gbif.org/grscicoll/collection/bd2941ba-de0a-46d5-85a2-b58ad93711bf https://registry.gbif.org/collection/bd2941ba-de0a-46d5-85a2-b58ad93711bf/identifier
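
The urn:uuid minting described above is mechanical enough to sketch in a few lines. This is only an illustration under stated assumptions: the helper name `collection_identifiers` and the returned dict keys are hypothetical, and the registry URL pattern is simply the one used in the links in this thread.

```python
import uuid

# URL pattern taken from the registry links quoted in this thread.
REGISTRY_COLLECTION_URL = "https://registry.gbif.org/collection/"


def collection_identifiers(collection_uuid: str) -> dict:
    """Build the urn:uuid identifier and registry URL for a GRSciColl collection.

    uuid.UUID raises ValueError for a malformed UUID, which catches
    copy-paste mistakes before anything is entered in the IPT.
    """
    parsed = uuid.UUID(collection_uuid)
    return {
        "collectionID": parsed.urn,  # e.g. "urn:uuid:<uuid>"
        "registry_url": REGISTRY_COLLECTION_URL + str(parsed),
    }


ids = collection_identifiers("bd2941ba-de0a-46d5-85a2-b58ad93711bf")
print(ids["collectionID"])
print(ids["registry_url"])
```

The identifier still has to be declared manually in GRSciColl and entered in the IPT; the sketch only keeps the two strings consistent.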

kkongshavn commented 10 months ago

@dagendresen the bird skeletons are now registered at GrScicoll (https://registry.gbif.org/collection/0932d5d7-523c-450d-8013-12961a009b3b), but is it then urn:uuid:0932d5d7-523c-450d-8013-12961a009b3b and https://registry.gbif.org/collection/0932d5d7-523c-450d-8013-12961a009b3b I should add on the IPT under Collection Identifier and Parent Collection Identifier, respectively?

dagendresen commented 10 months ago

Yes (however not parent collection identifier, but collection identifier in the IPT EML)

The parent collection would be the vertebrate-, the mammal-, or the bird-collection, or...?

(the bird collection identifier = urn:uuid:91dab78d-2217-4257-8a3a-37b7a1952a24)

kkongshavn commented 9 months ago

Could someone (when you have time, I know many are travelling!) please check if I managed to register and link (from registry to ipt) these three correctly:
- Bird skeleton collection, Bergen
- Mammal skeletons in the modern Osteological Collections at the University Museum of Bergen
- Amphibian and reptile skeletons in the modern Osteological Collections at the University Museum of Bergen

The latter two were added just now, so they might need time to update before anything shows?

rukayaj commented 8 months ago

At the moment we have two entries in grscicoll, one for University of Bergen https://registry.gbif.org/institution/955b0e63-c3b5-4d74-8dfa-15b384b9ae77/identifier and one for the University Museum of Bergen: https://registry.gbif.org/institution/bdaa9eb1-e0d6-47e5-a2e0-c34d2d2ce631/identifier

They have the same ROR ID (https://ror.org/03zga2b32) so should probably be merged? We need to then put the ROR ID (https://ror.org/03zga2b32) as the institutionID for all datasets related to Bergen.
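
A quick way to spot this kind of duplicate is to group institution entries by their ROR ID. The snippet below is a rough sketch only: the function name `find_shared_ror` and the input shape (dicts with `key` and `ror`) are hypothetical and do not reflect the GRSciColl API response format; the two keys and the ROR ID are the ones quoted above.

```python
from collections import defaultdict


def find_shared_ror(institutions):
    """Return {ror_id: [institution keys]} for ROR IDs used by more than one entry."""
    by_ror = defaultdict(list)
    for inst in institutions:
        ror = inst.get("ror")
        if ror:
            # Normalise trailing slashes and case so trivial variants still match.
            by_ror[ror.rstrip("/").lower()].append(inst["key"])
    return {ror: keys for ror, keys in by_ror.items() if len(keys) > 1}


# Hypothetical input shape; keys and ROR ID are taken from this thread.
entries = [
    {"key": "955b0e63-c3b5-4d74-8dfa-15b384b9ae77", "ror": "https://ror.org/03zga2b32"},
    {"key": "bdaa9eb1-e0d6-47e5-a2e0-c34d2d2ce631", "ror": "https://ror.org/03zga2b32"},
]

for ror, keys in find_shared_ror(entries).items():
    print(ror, "is shared by", ", ".join(keys))
```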

Bird skeleton collection, Bergen = Bird skeletons in the modern Osteological Collections at the University Museum o… = https://ipt.gbif.no/manage/resource?r=osteologi_birds ? It's not linked properly, we need the institutionID and it looks like we also need to create a collection in grscicoll for it? I don't see it here https://registry.gbif.org/collection/search?city=Bergen

Mammal skeletons in the modern Osteological Collections at the University Museum of Bergen = https://www.gbif.org/dataset/3d3e91d6-b49e-49d5-84d2-32df553ee633 = https://ipt.gbif.no/resource?r=osteologi_pattedyr Also not linked, and I'm not seeing a collection for it.

Amphibian and reptile skeletons in the modern Osteological Collections at the University Museum of Bergen = https://www.gbif.org/dataset/09091176-bbe2-4077-9f4b-cfd555564160 = https://ipt.gbif.no/resource?r=osteologi_krypdyr Ditto, don't think it has a collection.

rukayaj commented 8 months ago

I just added institutionID (https://ror.org/03zga2b32) for all three of those datasets and published them again. Can you suggest additional collections to grscicoll using the Create new button here https://registry.gbif.org/collection/search?city=Bergen? Otherwise I will do that too but I'll have to put in some skeleton information.

rukayaj commented 8 months ago

Whoops, I didn't read up in this thread. I see you have registered the birds as a collection in grscicoll, and also that @dagendresen was suggesting to add this in the EML. However, as far as I'm aware it has to get added at the record level in institutionID and collectionID, like so: https://www.gbif.org/occurrence/4080386649 (I added your collectionID in there as well). And then it links up.

It is annoying and confusing to have the ability to add it in two different places!

kkongshavn commented 8 months ago

Hi @rukayaj! You type faster than I can reply 😅 All three of these collections have been added (by me) to GrSciColl, you should be able to see them under University of Bergen, which is the correct institution (and I think the museum one is inactive?). I only attempted to link the bird skeletons in the IPT, because I got confused about where it should link.

ZMUB-aves-skel | Bird skeleton collection, Bergen | https://registry.gbif.org/collection/0932d5d7-523c-450d-8013-12961a009b3b
ZMUB-mammalia-skel | Mammal skeletons in the modern Osteological Collections at the University Museum of Bergen | https://registry.gbif.org/collection/91fff040-f900-4f90-b75f-484004722d91
ZMUB-reptilia-skel | Amphibian and reptile skeletons in the modern Osteological Collections at the University Museum of Bergen | https://registry.gbif.org/collection/528159de-155f-4fb7-bce5-c92cb1c041d0

edit: and I can edit in GrSciColl, Dag gave/fixed access for me earlier

rukayaj commented 8 months ago

Aha great! Ok I've added those other 2 collectionIDs to the datasets, so we should see all the records link correctly soon.

The way I tell if the link has been made correctly is by looking at a record in the dataset, e.g. https://www.gbif.org/occurrence/4080386649, which has institutionID linked up correctly, and collectionID as inferred (which is fine because it's correct, but I also explicitly added the link and we should see it show up soon).
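
That manual check could be sketched as a small function over an occurrence record that has already been loaded as a dict. Everything specific here is an assumption to verify against the GBIF API documentation: that a linked record carries `institutionKey` and `collectionKey` fields, and that fuzzy matches appear in `issues` with names ending in `_MATCH_FUZZY`. The function name `link_status` and the sample record are hypothetical.

```python
def link_status(record: dict) -> dict:
    """Summarise whether an occurrence record looks linked to GRSciColl.

    ASSUMPTION: 'institutionKey' / 'collectionKey' are present when the
    record is linked, and fuzzy matches are flagged in 'issues'.
    """
    issues = set(record.get("issues", []))
    return {
        "has_institutionID": bool(record.get("institutionID")),
        "has_collectionID": bool(record.get("collectionID")),
        "institution_linked": "institutionKey" in record,
        "collection_linked": "collectionKey" in record,
        "fuzzy_flags": sorted(i for i in issues if i.endswith("_MATCH_FUZZY")),
    }


# Made-up record for illustration; not a real API response.
sample_record = {
    "institutionID": "https://ror.org/03zga2b32",
    "collectionID": "urn:uuid:0932d5d7-523c-450d-8013-12961a009b3b",
    "institutionKey": "955b0e63-c3b5-4d74-8dfa-15b384b9ae77",
    "issues": ["COLLECTION_MATCH_FUZZY"],
}

for name, value in link_status(sample_record).items():
    print(name, value)
```

Since the fuzzy-match flag can linger for a while after a change in GRSciColl, a flagged record shortly after an update is not necessarily a sign that something is wrong.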

kkongshavn commented 8 months ago

Hi @rukayaj , could you please do this for the final skeleton* collection as well? I just added it to GrSciColl: ZMUB-fish-skel | Fish skeletons in the modern Osteological Collections at the University Museum of Bergen | https://registry.gbif.org/collection/79a2129f-546d-4732-b132-a2e6ac31cdc1 | https://www.gbif.org/dataset/b7a6efe5-7bd8-4d51-9772-2b955fd61f95

(*just in time for Halloween! 💀)

rukayaj commented 8 months ago

Hmm, I see the institutionID and collectionID mapping overrides I put in for all the other datasets have gone.

@vidarbakken I think you maybe updated these datasets, based on the IPT log? This is what institutionID, institutionCode and collectionID should look like, for these datasets: https://ipt.gbif.no/manage/mapping.do?r=osteologi_krypdyr&id=http%3A%2F%2Frs.tdwg.org%2Fdwc%2Fterms%2FOccurrence&mid=0

Can you update the other datasets? I'm not sure if you prefer to have it in the source file or just in the mapping override, so I will leave it up to you....

rukayaj commented 8 months ago

By the way these are not MUSIT datasets right? Because if they are we should probably be publishing them from the database...

kkongshavn commented 8 months ago

Hi, no these are not from MUSIT - no vertebrates in musit, from what I understand. See here for more info about these four: https://github.com/gbif-norway/helpdesk/issues/142

vidarbakken commented 8 months ago

I didn't know that these codes had been updated in the datasets. I will update the datasets with the new codes. I will also implement the changes in the export routine from excel used by Hanneke. Otherwise it will not be correct in the next table update.