gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Institution identifiers (Wikidata, GrSciColl) #5

Closed dagendresen closed 1 year ago

dagendresen commented 3 years ago

Tasks

Google Spreadsheet with a list of the Norwegian institutions

How to categorize as GBIF data publishers?

dagendresen commented 3 years ago

Steve Baskauf published (several) GREAT blog post(s) on getting (meta)data in and out of Wikidata! :-)

rukayaj commented 3 years ago

So we plan to add a Norwegian Data Publishers entity to Wikidata, link each of the data publishers in https://docs.google.com/spreadsheets/d/1KuAn3N87gMnxz168DSGxbymme6FmV9xOTk952UdqSYI/edit#gid=1869195682 to it, and also link the other way, from the main entity to the individual publishers.

dagendresen commented 3 years ago

I have started a Wikidata catalog of Norwegian data publishers here https://www.wikidata.org/wiki/Q106706430

dagendresen commented 3 years ago

Should we transform the registration for the university museums to a registration representing their universities? -- I vote yes!!

NTNU Science Museum --> Norwegian University of Science and Technology (NTNU)

UiA KMN --> University of Agder (UiA)

UiB University Museum --> University of Bergen (UiB)

UiB Department of Biology --> UiB (merge)

UiO Natural History Museum --> University of Oslo (UiO)

UiO Department of biosciences --> UiO (merge)

UiO Department of Geosciences --> UiO (merge)

UiS Museum of Archeology --> University of Stavanger (UiS)

UiT College of Fishery --> UiT (which is represented by the UiT Museum) (merge)

rukayaj commented 3 years ago

Yes, that makes sense! I don't think there is any merge functionality in the registry, so we probably have to re-assign all the datasets with UiO Department of biosciences and UiO Department of Geosciences to the new University of Oslo (UiO). And then delete UiO Department of biosciences and UiO Department of Geosciences.
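A rough sketch of the re-assignment step, since there is no merge button: each dataset record just gets its publisher key swapped. This assumes the dataset JSON in the registry carries the publisher in a field called `publishingOrganizationKey`; the helper name `reassign_publisher` is made up for illustration, and writing the record back (which needs an authenticated request with editor rights) is deliberately left out.

```python
from copy import deepcopy

# Publisher keys as they appear in this thread.
UIO = "f314b0b0-e3dc-11d9-8d81-b8a03c50a862"
UIO_BIO = "6c1dec01-8c82-4341-8584-26c6576d3b69"
UIO_GEO = "e0d2ac6c-6f05-474d-bd50-1c5ab6614498"


def reassign_publisher(dataset, old_keys, new_key):
    """Return a copy of the dataset record pointing at new_key.

    Hypothetical helper. Assumes the publisher is stored under
    'publishingOrganizationKey'. Records owned by other publishers
    are returned unchanged; the input is never mutated.
    """
    updated = deepcopy(dataset)
    if updated.get("publishingOrganizationKey") in old_keys:
        updated["publishingOrganizationKey"] = new_key
    return updated


# Example with the single UiO Bio dataset from this thread.
record = {
    "key": "88d6237c-fbce-4135-b509-0aeed1e4222e",
    "publishingOrganizationKey": UIO_BIO,
}
moved = reassign_publisher(record, {UIO_BIO, UIO_GEO}, UIO)
```

Only once every dataset has been moved this way would the old sub-unit publishers be empty and safe to deprecate.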

dagendresen commented 3 years ago

Agree (that was what I meant) - do you want to try starting to move the UiO Bio and UiO Geo datasets? I can rename the registry entry for UiO NHM.

rukayaj commented 3 years ago

Sure, I'm going to walk down to the office in a few minutes for my medarbeidersamtale (staff appraisal conversation); I will do it when I get back.

dagendresen commented 3 years ago

No immediate hurry!! But cool to get this clean-up started. Best of luck with the staff conversation :-)

I have updated all institutions in the registry. We can proceed to move the datasets -- and finally, deprecate the then obsolete sub-units.

rukayaj commented 3 years ago

Ok great, I've merged them.

The ones changed for UiO Department of Geosciences (e0d2ac6c-6f05-474d-bd50-1c5ab6614498):
https://www.gbif.org/dataset/66c9903c-5bb2-4b10-a3ac-33c505623f66
https://www.gbif.org/dataset/b9f90d91-53c5-4c0f-b950-5678a7ecd571
https://www.gbif.org/dataset/2372eba1-a784-48a3-980e-ae7798f12c7d
https://www.gbif.org/dataset/101f5645-c5c0-4981-a1a2-3d2bb1853edf
https://www.gbif.org/dataset/5a8d4f46-ff7c-442b-a8da-1e312cbaafb5
https://www.gbif.org/dataset/27707a6d-3320-4e85-be33-aafaf7f321f3
https://www.gbif.org/dataset/1033dac1-8ed3-467d-a097-52d0a4b7dc26
https://www.gbif.org/dataset/4f659137-54b4-4243-8bcb-5b2f534481c8

The ones changed for UiO Department of biosciences (6c1dec01-8c82-4341-8584-26c6576d3b69):
https://www.gbif.org/dataset/88d6237c-fbce-4135-b509-0aeed1e4222e

The ones changed for UiB University Museum (f92082e0-fb4f-4881-b3af-b3048610909a):
https://www.gbif.org/dataset/d750e2d1-89cf-4d8c-a434-26441e5e7629
https://www.gbif.org/dataset/37dddcd2-ed51-4ada-86df-37449162f404
https://www.gbif.org/dataset/fe7fa086-1b67-4c90-abe1-123048ead530
https://www.gbif.org/dataset/ad739ed0-ab30-4ca1-83ae-7c6ce228cafb

I also changed them on the IPT (i.e. manually editing resource.xml and restarting the server as Marie suggested last time we did this).

dagendresen commented 3 years ago

Maybe we want to check with the Secretariat if it is possible to transfer the data citations to the combined UiO and UiB entries... (if they "hang on" the dataset, maybe they will already move automatically in the upcoming indexing cycles...)

UiO -- https://www.gbif.org/publisher/f314b0b0-e3dc-11d9-8d81-b8a03c50a862

UiO Bio -- https://www.gbif.org/publisher/6c1dec01-8c82-4341-8584-26c6576d3b69

UiO Geo -- https://www.gbif.org/publisher/e0d2ac6c-6f05-474d-bd50-1c5ab6614498

rukayaj commented 3 years ago

I sent an email yesterday and didn't get a reply, but it looks like the citations have moved over automatically now anyway.

dagendresen commented 3 years ago

I seem to recall that the "UiO NHMO" publisher now "UiO" had 67 citations also before moving the UiO Bio and UiO Geo datasets... thus the citations have not yet moved but only been deleted from the now obsolete UiO Bio and UiO Geo publisher entries...?

rukayaj commented 3 years ago

Hasn't it got 643 citations now? I thought yesterday it had 630ish, but I might be wrong. I'll send a follow up email.

dagendresen commented 3 years ago

Sorry, you are right -- I was looking at datasets! But I seem to recall the number of datasets was 67, and e.g. briefly looking I do not find the Foraminiferida dataset by Elisabeth Alva at UiO Geo...

dagendresen commented 3 years ago

OK! Here is (one of them?) https://doi.org/10.15468/fnq9yx

rukayaj commented 3 years ago

Here's a list of them:

The ones changed for UiO Department of Geosciences (e0d2ac6c-6f05-474d-bd50-1c5ab6614498):
https://www.gbif.org/dataset/66c9903c-5bb2-4b10-a3ac-33c505623f66
https://www.gbif.org/dataset/b9f90d91-53c5-4c0f-b950-5678a7ecd571
https://www.gbif.org/dataset/2372eba1-a784-48a3-980e-ae7798f12c7d
https://www.gbif.org/dataset/101f5645-c5c0-4981-a1a2-3d2bb1853edf
https://www.gbif.org/dataset/5a8d4f46-ff7c-442b-a8da-1e312cbaafb5
https://www.gbif.org/dataset/27707a6d-3320-4e85-be33-aafaf7f321f3
https://www.gbif.org/dataset/1033dac1-8ed3-467d-a097-52d0a4b7dc26
https://www.gbif.org/dataset/4f659137-54b4-4243-8bcb-5b2f534481c8

The ones changed for UiO Department of biosciences (6c1dec01-8c82-4341-8584-26c6576d3b69):
https://www.gbif.org/dataset/88d6237c-fbce-4135-b509-0aeed1e4222e

The ones changed for UiB University Museum (f92082e0-fb4f-4881-b3af-b3048610909a):
https://www.gbif.org/dataset/d750e2d1-89cf-4d8c-a434-26441e5e7629
https://www.gbif.org/dataset/37dddcd2-ed51-4ada-86df-37449162f404
https://www.gbif.org/dataset/fe7fa086-1b67-4c90-abe1-123048ead530
https://www.gbif.org/dataset/ad739ed0-ab30-4ca1-83ae-7c6ce228cafb

I checked 3 of them in the UiO publisher datasets list and they seem to be there, so I think it's ok. I am worried that the citations might not have moved though.
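Rather than spot-checking three by hand, something like the following could check the whole list. It is only a sketch: it assumes the public endpoint `https://api.gbif.org/v1/dataset/{key}` returns the dataset as JSON with a `publishingOrganizationKey` field, and the function names are invented here. The fetch function is passed in so the check can also be run against canned data without touching the network.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the public GBIF dataset endpoint.
DATASET_API = "https://api.gbif.org/v1/dataset/"


def fetch_dataset(key):
    """Fetch one dataset record as a dict (performs a network call)."""
    with urlopen(DATASET_API + key, timeout=30) as response:
        return json.load(response)


def datasets_not_moved(dataset_keys, expected_publisher, fetch=fetch_dataset):
    """Return the dataset keys whose publisher is NOT expected_publisher.

    An empty list means every dataset in the list now sits under
    the expected publisher.
    """
    return [
        key
        for key in dataset_keys
        if fetch(key).get("publishingOrganizationKey") != expected_publisher
    ]
```

Called with the UiO Geo and UiO Bio dataset keys listed above and the UiO publisher key, an empty result would mean all of them have arrived.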

rukayaj commented 3 years ago

Tim replied: Citations are tracked on a dataset level, so they do indeed move automatically when the dataset changes publisher. The publisher pages just show the aggregate count of dataset citations. It can take a little time for all the various systems to reach consistency, which is why there can be some minutes before the citations swapped over on the website.

Please be aware that this is not the case if a dataset is split or replaced though. Depending on what is changing, it may be possible to manually move citations but that can be a tedious task (they may even need verified on a case by case basis) so not something we can automate.
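To make Tim's point concrete, here is a tiny illustration of why the publisher totals shift on their own: the count on a publisher page is just the sum over its datasets. The dataset names and citation numbers below are invented purely for illustration and are not the real figures.

```python
from collections import defaultdict


def publisher_totals(datasets):
    """Sum per-dataset citation counts by publisher."""
    totals = defaultdict(int)
    for ds in datasets:
        totals[ds["publisher"]] += ds["citations"]
    return dict(totals)


# Hypothetical numbers, for illustration only.
datasets = [
    {"key": "ds-1", "publisher": "UiO", "citations": 600},
    {"key": "ds-2", "publisher": "UiO Geo", "citations": 30},
]

before = publisher_totals(datasets)

# The dataset changes publisher; its citations travel with it.
datasets[1]["publisher"] = "UiO"
after = publisher_totals(datasets)
```

A split or replaced dataset would be a new record with its own key, so nothing would carry over in the same way.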

dagendresen commented 3 years ago

Apropos: when looking for the datasets, the GBIF portal dataset listing is rather useless (no alphabetical list, or no list view at all), while your dataset listing at GBIF.no is much more useful :-)

(Similar to the feature request for dataset listing for Living Norway in the Hosted Portal)

dagendresen commented 3 years ago

Looks like the dataset move worked flawlessly. I think this becomes much cleaner this way!

dagendresen commented 3 years ago

A next cool thing might be (later) if we manage to link some metadata to be curated in Wikidata/ROR/Bionomia(?)/etc to be imported into the EML for datasets on the IPT... :-) To dereference identifiers and enrich "missing" metadata.

dagendresen commented 3 years ago

Should we do the dataset for the UiT Fishery college also?

Current publisher: https://www.gbif.org/publisher/8cf21d03-4799-4fd5-99a4-b9836de4ec37

Dataset: https://www.gbif.org/dataset/4894f2f8-74b5-403e-bd8d-6fe5123a3f71

Move to new publisher UiT (?): https://www.gbif.org/publisher/689b40c4-ff31-4cd0-83a5-a7a828f1cd92

See the home page for UiT fishery college. https://uit.no/enhet/nfh

It looks quite clear-cut that the college is completely merged into UiT. So I think we can simply complete the merge on the GBIF registry as well. (We might confirm with Heini and Andreas..., but I think it would be no issue to just do it).

rukayaj commented 3 years ago

Ok I have changed it, I guess it might take a little while to show up.

dagendresen commented 3 years ago

I notice that this issue has shifted topic from the Wikidata QID for institutions -- to this clean-up of (duplicate) publisher registrations. Maybe create a new issue for the Wikidata and ROR identifiers, and close this one -- after we (1) verify that the merge in the GBIF registry remains without issues; and (2) delete or flag the now-obsolete data publisher entries from the GBIF registry in contact with the secretariat.

I have added comments such as this one on the now-obsolete GBIF publishers in the Registry: https://registry.gbif.org/organization/8cf21d03-4799-4fd5-99a4-b9836de4ec37/comment

Maybe document somewhere the practice of aiming for Norwegian GBIF data publishers to be institutions/entities that are eligible for or have ROR and/or GRID institution identifiers.

See also https://github.com/gbif/registry/issues/285 and https://github.com/gbif/registry/issues/282 and https://github.com/gbif/registry/issues/294

rukayaj commented 2 years ago

Some links from a group who covered collection + institution representation in Wikidata (from the BiCIKL hackathon):

https://docs.google.com/presentation/d/1UQxZsbjnv-F_fO9VOwE4pTqFkVuYvgY1FunvBDzngOY/edit?usp=sharing

DiSSCo & CETAF institutions with their GBIF ID and ROR ID (which also has a Wikidata link): https://docs.google.com/spreadsheets/d/15QLUKOCQWobvyFrduBZk75xkjowWWd0LfkEC7Zyjg0k/edit

All DiSSCo institutions are getting a ROR ID, this is in progress: https://docs.google.com/spreadsheets/d/1k6cTdaQ3Cujr2jh7-c3pns4K9Y9faPAdxniJrnZ7pXM

Mathias Dillen to Everyone (11:09): P4090 seems to be used exclusively for institutionCode/collectionCode/acronym.

rukayaj commented 2 years ago

This is a bit strange: I noted here that I made the change, but it obviously did not register, as UiT Fishery college is still in GRSciColl... unless I only made it for the GBIF data publishers and not in GRSciColl as well.

rukayaj commented 2 years ago

Now, somehow, we have three University of Oslo institutions: https://www.gbif.org/grscicoll/institution/2e961f1c-399e-4aab-92ea-0657722f8dcd and https://www.gbif.org/grscicoll/institution/390f06b3-a81e-41b9-972e-e790e0edfe04 and also NHMO https://www.gbif.org/grscicoll/institution/ffe44856-4ea4-4443-ab9f-4b3c7581b2bb

[Screenshot, 2021-12-08 09:42:01]

I think it's coming from the data in the datasets themselves, so I will go through and correct them, and then merge these institutions again.

rukayaj commented 2 years ago

We are back to 1 institute for UiO:

[Screenshot, 2021-12-08 12:48:03]

dagendresen commented 2 years ago

I think that the ambition of GRSciColl to synchronize with e.g. Index Herbariorum is causing problematic issues?

If you look at the UiO GRSciColl entry - this is indicated with the "Master record" at IH https://registry.gbif.org/institution/390f06b3-a81e-41b9-972e-e790e0edfe04 --> http://sweetgum.nybg.org/science/ih/herbarium-details/?irn=124083

The IH code "O" should maybe be maintained as at least a historic code and preferably remain "reserved" and redirect to UiO...! And maybe even remain a valid dwc:institutionCode value???

https://registry.gbif.org/institution/search?code=O --> no hits
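The same lookup could probably be scripted against the API. This is a sketch under assumptions: that the GRSciColl institution search lives at `https://api.gbif.org/v1/grscicoll/institution`, that it accepts a `code` query parameter mirroring the registry UI link above, and that the paged response has a `count` field. The function names are invented for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for the GRSciColl institution search.
GRSCICOLL_INSTITUTIONS = "https://api.gbif.org/v1/grscicoll/institution"


def institution_search_url(code, base=GRSCICOLL_INSTITUTIONS):
    """Build a search URL for institutions with the given code."""
    return f"{base}?{urlencode({'code': code})}"


def has_hits(search_response):
    """True if a (parsed JSON) search response reports any results.

    Assumes the response is a paged result with a 'count' field.
    """
    return search_response.get("count", 0) > 0


url = institution_search_url("O")
```

Running this periodically for reserved codes such as "O" might be one way to notice when a synchronization has dropped a code.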

I suspect that IH might override metadata added for UiO...??? Which might be a worrisome "bug"/"feature" with GRSciColl...???

There is a worrisome mixture of understanding of WHAT is the nature of the thing classified as "Institution" in GRSciColl. The "O" in IH represents (only) the botanical collections at UiO (in other words only the UiO herbarium) -- and could thus be argued to not represent the actual real-life institution maintaining these collections.

dagendresen commented 2 years ago

Maybe simply use "O" as the code for UiO? The code is only a string anyway.

rukayaj commented 1 year ago

We are using ROR IDs for institution ID, so perhaps we can close this now? For private companies we can use Wikidata IDs as institution IDs as a fallback.
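The rule could be written down roughly like this, as a sketch only. The function name and the example identifiers are made up; the URI patterns (`https://ror.org/...` and `http://www.wikidata.org/entity/Q...`) are the usual ones, but which exact form we put in the data is an assumption here.

```python
from typing import Optional


def institution_id(ror_id: Optional[str], wikidata_qid: Optional[str]) -> Optional[str]:
    """Pick an institution ID: ROR first, Wikidata as fallback.

    Hypothetical helper. Returns None when neither identifier is known.
    """
    if ror_id:
        return f"https://ror.org/{ror_id}"
    if wikidata_qid:
        return f"http://www.wikidata.org/entity/{wikidata_qid}"
    return None


# Placeholder identifiers, not real records.
university = institution_id("0example00", "Q0000001")
company = institution_id(None, "Q0000002")
```

Here a university with both identifiers gets its ROR ID, while a private company without a ROR entry falls back to its Wikidata ID.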