gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Fix UiT collection codes #62

Closed: rukayaj closed this issue 2 years ago

rukayaj commented 3 years ago

For some reason, MUSIT is publishing the UiT collections under one or two generalised collection codes (EVERT and TROM) instead of the collection codes that they actually use. We also need to sync these collection codes with grscicoll, and there are also some unpublished collections that should still be listed there.

Zoological collections list: https://uit.no/research/natscicol/project?pid=717007

Herbarium collections list: https://uit.no/research/natscicol/project?pid=717010

Geology collections list: https://uit.no/research/natscicol/project?pid=717009 (not yet in grscicoll; should we add at least the fossils collection? Does grscicoll aim to cover all scientific collections or just biological ones?)

Quick links to the API facet for collection codes for each UiT dataset, so we can check them easily:

| On GBIF | Currently used collection codes | Dataset name | Source | IPT link | Correct collection code | GRSciColl code |
|---|---|---|---|---|---|---|
| gbif | b581fcfa-9f31-431a-8431-1800ef8a554b | Insect collection, UiT, University Museum (TSZ). Insect labeling project and PhD-duty-work | non-musit | No change | - | - |
| gbif | 77e91a4f-7b9d-458d-b585-298aa1ef41f8 | Echinodermata (TSZE) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-e | TSZE | 1396a824-6313-4473-adf9-4b78c2bec059 |
| gbif | 02f57aa9-12ae-4c8c-9d9f-f8b3245cf31a | Crustacea collection (TSZCr) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-cr | TSZCr | dab0d0e4-5889-422b-a193-abac584b03f5 |
| gbif | 060eecc9-75b0-4952-bb48-e400d3ad7771 | Chelicerata collection (TSZCh) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-ch | TSZCh | db9c8f41-0c88-465b-8f49-5e30583fe517 |
| gbif | a33a6578-bf83-4fa9-b39b-3a3738b72b68 | Annelida collection (TSZA) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-a | TSZA | ac6014f4-ad81-43a1-b6cf-1a980e1048cb |
| gbif | b4804f19-8a8a-49e7-8dc2-79b528635696 | Mesozooplankton Ramfjord | non-musit | No change | - | - |
| gbif | d391c193-0fc0-4a96-bdac-6043dd9516d1 | Tunicates collection (TSZT) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-t | TSZT | 90289d07-50b1-4c98-a6e0-40ef505a8087 |
| gbif | 8dbe67d7-26df-4810-a989-d246f0e93edb | Foraminifera and other protozoa (TSZF) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource?r=uit-tsz-f | TSZF | 2a5ada8e-e2e9-46e5-85df-83c7174e0931 |
| gbif | e92bb2ed-1efe-4d2a-a17d-52182c562db8 | Bryozoa, Brachiopoda and Entoprocta (TSZBr) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-br | TSZBr | fee64bc6-6a2c-4433-8bf8-29f62d8191c8 |
| gbif | b5a4d944-8a51-4104-a4a7-c3637128bcc7 | Various marine groups from the collection at (TSZY) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-marine-invertebrates | TSZY | 7905172a-772a-43b0-a3b8-a737484c052b |
| gbif | 88fd0226-d867-4946-be8b-31f584f2201d | NORSC - Sciaroidea, UiT Tromsø Museum | non-musit | No change | - | - |
| gbif | f2a77c80-1e74-4c23-a3c9-c52cede89434 | Entomology collection, UiT Tromsø Museum | musit | | - | - |
| gbif | 8d1253fc-30b6-4b4d-a12d-d711391e5382 | Dataset of fungus gnats (Diptera, Sciaroidea) from Møre og Romsdal county in Norway | non-musit | No change | - | - |
| gbif | 0f061eff-6854-4bb3-abe2-acb184ea3ab7 | Bryophyte herbarium, UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=trom_bryophytes | TROM-B | 560d0f23-12cd-4091-8dd7-49b7a992b75c |
| gbif | 374e0d4c-cf9f-4e1a-97a4-14123ee1bb7e | Mycology herbarium, UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=trom_fungi | TROM-F | dd6496eb-faf8-4ff3-ad80-20fc08cd5c89 |
| gbif | e87a12af-fc4c-4315-bff7-c7b827379aca | Lichen herbarium, UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=trom_lichens | TROM-L | 91ddfc12-5a29-4ca3-9898-f120d1126673 |
| gbif | d0aa984e-c6d3-45ee-8fc0-df1df8f4126b | Vascular plant herbarium, UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource?r=trom_vascular | TROM-V | ab06309e-b392-4f5e-99fc-4ae494dd5088 |
| gbif | ed527c71-23aa-40cf-b16b-1a5b8ec6770a | Arthropod collection, UiT Tromsø Museum | non-musit | No change | - | - |
| gbif | 4894f2f8-74b5-403e-bd8d-6fe5123a3f71 | Algae, Norwegian College of Fishery Science | | No change | - | - |
| | | Cnidaria and Ctenophora (TSZR) UiT Tromsø Museum | musit | https://ipt.gbif.no/manage/resource.do?r=uit-tsz-r | TSZR | 28891ea8-e7dc-4163-a6da-d266b19c7e2c |
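The "currently used collection codes" column points at the GBIF API facet for each dataset key. A minimal Python sketch of building and reading such a facet query, assuming the public occurrence search endpoint with `facet=collectionCode` and `limit=0` (so only facet counts come back). The function names are hypothetical, and the response shape parsed here (`facets` → `counts` → `name`/`count`) is an assumption to verify against a live response.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF occurrence search endpoint (assumed); limit=0 asks for facet counts only.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def collection_code_facet_url(dataset_key: str) -> str:
    """Build the API facet URL listing the collectionCodes used in one dataset."""
    params = {"datasetKey": dataset_key, "facet": "collectionCode", "limit": 0}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_collection_codes(response: dict) -> dict[str, int]:
    """Turn a facet response into {collectionCode: number of records}."""
    codes: dict[str, int] = {}
    for facet in response.get("facets", []):
        for entry in facet.get("counts", []):
            codes[entry["name"]] = entry["count"]
    return codes


def fetch_collection_codes(dataset_key: str) -> dict[str, int]:
    """Query the live API (needs network access) and return the code counts."""
    with urllib.request.urlopen(collection_code_facet_url(dataset_key)) as resp:
        return parse_collection_codes(json.load(resp))
```

For example, `fetch_collection_codes("77e91a4f-7b9d-458d-b585-298aa1ef41f8")` (the Echinodermata dataset) should, once the fix is published, report only the code listed in the "Correct collection code" column.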
dagendresen commented 3 years ago

Maybe simply override the collectionCode in the IPT? And at the same time add a (hard-coded) collectionID? This assumes there is one dataset per collection.
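The override suggested above amounts to stamping the same dataset-level constants onto every record in a resource. The Python below only illustrates that idea; it is not how the IPT implements overrides. `OVERRIDES` and `apply_overrides` are hypothetical names, the two entries are copied from the table, and storing the bare GRSciColl UUID as collectionID (rather than some URL form) is an assumption.

```python
from typing import Iterable, Iterator

# Hypothetical per-dataset constants, keyed by IPT resource short name.
# Bare GRSciColl UUIDs as collectionID is an assumption, not a decision from the thread.
OVERRIDES: dict[str, dict[str, str]] = {
    "uit-tsz-e": {
        "collectionCode": "TSZE",
        "collectionID": "1396a824-6313-4473-adf9-4b78c2bec059",
    },
    "trom_lichens": {
        "collectionCode": "TROM-L",
        "collectionID": "91ddfc12-5a29-4ca3-9898-f120d1126673",
    },
}


def apply_overrides(resource: str, records: Iterable[dict[str, str]]) -> Iterator[dict[str, str]]:
    """Yield copies of the records with the dataset-level constants applied.

    Only valid under the one-dataset-per-collection assumption: every record
    in the resource receives the same collectionCode and collectionID.
    """
    constants = OVERRIDES[resource]
    for record in records:
        yield {**record, **constants}  # constants win over the exported values
```

The input records are left untouched, so the original export can still be compared against the overridden output.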

rukayaj commented 3 years ago

Yes, that's what I'm doing. I've added collectionCode (and also collectionID) for a few of them so far; I'll do the others tomorrow :)

rukayaj commented 3 years ago

I've finished adding the collections missing from grscicoll (mostly the herbarium ones) and adding the collectionIDs and correct collectionCodes to all the datasets.

Some outstanding issues:

Note: the occurrenceIDs are dwc triplets, and they have remained the same (they still use the old collectionCodes).
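A Darwin Core triplet combines institutionCode, collectionCode and catalogNumber, so an occurrenceID minted with the old collectionCode no longer matches the triplet built from the corrected fields. A small Python sketch that flags this mismatch; the `:` separator, the function names and the example values are assumptions, since the exact MUSIT triplet format is not shown here.

```python
def dwc_triplet(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Join the three triplet parts (the ':' separator is an assumption)."""
    return ":".join((institution_code, collection_code, catalog_number))


def triplet_is_stale(record: dict[str, str]) -> bool:
    """True if the stored occurrenceID differs from the triplet built from the
    record's current fields, e.g. after collectionCode was overridden."""
    current = dwc_triplet(
        record["institutionCode"], record["collectionCode"], record["catalogNumber"]
    )
    return record["occurrenceID"] != current
```

The check only reports the divergence; it does not rewrite any identifier.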

dagendresen commented 3 years ago

Are duplicate catalogNumbers causing problems because ... the catalogNumbers (in the dwc-triplets) are used as the occurrenceID? Any thoughts/plans on assigning materialSampleIDs to the specimens?

... apropos, maybe simply using dwc-triplets as the occurrenceID is a good idea (we no longer really care too much about the occurrenceID :-), and assigning UUIDs as materialSampleID (urn:uuid:UUID or purl:UUID, or maybe in time even handle:UUID :-)
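Of the forms mentioned, `urn:uuid:UUID` needs no resolver, and Python's standard `uuid` module produces it directly. A minimal sketch with hypothetical helper names; the purl and handle forms would need a registered resolver and are not covered.

```python
import uuid


def new_material_sample_id() -> str:
    """Mint a random (version 4) UUID in urn:uuid: form."""
    return uuid.uuid4().urn


def parse_material_sample_id(value: str) -> uuid.UUID:
    """Recover the UUID from a urn:uuid: identifier; raise ValueError otherwise."""
    prefix = "urn:uuid:"
    if not value.startswith(prefix):
        raise ValueError(f"not a urn:uuid identifier: {value!r}")
    return uuid.UUID(value[len(prefix):])
```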

rukayaj commented 3 years ago

Yes, they cause problems because catalogNumbers are used in the occurrenceIDs :( But I think it's also a problem in MUSIT for them to have duplicate catalogNumbers.
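Because the catalogNumber is part of the occurrenceID, a duplicate catalogNumber within a collection yields a duplicate occurrenceID. A quick Python check for this before publishing, as a sketch: the function name and the flat list-of-dicts record layout are assumptions.

```python
from collections import Counter
from typing import Iterable


def duplicate_ids(records: Iterable[dict[str, str]], field: str = "occurrenceID") -> dict[str, int]:
    """Return every value of `field` that occurs more than once, with its count."""
    counts = Counter(record[field] for record in records)
    return {value: n for value, n in counts.items() if n > 1}
```

Passing `field="catalogNumber"` runs the same check on the catalogue numbers themselves.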

I guess this is related to https://github.com/gbif-norway/helpdesk/issues/29. We can certainly decide to assign materialSampleIDs to our collection items. MUSIT isn't generating them, so we'd need to generate them separately, keep them linked to the occurrenceIDs, and generate new ones for new records.
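Generating the IDs outside MUSIT means the occurrenceID → materialSampleID pairs must be stored and reused on every export, minting new UUIDs only for records not seen before. A minimal Python sketch, assuming a hypothetical two-column CSV file as the store; the file layout and function names are illustrative only.

```python
import csv
import uuid
from pathlib import Path
from typing import Iterable


def load_mapping(path: Path) -> dict[str, str]:
    """Read occurrenceID -> materialSampleID pairs; empty if the file is missing."""
    if not path.exists():
        return {}
    with path.open(newline="", encoding="utf-8") as fh:
        return {row["occurrenceID"]: row["materialSampleID"] for row in csv.DictReader(fh)}


def assign_ids(occurrence_ids: Iterable[str], mapping: dict[str, str]) -> dict[str, str]:
    """Keep existing materialSampleIDs; mint new ones only for unseen records."""
    for occ_id in occurrence_ids:
        if occ_id not in mapping:
            mapping[occ_id] = uuid.uuid4().urn
    return mapping


def save_mapping(path: Path, mapping: dict[str, str]) -> None:
    """Write the full mapping back so the next export reuses the same IDs."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["occurrenceID", "materialSampleID"])
        writer.writeheader()
        for occ_id, ms_id in mapping.items():
            writer.writerow({"occurrenceID": occ_id, "materialSampleID": ms_id})
```

Keying on occurrenceID only works while the occurrenceIDs themselves stay stable; if they change, the mapping has to be migrated too.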

This might also be something to think about with the new collection management systems that are being trialled. In fact, it makes sense to me for the collection management systems to be aware of the materialSampleIDs at least on some level...

dagendresen commented 3 years ago

It would absolutely be best if the materialSampleIDs were registered (and created) in the CMS!

rukayaj commented 3 years ago

There was an error in the MUSIT export that caused subnumbers (e.g. "/1") to be left off the catalogNumbers (so R-878 should have been R-878/1, etc.). The same error affects some other datasets at NTNU as well, so there is now some discussion about the best way to fix them. If we end up having to change the occurrenceIDs, I think it would be great to get UUIDs in there, if that's possible with MUSIT.
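Dropping the subnumber would make distinct items such as R-878/1 and R-878/2 both appear as R-878. A Python sketch that splits a catalogNumber into base and subnumber and lists which full numbers would collapse together; the `base/digits` pattern is an assumption based on the R-878/1 example, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed shape: a base number, optionally followed by "/" and a numeric subnumber.
CATALOG_RE = re.compile(r"^(?P<base>[^/]+)(?:/(?P<sub>\d+))?$")


def split_catalog_number(value: str) -> Tuple[str, Optional[str]]:
    """Split 'R-878/1' into ('R-878', '1'); 'R-878' gives ('R-878', None)."""
    match = CATALOG_RE.match(value)
    if match is None:
        raise ValueError(f"unexpected catalogNumber: {value!r}")
    return match.group("base"), match.group("sub")


def collisions_if_truncated(numbers: list) -> dict:
    """Group full catalogNumbers by base and keep only bases shared by several
    numbers, i.e. the ones that become duplicates when subnumbers are dropped."""
    groups: dict = {}
    for number in numbers:
        base, _ = split_catalog_number(number)
        groups.setdefault(base, []).append(number)
    return {base: full for base, full in groups.items() if len(full) > 1}
```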

rukayaj commented 3 years ago

Karstein at NTNU is back from holiday this week, so hopefully he can give feedback on what the change would mean for his collections and this issue can move forward.

rukayaj commented 3 years ago

We are now trying to get UUIDs added for all the UiT datasets! (Seeing as the triplet-style occurrenceIDs would have to change anyway to add the subnumbers.) EDIT: It makes more sense to keep the UUIDs for the materialSampleIDs and carry on using triplets for occurrenceIDs.
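Under that split, correcting a catalogNumber changes the triplet occurrenceID but leaves the UUID materialSampleID untouched. A Python sketch of that update on a single record; the `:` separator, the function name and the example values are assumptions, not MUSIT's actual export logic.

```python
def correct_catalog_number(record: dict, corrected: str) -> dict:
    """Return a copy with the corrected catalogNumber and a rebuilt triplet
    occurrenceID; the materialSampleID is carried over unchanged."""
    fixed = dict(record)
    fixed["catalogNumber"] = corrected
    # Triplet separator ':' is an assumption about the occurrenceID format.
    fixed["occurrenceID"] = ":".join(
        (record["institutionCode"], record["collectionCode"], corrected)
    )
    return fixed
```

Keeping the UUID on materialSampleID means the identifier for the physical specimen survives even when the triplet has to change.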

rukayaj commented 3 years ago

Update: this is still in progress, waiting for MUSIT to finish changing their data publication views.

rukayaj commented 2 years ago

Just sent a follow up email.

rukayaj commented 2 years ago

All fine now.