Closed rukayaj closed 2 years ago
Maybe simply override the collectionCode in the IPT? And at the same time add (hard-coded) collectionID? -- assuming there is one dataset per collection.
Yes that's what I'm doing, I've added collectionCode (and also collectionID) for a few of them so far, I'll do the others tomorrow :)
I've finished adding the collections missing from grscicoll (the herbarium ones mostly), and adding the collectionIDs + proper collectionCodes to all the datasets.
Some outstanding issues:
Note: the occurrenceIDs are dwc triplets, but have remained the same (using the old collectionCodes)
Are the duplicate catalogNumbers causing problems because the catalogNumbers (in dwc-triplets) are used as the occurrenceID? Any thoughts/plans on assigning materialSampleIDs to the specimens?
... apropos, maybe simply using dwc-triplets as the occurrenceID is a good idea (because we no longer really care too much about the occurrenceID :-) and assigning UUIDs as the materialSampleID (urn:uuid:UUID or purl:UUID ... or maybe even, in time, handle:UUID :-)
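For illustration, a minimal Python sketch of that split: a triplet-style occurrenceID alongside a UUID-based materialSampleID. The function names and the `INST`/`COLL` codes are placeholders made up for this example, not actual UiT or MUSIT values.

```python
import uuid


def dwc_triplet(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Join the three Darwin Core terms with colons to form a triplet."""
    return f"{institution_code}:{collection_code}:{catalog_number}"


def new_material_sample_id() -> str:
    """Mint a random (version 4) UUID in urn:uuid form."""
    return f"urn:uuid:{uuid.uuid4()}"


# Placeholder codes, not real collection or institution codes.
record = {
    "occurrenceID": dwc_triplet("INST", "COLL", "R-878/1"),
    "materialSampleID": new_material_sample_id(),
}
print(record)
```

The triplet can change if a code or catalogNumber is corrected, while the UUID stays fixed once assigned, which is the point of keeping the two apart.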
Yes, it's causing problems because catalogNumbers are used in the occurrenceIDs :( But I think having duplicate catalogNumbers is a problem in MUSIT as well.
I guess this is related to https://github.com/gbif-norway/helpdesk/issues/29. We can certainly decide to assign materialSampleIDs to our collection items. MUSIT isn't generating them so we'd need to generate them separately and keep them linked to the occurrenceIDs, generating new ones for new records.
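A rough sketch of what "generate them separately and keep them linked" could look like, assuming we store a simple occurrenceID-to-materialSampleID mapping somewhere outside MUSIT. The function name and the example IDs are hypothetical.

```python
import uuid


def assign_material_sample_ids(existing: dict[str, str], occurrence_ids: list[str]) -> dict[str, str]:
    """Return a mapping that keeps existing IDs and mints new ones for unseen records."""
    mapping = dict(existing)  # copy, so the stored mapping is not modified in place
    for occ_id in occurrence_ids:
        if occ_id not in mapping:
            mapping[occ_id] = f"urn:uuid:{uuid.uuid4()}"
    return mapping


# Made-up example: one record already has an ID, one is new.
stored = {"INST:COLL:1": "urn:uuid:00000000-0000-4000-8000-000000000000"}
updated = assign_material_sample_ids(stored, ["INST:COLL:1", "INST:COLL:2"])
print(updated)
```

Note this only works as long as the occurrenceIDs themselves stay stable; if the triplets change, the mapping would need to be migrated at the same time.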
This might also be something to think about with the new collection management systems that are getting trialed. In fact, to me it makes sense for the collection management systems to at least be aware of the materialSampleIDs on some level...
Would absolutely be best if the materialSampleIDs are registered (and created) in the CMS!
There was an error in the MUSIT export causing subnumbers (e.g. "/1") to be left off the catalogNumbers (so R-878 should have been R-878/1, etc). This error is in some other datasets at NTNU as well, so there is some discussion now about the best way to fix them. I think if we end up having to change the occurrenceIDs it would be great to get UUIDs in there, if that's possible with MUSIT.
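To see how many records are affected, something like the following could flag catalogNumbers that occur more than once in an export. This is only a sketch: the `records` rows are made up, and it assumes each row is a dict with a `catalogNumber` key.

```python
from collections import Counter


def duplicate_catalog_numbers(records: list[dict]) -> dict[str, int]:
    """Return each catalogNumber that appears more than once, with its count."""
    counts = Counter(row["catalogNumber"] for row in records)
    return {number: n for number, n in counts.items() if n > 1}


# Made-up rows: two specimens collapse to "R-878" when the subnumber is dropped.
records = [
    {"catalogNumber": "R-878"},
    {"catalogNumber": "R-878"},
    {"catalogNumber": "R-879"},
]
print(duplicate_catalog_numbers(records))
```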
Karstein at NTNU is back from being on holiday this week, so hopefully that means he can give feedback on what the change would mean for his collections and something will happen on this issue.
We are now trying to get UUIDs added for all the UiT datasets! (Seeing as the 'triplet' style occurrenceIDs would have to change anyway to add the subnumbers.) EDIT: It makes more sense to keep the UUIDs for the materialSampleIDs, and carry on using triplets for the occurrenceIDs.
Update: this is still in progress, waiting for MUSIT to finish changing their data publication views.
Just sent a follow up email.
All fine now.
For some reason MUSIT is publishing the UiT collections under one or two generalised collection codes (EVERT and TROM) instead of the collection codes that they actually use. We also need to sync these collection codes with grscicoll. There are also some unpublished collections which should still get listed.
Zoological collections list: https://uit.no/research/natscicol/project?pid=717007
Herbarium collections list: https://uit.no/research/natscicol/project?pid=717010
Geology collections list: https://uit.no/research/natscicol/project?pid=717009 (not yet in grscicoll - should we add the fossils one at least? Does grscicoll aim to cover all scientific collections or just biological?)
Quick links to the API facet for collection codes for each UiT dataset so we can check them easily: