gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

GrSciColl mapping: Palaeontological collection, NHM, Oslo (O) UiO #4291

Open MortenHofft opened 1 year ago

MortenHofft commented 1 year ago

Dataset: Palaeontological collection, NHM, Oslo (O) UiO
Published by: University of Oslo (country: NO)
Records without a match: 202,056

Example record:

{
  "key": 3778304908,
  "datasetKey": "b2522b78-18ec-4ba6-ba16-9c9e215ce9e6",
  "datasetTitle": "Palaeontological collection, NHM, Oslo (O) UiO",
  "catalogNumber": "41493",
  "collectionCode": null,
  "collectionID": null,
  "institutionID": null,
  "datasetID": [],
  "institutionCode": null,
  "occurrenceID": "NHMO:PMO:41493"
}

Example lookup

GrSciColl

The institution looks like it should be institution: O, University of Oslo. That also fits with the occurrenceID prefix NHMO, which is the alternative code.

I do not see any candidates for the collection, which I assume has the code PMO, and I cannot find any palaeontological collections in GRSciColl for Oslo, Norway.
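
For anyone wanting to reproduce this reasoning, here is a rough sketch of splitting the occurrenceID into the codes discussed above and building a lookup request from them. It assumes the occurrenceID follows an `INSTITUTION:COLLECTION:CATALOG` pattern (true for the example record, not guaranteed for the whole dataset) and that the GRSciColl lookup service lives at the URL below with these parameter names; both are assumptions to check against the GBIF API documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; verify against the GBIF API docs.
LOOKUP_URL = "https://api.gbif.org/v1/grscicoll/lookup"


def split_occurrence_id(occurrence_id):
    """Split an 'INSTITUTION:COLLECTION:CATALOG' style occurrenceID.

    Returns None when the ID does not have exactly three non-empty parts.
    """
    parts = occurrence_id.split(":")
    if len(parts) != 3 or not all(parts):
        return None
    institution_code, collection_code, catalog_number = parts
    return {
        "institutionCode": institution_code,
        "collectionCode": collection_code,
        "catalogNumber": catalog_number,
    }


def lookup_url(institution_code, collection_code, dataset_key):
    """Build (but do not send) a GRSciColl lookup request URL."""
    params = {
        "institutionCode": institution_code,
        "collectionCode": collection_code,
        "datasetKey": dataset_key,
    }
    return LOOKUP_URL + "?" + urlencode(params)


codes = split_occurrence_id("NHMO:PMO:41493")
print(codes)
print(lookup_url(codes["institutionCode"], codes["collectionCode"],
                 "b2522b78-18ec-4ba6-ba16-9c9e215ce9e6"))
```

The sketch only builds the URL, so it runs without network access; whether the lookup then returns a match depends on what is registered in GRSciColl.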

ManonGros commented 1 year ago

@rukayaj would it be possible to add some institution and collection codes and identifiers to this dataset (https://www.gbif.org/dataset/b2522b78-18ec-4ba6-ba16-9c9e215ce9e6)? That way it could be linked to its corresponding GRSciColl entry(ies).

MortenHofft commented 1 year ago

Looking at UiO records for specimens more broadly, I see these flags:

Institution match none: 1,691
Collection match none: 194,404
Institution match fuzzy: 418,960
Collection match fuzzy: 595,185
Institution collection mismatch: 1,467,610
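
The labels above appear to correspond to occurrence issue flags, so counts like these could probably be re-checked with one search request per flag. The sketch below only builds the request URLs. It assumes the issue names are the upper-cased labels with underscores, that the occurrence search endpoint accepts `issue` and `limit` parameters, and that `limit=0` returns just the count; all of that should be confirmed in the GBIF API documentation. The helper names are hypothetical, and a filter for the publisher would still need to be added by the caller.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the GBIF API docs.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"

FLAG_LABELS = [
    "Institution match none",
    "Collection match none",
    "Institution match fuzzy",
    "Collection match fuzzy",
    "Institution collection mismatch",
]


def flag_to_issue(label):
    """Turn a display label into the assumed issue name, e.g. 'A b' -> 'A_B'."""
    return label.strip().upper().replace(" ", "_")


def issue_count_url(issue, **filters):
    """Build a search URL that asks only for the record count of one issue."""
    params = {"issue": issue, "limit": 0, **filters}
    return SEARCH_URL + "?" + urlencode(params)


for label in FLAG_LABELS:
    print(issue_count_url(flag_to_issue(label)))
```
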
rukayaj commented 1 year ago

Thanks for picking this up! I'll have a look when I'm back from Tajikistan, the week after next.

dagendresen commented 1 year ago

The "O" is the code for only the herbarium at the University of Oslo -- and should be **deleted** as the institution code for UiO in GRSciColl!! The "code" for the institution University of Oslo would more appropriately be "UiO".

[Screenshot 2022-09-27 at 11 44 45]

For historical reasons, I suggest perhaps including as alternative codes (?) the code O for the herbarium/Botanical Museum in Oslo, the code ZMO for the Zoological Museum in Oslo, and the code PMO for the Paleontological Museum in Oslo (which has long since been part of the Geological Museum in Oslo).

ManonGros commented 1 year ago

Hi @dagendresen, you can delete the institution code (or put it as an alternative code), but in order to do so you must first disconnect the entry from Index Herbariorum. Go to the "Master source" tab of the institution page in the registry and delete the master source:

[Screenshot 2022-09-27 at 12 36 41]

rukayaj commented 1 year ago

So perhaps we should delete https://www.gbif.org/grscicoll/institution/390f06b3-a81e-41b9-972e-e790e0edfe04 and keep https://www.gbif.org/grscicoll/institution/a9ab29d6-f380-4193-967f-15a64abc09e7 . Thoughts @dagendresen ?

ManonGros commented 1 year ago

@rukayaj @dagendresen or merge them?

rukayaj commented 1 year ago

@ManonGros ah yes, I'd forgotten about that. I've merged them now 👍

I removed the master source first because I suppose it would otherwise just be re-created. For the record, it was: Index Herbariorum IH_IRN: 124083

rukayaj commented 1 year ago

In addition to the institutionID that Dag added, we now have collectionCode, collectionID and institutionCode as well - see e.g. https://www.gbif.org/occurrence/3778286403

ManonGros commented 1 year ago

Thanks @rukayaj, when the GRSciColl cache is refreshed (in a few days), we will reinterpret the data and this should tidy the linking.

ManonGros commented 1 year ago

@rukayaj @dagendresen it looks like the collectionID, institutionCode and collectionCode were only added for the first 100,000 occurrences (at least, that is what it looks like in the archive). This might not be intentional. Could you take a look? Thanks!

In addition to that, it would be great if you could use 5c058644-6da2-47ea-ba73-7544ad8b05aa as collectionID instead of https://www.gbif.org/grscicoll/collection/5c058644-6da2-47ea-ba73-7544ad8b05aa.
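
If it helps, the change from the portal URL to the bare key could be done with something like the sketch below before publishing. It assumes the collectionID values are either the bare UUID or that UUID prefixed with the GRSciColl collection URL seen in this dataset; the function name is hypothetical.

```python
import uuid

# Prefix as it appears in the current collectionID values of this dataset.
GRSCICOLL_PREFIX = "https://www.gbif.org/grscicoll/collection/"


def bare_collection_id(value):
    """Return the bare UUID key from a GRSciColl collection URL or key."""
    candidate = value.strip().rstrip("/")
    if candidate.startswith(GRSCICOLL_PREFIX):
        candidate = candidate[len(GRSCICOLL_PREFIX):]
    # uuid.UUID raises ValueError if what is left is not a valid UUID.
    return str(uuid.UUID(candidate))


print(bare_collection_id(
    "https://www.gbif.org/grscicoll/collection/5c058644-6da2-47ea-ba73-7544ad8b05aa"
))
```

Validating with `uuid.UUID` means anything that is not a key fails loudly instead of being passed through unchanged.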

rukayaj commented 1 year ago

Oh!!! I know why, I'll fix it now.

rukayaj commented 1 year ago

Done