gbif / hp-uk-collections

Source for the DiSSCoUK Data Portal provided by GBIF Hosted Portals
https://dissco-uk.org/

GrSciColl mapping for Natural History Museum, London #1

Open MortenHofft opened 1 year ago

MortenHofft commented 1 year ago

Hi @jrdh, you mentioned you would like to know if there were any issues with matching NHM-London to GrSciColl.

https://uk-collections.hp.gbif-staging.org/institution/1d808a7c-1f9e-4379-9616-edb749ecf10e/collections

There are currently no collections for the museum in GrSciColl, so nothing is matched to a collection (except a few, which were caused by a bug that has since been fixed).

What does GrSciColl model?

It is also worth mentioning that GrSciColl is a simple model with only two entities: institutions and collections. A collection can have an institution. So modelling things like institution => museum => department => collection => sub-collection => expedition isn't an option. In those cases it is necessary to simplify to the level at which you think it makes sense to group data. GrSciColl does not really model the history of a collection or institution either (e.g. this collection started with X, then collection Y was merged into it, and it was later split into A, B and C). There is support for labelling institutions and collections as inactive. They can be deleted. And they can be marked as replaced by something else. Everything else has to live in comments or in other systems (Wikidata?).
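To make the two-entity shape concrete, here is a minimal Python sketch. It is only an illustration: the class names, field names and the `simplify` helper are hypothetical and are not the GrSciColl API schema. It shows the flat institution/collection model with the inactive and replaced-by markers, and how a deeper hierarchy would have to be collapsed to one chosen grouping level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Institution:
    # Hypothetical field names, chosen for illustration only.
    key: str
    name: str
    active: bool = True                # can be labelled inactive
    replaced_by: Optional[str] = None  # key of a replacement institution, if any


@dataclass
class Collection:
    key: str
    name: str
    institution_key: Optional[str] = None  # a collection can have an institution
    active: bool = True
    replaced_by: Optional[str] = None      # key of a replacement collection, if any


def simplify(path: list[str], level: int) -> tuple[str, str]:
    """Collapse a deep hierarchy to an (institution, collection) pair.

    path[0] is treated as the institution and path[level] as the chosen
    grouping level. Every other level is dropped, since the flat model has
    nowhere to store it.
    """
    if not 1 <= level < len(path):
        raise ValueError("level must point at an entry below the institution")
    return path[0], path[level]
```

For example, `simplify(["Museum", "Department", "Collection", "Sub-collection"], 2)` keeps only the museum and the collection; the department and sub-collection would have to live in comments or in another system.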

Missing collections

These collection codes are the 20 most frequently used in the data. Ideally everything could be matched. Sometimes the solution is to create a collection, other times to add an alternative code to an existing collection, and other times still to make changes to the published data. I have not looked into the specifics for this case.

"collectionCode": [
  {
    "key": "bmnh(e)",
    "count": 1917322
  },
  {
    "key": "zoo",
    "count": 1366842
  },
  {
    "key": "bot",
    "count": 1052581
  },
  {
    "key": "pal",
    "count": 573230
  },
  {
    "key": "nhm",
    "count": 27316
  },
  {
    "key": "en",
    "count": 2008
  },
  {
    "key": "the natural history museum, london",
    "count": 419
  },
  {
    "key": "bm",
    "count": 377
  },
  {
    "key": "insects",
    "count": 210
  },
  {
    "key": "british antarctic survey expedition jr17003a",
    "count": 180
  },
  {
    "key": "meyrick coll. bm 1938-290",
    "count": 142
  },
  {
    "key": "type",
    "count": 94
  },
  {
    "key": "london, the natural history museum - zoologische sammlung",
    "count": 61
  },
  {
    "key": "ibc",
    "count": 51
  },
  {
    "key": "vertebrate palaeontology",
    "count": 40
  },
  {
    "key": "10.3897/zookeys.32.282",
    "count": 37
  },
  {
    "key": "z bry",
    "count": 15
  },
  {
    "key": "gt1879jerónimo",
    "count": 11
  },
  {
    "key": "walsingham coll. 1910-427",
    "count": 11
  },
  {
    "key": "palaeontology",
    "count": 8
  }
]
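
As a rough illustration of how adding alternative codes changes what gets matched, here is a small Python sketch. It is not how GBIF performs the lookup: the registry entries, keys and function names are made up for the example (there are currently no collections for the museum in GrSciColl), and the case-insensitive comparison is an assumption. Only the three facet counts are taken from the list above.

```python
from typing import Optional

# Hypothetical registry entries: collection key -> primary and alternative codes.
REGISTRY = {
    "coll-zoology": {"code": "ZOO", "alternative_codes": ["Zoology"]},
    "coll-botany": {"code": "BOT", "alternative_codes": []},
}

# Three buckets copied from the collectionCode facet above.
FACET = [
    {"key": "zoo", "count": 1366842},
    {"key": "bot", "count": 1052581},
    {"key": "pal", "count": 573230},
]


def build_index(registry: dict) -> dict:
    """Map every normalised code (primary or alternative) to its collection key."""
    index = {}
    for key, entry in registry.items():
        for code in [entry["code"], *entry["alternative_codes"]]:
            index[code.strip().lower()] = key
    return index


def match(code: str, index: dict) -> Optional[str]:
    """Return the collection key for a published code, or None if unmatched."""
    return index.get(code.strip().lower())


def unmatched(facet: list, index: dict) -> list:
    """Return the facet buckets whose code does not resolve to any collection."""
    return [bucket for bucket in facet if match(bucket["key"], index) is None]
```

With this made-up registry, `unmatched(FACET, build_index(REGISTRY))` leaves only the `pal` bucket. That bucket would then need a new collection, an alternative code on an existing one, or a change to the published data.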

jrdh commented 1 year ago

Thanks @MortenHofft, this is super useful. I think I know the best course of action, but I will double-check with people here to make sure they agree. Then, hopefully, I can resolve this soon!