gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Publish the UiO Paleontology collection on our IPT #93

Closed (dagendresen closed this 1 year ago)

dagendresen commented 2 years ago

Eirik has a fresh copy of the NORPAL collection database (in dBASE/FoxPro format).

A strategy could be to (1) publish the PAL base as-is on Zenodo (following our documented routine), and (2) extract whatever we manage to map to Darwin Core and publish that on the IPT.

Can we meet on Friday at 10 (before the all-staff meeting at 11:15 and Peter's disputation at 13:15) to start preparing the dataset for publication?

rukayaj commented 2 years ago

Just to keep this issue up to date, @MichalTorma volunteered to take a look at this (I think you did right!?).

Did we decide anything about adding materialSampleID?

Please add in here if there's anything I've forgotten.

dagendresen commented 2 years ago

These are all MaterialSamples!! So I suggest not bothering to create any occurrenceID UUIDs, but maybe use DwC triplets here? Creating FALSE occurrenceID identifiers will only cause us severe headaches later... ;-)
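A DwC triplet is commonly written as `urn:catalog:<institutionCode>:<collectionCode>:<catalogNumber>`. The sketch below only illustrates that convention; the institution code `INST` is a placeholder, `PMO` is the collection code mentioned later in this thread, and joining PMO_NR and the sub-no with a slash is a hypothetical catalogue-number scheme, not something decided here.

```python
def dwc_triplet(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build a Darwin Core triplet as urn:catalog:<institution>:<collection>:<catalogue no>."""
    parts = (institution_code, collection_code, catalog_number)
    for part in parts:
        # An empty part, or a colon inside a part, would make the triplet ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid triplet component: {part!r}")
    return "urn:catalog:" + ":".join(parts)


# Hypothetical values: "INST" is a placeholder institution code, and "12345/2"
# assumes the catalogue number is PMO_NR and sub-no joined with a slash.
print(dwc_triplet("INST", "PMO", "12345/2"))
```

Because the triplet is derived from existing catalogue data rather than minted, it can be regenerated at any time, but it also changes if any of the three codes is later corrected.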

But yes, let's create urn:uuid: UUIDs and add them as materialSampleID.
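A minimal sketch of minting `urn:uuid:` identifiers for materialSampleID, assuming each record is keyed by its (PMO_NR, SUBNO) pair. The function name and the idea of passing in previously stored IDs are illustrative assumptions: the point is that a UUID is only persistent if it is generated once and stored, not regenerated on every export.

```python
import uuid


def assign_material_sample_ids(keys, existing=None):
    """Return a mapping from (PMO_NR, SUBNO) keys to urn:uuid: identifiers.

    IDs already present in `existing` are reused, so re-running the export
    does not mint new identifiers for specimens that already have one.
    """
    ids = dict(existing or {})
    for key in keys:
        if key not in ids:
            # UUID.urn yields the "urn:uuid:<hex-with-dashes>" form.
            ids[key] = uuid.uuid4().urn
    return ids


# Hypothetical keys for illustration only.
first = assign_material_sample_ids([("12345", "1"), ("12345", "2")])
second = assign_material_sample_ids([("12345", "1"), ("67890", "1")], existing=first)
print(second[("12345", "1")] == first[("12345", "1")])  # True: stable across runs
```

Random (version 4) UUIDs carry no meaning, so the stored key-to-ID table is the only link back to the physical specimen and should be kept alongside the source database.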

dagendresen commented 2 years ago

Another question is whether to bother creating a "parent MaterialSample" from each PMO_NR without the sub-no, i.e. for the actual rock that (I assume) contains one to many fossils (as enumerated by the sub-no).

I suggest NOT creating these "parent MaterialSample"s until possibly the next round, and then in coordination (!) with Hans Arne.

rukayaj commented 2 years ago

Some more notes:

Main spreadsheet:

Types spreadsheet:

rukayaj commented 2 years ago

For the Type field: The TYPE field is intended to specify real types, i.e. parts of the type series:
H = holotype
P = para(lecto)type
L = lectotype
N = neotype
S = syntype

"T" should not be found here. At least I could not find any.

"R" must be an error. Thanks for making me aware of that. Have a suspicion of what may have happened. Must check this more closely and will probably delete these Rs.

rukayaj commented 2 years ago

Just to update this, we have something now that is working quite well for the date/name extraction, and the rest of the data cleaning is minor and should be fairly easy.

rukayaj commented 2 years ago

https://ipt.gbif.no/manage/resource.do?r=o_fossils

rukayaj commented 2 years ago

I would say that these two files (type specimens and ordinary specimens) should get published as a single dataset seeing as they will have pretty much the same metadata - any thoughts?
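Publishing both files as one dataset could amount to concatenating them into a single table with a shared set of columns. The sketch assumes both tables are lists of dicts with PMO_NR and SUBNO columns; the overlap check is a hypothetical safeguard against publishing the same specimen twice, and `typeStatus` is the Darwin Core term for the type designation.

```python
def combine_tables(type_rows, non_type_rows, key=("PMO_NR", "SUBNO")):
    """Merge type and non-type rows into one dataset.

    Raises ValueError if a non-type row has the same key as a type row, since the
    same physical specimen should not be published twice.
    """
    type_keys = {tuple(row[k] for k in key) for row in type_rows}
    overlap = [row for row in non_type_rows if tuple(row[k] for k in key) in type_keys]
    if overlap:
        raise ValueError(f"{len(overlap)} non-type row(s) duplicate a type specimen")
    # Non-type rows get an empty typeStatus so every record has the same columns.
    return list(type_rows) + [{**row, "typeStatus": ""} for row in non_type_rows]


# Hypothetical rows for illustration only.
types = [{"PMO_NR": "1", "SUBNO": "1", "typeStatus": "holotype"}]
others = [{"PMO_NR": "2", "SUBNO": "1"}, {"PMO_NR": "2", "SUBNO": "2"}]
print(len(combine_tables(types, others)))  # 3
```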

dagendresen commented 2 years ago

Sounds reasonable -- however, can you make sense of the non-type table? Should these records be published at all? Are the records persistently identifiable and persistently linked to the physical specimens/samples?

rukayaj commented 2 years ago


Yes, Eirik says the SUBNO and PMO_NR will remain stable, so we will be able to link the records back to the specimens. We dropped all of the ambiguous records, so we're only publishing the ones with a unique combination of PMO_NR and SUBNO.
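Dropping the ambiguous records can be sketched as a filter on the (PMO_NR, SUBNO) combination, as described later in the thread. Treating an empty PMO_NR or SUBNO as ambiguous, and holding back every copy of a duplicated key rather than keeping the first, are our assumptions about what "ambiguous" means.

```python
from collections import Counter


def split_by_unique_key(rows, key=("PMO_NR", "SUBNO")):
    """Split rows into (publishable, ambiguous) by the PMO_NR + SUBNO combination.

    A row is publishable only if its key combination occurs exactly once and
    both parts are non-empty; every row sharing a duplicated key is held back.
    """
    def key_of(row):
        # Values are compared as trimmed strings; None counts as missing.
        return tuple("" if row.get(k) is None else str(row.get(k)).strip() for k in key)

    counts = Counter(key_of(row) for row in rows)
    publishable, ambiguous = [], []
    for row in rows:
        k = key_of(row)
        if all(k) and counts[k] == 1:
            publishable.append(row)
        else:
            ambiguous.append(row)
    return publishable, ambiguous


# Hypothetical rows for illustration only.
rows = [
    {"PMO_NR": "100", "SUBNO": "1"},
    {"PMO_NR": "100", "SUBNO": "2"},
    {"PMO_NR": "101", "SUBNO": "1"},
    {"PMO_NR": "101", "SUBNO": "1"},  # duplicate key: both copies are held back
    {"PMO_NR": "102", "SUBNO": ""},   # missing sub-no
]
ok, held_back = split_by_unique_key(rows)
print(len(ok), len(held_back))  # 2 3
```

Keeping the held-back rows, rather than discarding them, makes it easy to send the collection a list of records to fix.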

It's all ready to be published now, just waiting for him to fill in the metadata.

rukayaj commented 2 years ago

Followed up with him on 11 May

rukayaj commented 2 years ago

Published :) https://www.gbif.org/dataset/b2522b78-18ec-4ba6-ba16-9c9e215ce9e6

rukayaj commented 1 year ago

See https://github.com/gbif/portal-feedback/issues/4291: I forgot to add a GRSciColl collection code. Morten also found some more issues, so I'll do a bit more tidying up here when I'm back.

rukayaj commented 1 year ago

According to Eirik, the collection code should be PMO.

dagendresen commented 1 year ago

Apropos: "O" is only for the herbarium, and should probably not be in the dataset title. PS: All the locations in the south seem to be wrong; they should be Svalbard and Canada (not Africa and India).

dagendresen commented 1 year ago

I removed the "O" from the dataset title and added a hard-mapped institutionID pointing to the ROR identifier https://ror.org/01xtthb56
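A hard-mapped term means every record gets the same value. Outside the IPT, the equivalent step could look like the sketch below; the ROR identifier and the PMO collection code are taken from this thread, while the helper name and the rule that constants override per-record values are assumptions for illustration.

```python
# Terms taken from this thread; every record gets the same value, much like
# setting a constant value for a term in the IPT mapping.
CONSTANT_TERMS = {
    "institutionID": "https://ror.org/01xtthb56",
    "collectionCode": "PMO",
}


def add_constant_terms(record, constants=CONSTANT_TERMS):
    """Return a copy of `record` with the constant terms applied.

    Constants take precedence over whatever the record already holds, so a stray
    per-record value cannot contradict the dataset-level mapping.
    """
    return {**record, **constants}


print(add_constant_terms({"catalogNumber": "12345/2"}))
```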

rukayaj commented 1 year ago

Yes, we sent Hans Arne some feedback asking about the wrong locations in the south, along with some other issues, but didn't get a reply.

rukayaj commented 1 year ago

We are very happy to tell you that the palaeontological collection has now been published to GBIF. Its DOI, https://doi.org/10.15468/wmmvk3, can be used to track publications that use data from this dataset, and I notice that someone has already downloaded some records from it (https://www.gbif.org/dataset/b2522b78-18ec-4ba6-ba16-9c9e215ce9e6/activity).

We have published the type specimens (23 809) and non-type specimens (178 247) together, 202 056 occurrences in total.

There are a few issues we noticed:

1. There are a lot of latitude and longitude coordinates of '-9', which I guess are wrong? Should we assume that these coordinates are invalid and remove them before we publish again? There is a tabular list of the affected records here: https://www.gbif.org/occurrence/search?dataset_key=b2522b78-18ec-4ba6-ba16-9c9e215ce9e6&issue=COUNTRY_COORDINATE_MISMATCH

2. There are some records where GBIF did not recognise the taxon. Perhaps this is OK in most cases, but it might be helpful to use this information to make some corrections to the source database? The affected records are listed here: https://www.gbif.org/occurrence/search?offset=0&dataset_key=b2522b78-18ec-4ba6-ba16-9c9e215ce9e6&issue=TAXON_MATCH_NONE

3. Of the non-type specimens we couldn't publish 3901 records, and of the types we couldn't publish 4017 records, because they did not have a unique combination of the PMO_NR and SUBNO columns.

We could drop in and talk to you about this if you're available tomorrow afternoon?
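If the collection confirms that '-9' is a "no value" placeholder, the coordinates could be blanked before the next publication, as in this sketch. That -9 is a sentinel is an assumption to verify first: -9 is a perfectly valid latitude or longitude in principle. Clearing both values when either one is -9 is also our choice, since half a coordinate pair is not a usable point.

```python
SENTINEL = -9.0  # assumption: -9 is a "no value" placeholder in the source database


def _is_sentinel(value):
    """True if the value parses as a number equal to the sentinel."""
    try:
        return float(value) == SENTINEL
    except (TypeError, ValueError):
        return False  # missing or non-numeric values are not treated as the sentinel


def clean_coordinates(record):
    """Blank out decimalLatitude/decimalLongitude if either holds the sentinel.

    Returns a new record plus a flag saying whether anything was removed, so the
    affected rows can be listed for the collection manager.
    """
    out = dict(record)
    flagged = _is_sentinel(out.get("decimalLatitude")) or _is_sentinel(out.get("decimalLongitude"))
    if flagged:
        out["decimalLatitude"] = None
        out["decimalLongitude"] = None
    return out, flagged


rec, flagged = clean_coordinates({"decimalLatitude": "-9", "decimalLongitude": "-9"})
print(flagged, rec["decimalLatitude"])  # True None
```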

Please have a look and see if there are any other issues we should correct (Eirik, can you take a look too please?). We can publish as many corrections as we like, and we should publish regular updates: should that be every six months, as specified in the metadata?

MichalTorma commented 1 year ago

"Extinct taxa in an extant world" (https://doi.org/10.3897/biss.6.94417): it might be relevant to pass this talk on when it becomes public.

rukayaj commented 1 year ago

[Photo] From Holly Little's talk about Wikidata

dagendresen commented 1 year ago

I have been imagining that, if possible, we could explore recording collecting sites in DEIMS...?

(DEIMS is focused on cataloguing LTER monitoring stations, but the DEIMS developer, Christoph Wohner, suggested during the BioDT workshop in Potsdam last month that DEIMS could also handle collecting sites, i.e. sites that have not yet been revisited, on the rationale that they could be.)