gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Common data publishing requirements between OBIS and GBIF #37

Closed: dagendresen closed this issue 2 years ago

dagendresen commented 3 years ago

Could we find a suitable "keyword-tag" for Norwegian marine datasets and add the tags through the Registry, as a possible mechanism to identify "OBIS datasets" mobilized through a national node? We might also look at "keyword-tags" for fulfilling OBIS data requirements (such as the WoRMS LSID as a PID in dwc:scientificNameID).

rukayaj commented 3 years ago

Could do 'marine-no' or something?

rukayaj commented 3 years ago

Tasks, try to do this before meeting on Thursday 1 July:

Nice to have as well: a list of Norwegian datasets in OBIS that we don't have.

dagendresen commented 3 years ago

Norway in OBIS: https://obis.org/country/161

Datasets in EurOBIS (all datasets?): http://www.eurobis.org/dataset_list http://www.eurobis.org/data_access_services

There may also be unknown unknowns: some datasets on the IMR IPT (etc.) may be in neither GBIF nor OBIS, which of course would make them even harder to find.

OBIS datasets in GBIF: https://reports.obis.org/gbif/

dagendresen commented 3 years ago

OBIS, GBIF Norway, and GBIF Colombia are to develop a pre-recorded talk for the Global Nodes Meeting. The deadline for the pre-recorded talk is 21 June, and the optimal length is approx. 7 minutes.

Topics for pre-recorded talk:

Live GNM session on 1 July 2021 on OBIS and GBIF nodes collaboration.

Possibly plan a training curriculum on publishing marine data (datasets).

rukayaj commented 3 years ago

Develop a small service which 1) takes in a dataset id, 2) looks it up in GBIF and gets the scientific name for each record, 3) checks whether each record has a WoRMS LSID in scientificNameID (if possible), and 4) counts all of these and returns the percentage of marine records, and hopefully also the percentage of those records with WoRMS LSIDs.
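The counting part of the steps above (3 and 4) could be sketched roughly as follows. This is only an illustration, not the code of any existing service: `summarise_marine` and the `is_marine` callback are hypothetical names, the records are assumed to be plain dicts with Darwin Core keys already fetched from GBIF, and how a name is judged marine (for example by a WoRMS match) is left to the caller.

```python
import re
from typing import Callable, Dict, Iterable, Mapping

# WoRMS LSIDs have the form urn:lsid:marinespecies.org:taxname:<AphiaID>
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:\d+$")


def summarise_marine(
    records: Iterable[Mapping[str, str]],
    is_marine: Callable[[str], bool],  # hypothetical lookup, e.g. backed by WoRMS
) -> Dict[str, float]:
    """Return the share of marine records and the share of those with a WoRMS LSID."""
    total = marine = with_lsid = 0
    for rec in records:
        total += 1
        name = rec.get("scientificName") or ""
        if not is_marine(name):
            continue
        marine += 1
        name_id = (rec.get("scientificNameID") or "").strip()
        if WORMS_LSID.match(name_id):
            with_lsid += 1
    return {
        "total": total,
        # Percentage of all records whose name was judged marine
        "marine_pct": 100 * marine / total if total else 0.0,
        # Percentage of the marine records that carry a WoRMS LSID
        "lsid_pct_of_marine": 100 * with_lsid / marine if marine else 0.0,
    }
```

Keeping the marine check as a callback means the same counting logic works whether names are matched live against WoRMS or against a cached list.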

rukayaj commented 3 years ago

There is now a Jupyter notebook https://github.com/gbif-norway/marine-species-checker that can be cloned and used by anybody to check what percentage of their dataset is marine (i.e. WoRMS present) species. It also produces a download for them with the WoRMS LSID mapped onto their records, where possible.
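For readers who want a feel for the "WoRMS LSID mapped onto their records" step, here is a minimal sketch. It is not taken from the notebook: `attach_worms_lsid` is a hypothetical helper, and the name-to-AphiaID lookup is assumed to have been built beforehand (for example from WoRMS matches).

```python
from typing import Dict, Iterable, List, Mapping

# WoRMS LSIDs are built by appending the AphiaID to this prefix
LSID_PREFIX = "urn:lsid:marinespecies.org:taxname:"


def attach_worms_lsid(
    records: Iterable[Mapping[str, str]],
    aphia_ids: Mapping[str, int],  # assumed lookup: scientificName -> AphiaID
) -> List[Dict[str, str]]:
    """Return copies of the records with scientificNameID filled in where possible."""
    out = []
    for rec in records:
        row = dict(rec)  # copy, so the caller's records are not modified
        aphia_id = aphia_ids.get(row.get("scientificName", ""))
        # Only fill in a missing ID; never overwrite one the publisher supplied
        if aphia_id is not None and not row.get("scientificNameID"):
            row["scientificNameID"] = f"{LSID_PREFIX}{aphia_id}"
        out.append(row)
    return out
```

Records whose name has no match are returned unchanged, which corresponds to the "where possible" caveat.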

rukayaj commented 3 years ago

There's an OpenRefine script from SiB Colombia which does something similar: https://github.com/SIB-Colombia/data-quality-open-refine/blob/master/ValTaxonomicAPIWoRMS_ValTaxonomicaAPIWoRMS.txt

dagendresen commented 3 years ago

Nice :-) Would it be useful to make a live demonstration of using the tool on one or more Norwegian datasets? Or a slide set showing how it works and the results? The overall goal is to explore how compatible our datasets published in GBIF are with OBIS requirements :-)

An ideal tool would list all our (marine) datasets and display some indicator(s) on compliance with (different?) OBIS data requirements.

rukayaj commented 3 years ago

Yes, so if possible I'm going to try to show a summary of all of our datasets with the marine ones tagged, including the percentage of marine species (where there's a match in WoRMS). And then I'll show the Colab notebook and explain how to do it.

dagendresen commented 3 years ago

Would you be able to take the lead on making such a slide set (recording) by Friday so we could send it to Ward (OBIS)? I am a bit swamped with the BioDATA course but can contribute in the evenings (or in the coffee breaks). The target would be to present an example of a GBIF node with the capacity (and interest to commit) to work with marine datasets to meet the OBIS data requirements, and to address OBIS data quality issues when flagged by OBIS.

dagendresen commented 3 years ago

Maybe this OBIS tool is useful: http://iobis.github.io/gbif-marine/

rukayaj commented 3 years ago

Yes of course, I think we did agree that I should tackle this in the last meeting. Don't worry about it at all, I am working on it now and I have been working on it today. I was planning to make the summary of all the datasets and then do the recording which is why there's nothing yet in the slides. Maybe we don't need all the datasets though because it's taking quite a long time to parse the big ones, so I'll exclude some and get just a subset.

dagendresen commented 3 years ago

Would be cool to add WoRMS Aphia LSIDs into the DNV dataset

Apropos - the same dataset in OBIS (64,005 occ) and in GBIF (2,036,474 occ)

rukayaj commented 3 years ago

Some species seem to be missing AphiaIDs in MOD, e.g. Leitoscoloplos acutus. I'm checking with Thomas and adding them in as necessary.

Full list:
1 Leitoscoloplos acutus
209 Lumbrineris aniara complex
210 Lumbrineris scopa complex
275 Tunicata Lamarck, 1816
2361 Cirripedia Burmeister, 1834
25757 Pectinoidea Rafinesque, 1815
28887 Hydroidolina Collins, 2000
35458 Astarte sulcata auctt.
45317 Hexacorallia Haeckel, 1896
46930 Echiuridae Quatrefages, 1847
53166 Heterobranchia
61465 Amphitrite Müller, 1771
126651 Hirudinea Savigny, 1822
130284 Gymnosomata Blainville, 1824
233962 Copepoda Milne Edwards, 1840
267652 Echinidea Kroh & Smith, 2010
541867 Tmetonyx barentsi
544081 Hyperiidea H. Milne Edwards, 1830
705340 Eunereis elitoralis (Eliason, 1962)
774427 Crustacea Brünnich, 1772
784716 Thoracica Darwin, 1854
799881 Aricidea laubieri
799921 Aricidea roberti
799962 Aricidea catherinae
800376 Aspidosiphon muelleri
800790 Aricidea wassi
801286 Myriotrochus vitreus
801381 Pherusa flabellata
802838 Nicomache quadrispinata
804408 Phascolion strombus
805109 Onchnesoma steenstrupii
805745 Aricidea simonae
805843 Golfingia margaritacea
810048 Pseudocuma simile
810763 Aricidea suecica
811696 Aricidea cerrutii
814135 Scolelepis squamata
817177 Atylus vedlomensis
823527 Ophiura carnea
827307 Pherusa falcata
827428 Aricidea quadrilobata
837470 Caryophyllia smithii
854371 Varia
1284714 Lepadomorpha Pilsbry, 1916
1290745 Pectinaria auricoma
1291877 Eunereis elittoralis juv.
1363352 Tellinoidea Blainville, 1814
1380940 Terebellomorpha Hatschek, 1893
1566391 Graptolithoidea Beklemishev, 1951
1640743 Veneroidea Rafinesque, 1815
1947856 Cymothoidae

rukayaj commented 3 years ago

AphiaID seems to be saved in Artsnavnebase for marine species under 'referanser':

[Image: Screenshot 2021-08-03 at 07 13 43]

I am chatting to Stein about the possibility of including this in the marine datasets that we have in MUSIT.

rukayaj commented 3 years ago

Andreas Altenburger has sent a request to MUSIT for this data to be included; they said they needed it to come from a MUSIT user.

rukayaj commented 2 years ago

From the emails I see Stein said it would need to be a MUSIT-wide change, and we weren't able to get all of the museums to agree to adding AphiaID. The plan is now to include it in the new collection management system.