gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Institutions inside the Artsobservasjoner dataset #6

Open dagendresen opened 3 years ago

dagendresen commented 3 years ago

The Artsobservasjoner dataset (and "NBIC other ds") published by NBIC on GBIF includes data from many different institutions that are not credited as data publishers in GBIF.

https://artsdatabanken.no/artskart/bidragsytere?Key=1435226523

Split by institutionCode

  Institution code  Count  Institution
  nof 20,102,346  Norsk Ornitologisk Forening
  nbf 2,648,781  Norsk Botanisk Forening
  miljolare 1,161,319  Miljølære.no (Skolelaboratoriet i realfag ved UiB)
  nef 891,511  Norsk Entomologisk Forening
  miljodir 654,097  Miljødirektoratet
  nsnf 441,278  Norges sopp- og nyttevekstforbund
  nzf 216,155  Norsk zoologisk forening
  zmbn 2,783  (UiB Zoologisk museum??)
  gu 1,900  (Norges geologiske undersøkelse???)
  nibio 1,829  NIBIO
  dn 1,714  (Miljødirektoratet ?)
  o 1,505  (UiO NHM O ?)
  bg 644  (UiB Bergen museum ?)
  imr 507  Institute of Marine Research
  ntnu-vm 114  NTNU University Museum
  trom 81  UiT University Museum
  smnh 46  (Sveriges meteorologiska och hydrologiska Institut ???)
  nvi 32  (Norsk vitskapsindeks ???)
  trh 10  (NTNU University Museum ???)
dagendresen commented 3 years ago

Artskart has a list of all institutions

Institution  Number of observations  Number of unique species
Norsk Ornitologisk Forening 20 150 581 885
Norsk institutt for naturforskning 4 157 926 8 137
Naturhistorisk Museum - UiO 3 076 454 26 569
Norsk botanisk forening 2 651 902 6 583
NTNU-Vitenskapsmuseet 1 414 820 20 883
Miljølære.no 1 282 356 3 372
Norsk entomologisk forening 897 997 12 974
Miljødirektoratet 883 266 5 864
MUST 806 008 596
GBIF nodes outside Norway 575 791 15 392
Havforskningsinstituttet 451 528 2 948
Norges sopp- og nyttevekstforbund 441 966 5 030
Universitetsmuseet i Bergen, UiB 423 107 12 765
BioFokus 399 760 14 804
Tromsø museum - Universitetsmuseet 299 222 12 704
Norsk institutt for vannforskning 291 348 2 860
NOF-NINA-DN 238 935 261
Agder naturmuseum 222 579 2 764
Norsk zoologisk forening 217 282 407
JBJordal 177 768 2 500
MFU 122 930 2 377
Norges miljø- og biovitenskapelige universitet 113 383 2 773
Fjellstyrene 48 475 5
Molltax 22 803 428
Svalbardflora.net 21 242 214
Rådgivende Biologer AS 14 241 1 220
Ecofact 12 481 1 383
Helgeland Museum - Rana 11 447 906
Arne Fjellberg 10 013 303
Norsk Polarinstitutt 9 671 150
ARC 8 913 93
Arkeologisk Museum, UiS 7 979 1 204
Ento Consulting 4 800 186
Asplan Viak 3 907 599
Bioforsk 3 740 95
Nord Universitet 3 313 19
Norsk Natur Informasjon 3 112 565
Norsk institutt for bioøkonomi 2 736 701
Faun Naturforvaltning AS 2 324 659
Artsdatabanken 1 594 83
Sweco Norge AS 1 112 246
Naturrestaurering AS 777 249
Norges fiskerihøgskole, UiT 680 101
NINA FMNT 465 1
Multiconsult 308 29
Veterinærinstituttet 31 14
dagendresen commented 3 years ago

And some other possible "institutions"

Consultancy companies (private sector)

institution approximate occurrences notes
Akvaplan-niva (ca 115 995 occurrences)
Miljøfaglig Utredning (ca 36 703 occurrences)
Natur og Samfunn AS (ca 7 mill occurrences) including recorded by etc
COWI AS (ca 289 516 occurrences)
... ... ...
... ... ...

Bird Stations

The NOF (Norwegian Ornithological Society) institution code includes numerous separate bird stations. In particular, Nordre Øyeren bird station has asked for better visibility under their own institution ID.

institution approximate occurrences notes
Nordre Øyeren Fuglestasjon (ca 525 075 occurrences) Note! registered in GBIF
Kragerø Ringmerkingsgruppe (ca 853 557 occurrences)
Utsira Fuglestasjon ... ...
Jomfruland Fuglestasjon ... ...
... ... ...
... ... ...
rukayaj commented 3 years ago

Status of this task: A few years ago we requested that NBIC split up the dataset (so that at least most of the bigger institutions are separate). Since then we have had several discussions and emails about it. The latest exchange was with Knut Anders Hovstad on 2021-02-16:

This is an issue that we are working on and I hope we have solutions quite soon that can target the needs of consultants, organizations etc. that delivers data through Artsobservasjoner. I'll return to you with more information on this.

I volunteered to help them:

You could just make multiple datasets, and for each of them just modify the select statement from the View_Artsobservasjoner2 to have a WHERE InstitutionCode = "nof", for example. You could even make a spreadsheet of the different institutions and write a script to automate the dataset creation process on the IPT. I haven't done anything like that but the IPT doesn't have a database so I am pretty sure it would just be a case of changing flat files. I'm happy to help figure out how it can be done, if that would be useful for you.

But we have not had a response to this.

Next action: I suggest we ask for a status update again in a month or so.
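
The per-institution split suggested in the email above could be sketched roughly as follows. This is only an illustration: the view name and the InstitutionCode column come from the email, while the sample codes, the `build_select` helper, the validation pattern and the use of single-quoted SQL string literals are assumptions. Nothing here touches the IPT or a real database.

```python
import re

# View and column names are taken from the email above.
VIEW_NAME = "View_Artsobservasjoner2"
FILTER_COLUMN = "InstitutionCode"

# A small sample of codes from the institutionCode table in this issue.
SAMPLE_CODES = ["nof", "nbf", "ntnu-vm"]


def build_select(institution_code: str) -> str:
    """Return a SELECT statement restricted to one institution code.

    Hypothetical helper: the SQL dialect and quoting style are assumptions.
    """
    # Accept only simple codes so the value is safe to embed in SQL text.
    if not re.fullmatch(r"[a-z0-9-]+", institution_code):
        raise ValueError(f"unexpected institution code: {institution_code!r}")
    return (
        f"SELECT * FROM {VIEW_NAME} "
        f"WHERE {FILTER_COLUMN} = '{institution_code}'"
    )


# One statement per institution, i.e. one candidate dataset each.
statements = {code: build_select(code) for code in SAMPLE_CODES}

for code, sql in statements.items():
    print(f"{code}: {sql}")
```

Each generated statement would then be the source query for one separate dataset. How those datasets are registered on the IPT is not covered by this sketch.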

rukayaj commented 3 years ago

Just to keep this issue up to date:

We received an email - GBIF: Notification of upcoming removal of your organization "Nordre Øyeren Bird Observatory" from the GBIF list of data publishers

Dag sent another email query:

Hi Stein and Knut,

What is the status of, and the possibility for, linking datasets for bird stations such as Nordre Øyeren directly to their own registration in GBIF?

(Or at least breaking them up into separate datasets, if splitting up the "publisher" is sensitive.)

rukayaj commented 2 years ago

Linked to #81

dagendresen commented 2 years ago

If we add Nordre Øyeren Bird Observatory (NØF, noef) and Biorehab Klepsland, etc to GRSciColl it should be possible to have the institution code from NBIC ArtsObs linked to the institution code in GRSciColl...

Sorry, I just noticed that the NBIC ArtsObs code nof must cover the entire Norwegian Ornithological Society :-) --> so I updated the institution code for Nordre Øyeren Bird Observatory to noef.

But it could have worked if NBIC ArtsObs had used the code noef for Nordre Øyeren Bird Observatory.

rukayaj commented 2 years ago

Do you think it's suitable to add non scientific collections institutions to GRSciColl?

dagendresen commented 2 years ago

I actually think that it makes little sense to include any institutions in the GRSciColl (because I do not believe in DwC triplets)

The original rationale for GRSciColl/GRBio was/is to maintain a list of institution codes and collections codes for the Darwin Core triplet (doi:10.3897/BDJ.4.e10293) - as is described in Darwin Core as a recommended option for building the occurrenceID

And the occurrenceID is the identifier for an Occurrence (notably NOT for identifying a specimen). From this perspective, it would thus make equal sense (or non-sense) to include institutions that hold/publish Occurrence data as for institutions that hold collections. Because the institution code managed by GRSciColl is used to identify Occurrence records, not collection specimens ;-)

dagendresen commented 2 years ago

If background is needed, see Guralnick et al. 2014 The trouble with triplets in biodiversity informatics: a data-driven case against current identifier practices, doi:10.1371/journal.pone.0114069

dagendresen commented 2 years ago

Nordre Øyeren Bird Observatory (Registry) is asking for support with finding their own records on GBIF.org.

They are trying to reproduce the following API search on GBIF.org: https://api.gbif.org/v1/occurrence/search?datasetname=N%C3%98F-vannfugltellinger

They now have ROR ID https://ror.org/04t667177

Could this ROR ID perhaps be published as institutionID through Artsobservasjoner?

We could also remind NBIC on splitting the large Artsobservasjoner into more appropriate datasets?
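
For reference, the API search link quoted above can be rebuilt from the plain dataset name. The sketch below only constructs the URL and makes no request. The parameter spelling `datasetname` is copied from the link as written, and whether the API accepts that spelling is not verified here. The helper name `occurrence_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the link quoted above.
API_BASE = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(dataset_name: str) -> str:
    """Build the search URL for one dataset name (hypothetical helper)."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes,
    # so "Ø" becomes "%C3%98".
    query = urlencode({"datasetname": dataset_name})
    return f"{API_BASE}?{query}"


url = occurrence_search_url("NØF-vannfugltellinger")
print(url)
```

This shows that the `%C3%98` in the link is simply the encoded "Ø" of "NØF-vannfugltellinger".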

rukayaj commented 2 years ago

We could also remind NBIC on splitting the large Artsobservasjoner into more appropriate datasets?

I'll ask Knut about the ROR ID/institutionID thing in the meeting. I'm not feeling very well today, caught a cold (not covid), but I'll also have a look at the datasetName/datasetID search thing later.

rukayaj commented 2 years ago

I asked about ROR/InstitutionID, K will ask Stein about it but is not very optimistic as it's a political issue, but he will try.

dagendresen commented 2 months ago

Thomas Sæther wrote 2024-04-10:

I just became aware that GBIF has now finally opened up for structured search on Dataset ID and Dataset Name. For example: https://www.gbif.org/occurrence/download?advanced=1&dataset_id=218 Perhaps this means that we can get DOIs made for some of our projects and in that way get our Publisher page on GBIF up and running again? Alternatively, we could add datasets to our GRSciColl page: https://scientific-collections.gbif.org/institution/cf68b61e-a6e6-4483-a97b-1ad7eb21188b Regards, Thomas, Nordre Øyeren Fuglestasjon

dagendresen commented 2 months ago

Unfortunately, the functionality in GBIF to filter on a datasetName or a datasetID does not enable us to turn the result into a dataset with a DOI (from GBIF). As long as the data points are published through Artsobservasjoner, we will still need Artsdatabanken to actually publish the data records as a separate dataset! We can try again to convince Artsdatabanken, but they seem to have settled on publishing Artsobservasjoner data credited to the biological societies (birds as NOF).

An alternative is for you to publish the data records directly to GBIF yourselves, outside of Artsobservasjoner. But this would mean that you no longer have a database such as Artsobservasjoner in which to manage your data records. And as I understand it, that is not what you want.

I have attached figure 1 to show that datasetID 218 is not sufficiently unique to identify your records from NØF. Notice that other institutions use the very same datasetID: nsw, kku, mma, uconn, ... Perhaps the datasetID 218 is generated (automatically) by Artsobservasjoner?

Your link to your dataset in the GRSciColl Scientific Collections data portal will unfortunately not work for your dataset, which includes species observations. This particular data portal (a so-called GBIF Hosted Portal) is set up to display collection specimens only. However, we could of course help you to set up a NØF hosted portal, provided we can find an appropriate filter (maybe e.g. datasetID = 216 + institutionCode = nof, or similar).
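
To illustrate why a datasetID alone is not enough and why a combined filter might help, here is a minimal sketch over a handful of made-up records. The records, the datasetID values and the pairing with institution codes are hypothetical placeholders, not data from GBIF. Only the field names datasetID and institutionCode are standard Darwin Core terms.

```python
from collections import defaultdict

# Hypothetical sample records; the values are placeholders for illustration.
records = [
    {"datasetID": "218", "institutionCode": "nof"},
    {"datasetID": "218", "institutionCode": "nsw"},
    {"datasetID": "218", "institutionCode": "kku"},
    {"datasetID": "219", "institutionCode": "nof"},
]


def codes_by_dataset_id(rows):
    """Group the institution codes seen under each datasetID."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["datasetID"]].add(row["institutionCode"])
    return dict(grouped)


def matches(row, dataset_id, institution_code):
    """Combined filter: both fields must match."""
    return (
        row["datasetID"] == dataset_id
        and row["institutionCode"] == institution_code
    )


grouped = codes_by_dataset_id(records)
# A datasetID shared by several institution codes is not a unique key.
ambiguous = {key for key, codes in grouped.items() if len(codes) > 1}

selected = [row for row in records if matches(row, "218", "nof")]

print("ambiguous datasetIDs:", sorted(ambiguous))
print("records matching both filters:", len(selected))
```

Whether such a combined filter really isolates the NØF records depends on how Artsobservasjoner assigns these values, which this sketch cannot answer.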

dagendresen commented 2 months ago

Email to Artsdatabanken 2024-04-10:

Nordre Øyeren Fuglestasjon is still looking for ways to be credited in data citations for their own data records published via Artsobservasjoner. I think the only option as of today would be to publish their dataset as a separate dataset from Artsdatabanken's IPT, but I understand that this is still somewhat politically demanding? I have tried to persuade the GBIF Secretariat to add functionality for counting data citations by dwc:institutionID = their ROR ID on each individual data record. I also got good support for this request at the GBIF mid-term committee meeting last month. But it is of course still something that would in any case only enter the GBIF Secretariat's long-term work plan, as an experimental pilot. (But they recognize that several parties want this.) What I am wondering right now is whether it is possible to include dwc:institutionID = https://ror.org/04t667177 in the publishing of Artsobservasjoner to GBIF?
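
As a rough illustration of the request in this email, attaching the ROR ID as institutionID could look like the sketch below. It assumes that the station's records can be recognized by their datasetName (the name is taken from the API search earlier in this thread). The sample records and the `with_institution_id` helper are hypothetical, and how Artsobservasjoner actually exports data is not known here.

```python
# Assumed mapping from datasetName to ROR ID; the ROR ID is the one
# quoted in this thread, and the dataset name comes from the API search.
ROR_BY_DATASET_NAME = {
    "NØF-vannfugltellinger": "https://ror.org/04t667177",
}


def with_institution_id(record: dict) -> dict:
    """Return a copy of the record with institutionID set when known."""
    enriched = dict(record)  # leave the input record untouched
    ror_id = ROR_BY_DATASET_NAME.get(record.get("datasetName"))
    if ror_id is not None:
        enriched["institutionID"] = ror_id
    return enriched


# Hypothetical sample records for illustration only.
sample = [
    {"occurrenceID": "example-1", "datasetName": "NØF-vannfugltellinger"},
    {"occurrenceID": "example-2", "datasetName": "some other project"},
]

enriched_records = [with_institution_id(record) for record in sample]

for record in enriched_records:
    print(record["occurrenceID"], record.get("institutionID", "(none)"))
```

Records without a known mapping are passed through unchanged, so the step could be added to an export without affecting other contributors.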