gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

GBIF Occurrences from the Arctic #103

Open dagendresen opened 2 years ago

dagendresen commented 2 years ago

UiO-NHM wants to know how much material in the collections in Oslo is from the Arctic.

The CAFF boundary source file for the Arctic is available at: http://geo.abds.is/geonetwork/srv/eng/catalog.search#/metadata/2ad7a7cb-2ad7-4517-a26e-7878ef134239

Shapefile

See the GBIF email archive from 2016 at https://lists.gbif.org/pipermail/api-users/2016-February/000289.html

{
  "creator":"your-gbif-user-name",
  "notification_address": ["your-email-adress"],
  "predicate":
  {
    "type":"within",
    "geometry":"POLYGON((179.285633734761433 50.81429697072722,179.600130309538514 50.798132663162107,180.0 50.825202656741141,-179.985731034706419 50.826168623278733,-179.642658093259286 50.834515997684633,-179.213664117412264 50.826278190103068,179.014229185979531 50.811745655072343,179.285633734761433 50.81429697072722))"
  }
}
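
For reference, a minimal sketch (untested) of how the predicate above could be submitted to the GBIF occurrence download API with Python and requests. The endpoint and basic-auth requirement are taken from the public API documentation; the placeholder credentials and the camelCase field names (which differ slightly from the 2016 snake_case example) should be double-checked against the current docs.

import requests

GBIF_USER = "your-gbif-user-name"      # placeholder
GBIF_PASSWORD = "your-gbif-password"   # placeholder

download_request = {
    "creator": GBIF_USER,
    "notificationAddresses": ["your-email-address"],
    "sendNotification": True,
    "format": "SIMPLE_CSV",
    "predicate": {
        "type": "within",
        "geometry": "POLYGON((...))",  # the CAFF boundary WKT shown above
    },
}

resp = requests.post(
    "https://api.gbif.org/v1/occurrence/download/request",
    json=download_request,
    auth=(GBIF_USER, GBIF_PASSWORD),
)
resp.raise_for_status()
print("Download key:", resp.text)  # poll /v1/occurrence/download/<key> until the file is ready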

[Image: CAFF_Boundary]

https://www.gbif.org/occurrence/search?has_geospatial_issue=false&advanced=1&geometry=POLYGON((179.285633734761433%2050.81429697072722,179.600130309538514%2050.798132663162107,180.0%2050.825202656741141,-179.985731034706419%2050.826168623278733,-179.642658093259286%2050.834515997684633,-179.213664117412264%2050.826278190103068,179.014229185979531%2050.811745655072343,179.285633734761433%2050.81429697072722))

A simple box above 70 degrees N also works, e.g. with &occurrence_status=present appended to the search above.
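
As a sketch of what the "simple box" approach could look like against the occurrence search API (rather than the web UI), one option is a decimalLatitude range filter instead of a polygon, which sidesteps the pole-circling geometry problem entirely. The publishingOrg key is a placeholder and the exact parameter names should be checked against the API docs.

import requests

params = {
    "decimalLatitude": "70,90",                  # everything above 70 degrees N
    "hasGeospatialIssue": "false",
    "occurrenceStatus": "PRESENT",
    "publishingOrg": "<UiO-NHM-publisher-key>",  # placeholder key
    "limit": 0,                                  # only the total count is needed
}
resp = requests.get("https://api.gbif.org/v1/occurrence/search", params=params)
resp.raise_for_status()
print("Occurrence count:", resp.json()["count"])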

rukayaj commented 2 years ago

@MichalTorma @dagendresen I just thought - if the question is "how many specimens does UiO have from the Arctic", the easiest thing would probably be to download all the collections data, load it into QGIS, and run a query to count the points inside the CAFF Arctic polygon. It would be nice to make it work with the GBIF API and occurrence search too, of course, but for a quick answer I get 97 937 points when I do what I just described.
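
For reference, a rough sketch of the same point-in-polygon count done with geopandas instead of QGIS, assuming the collection records are in a tab-separated export with decimalLatitude/decimalLongitude columns and the CAFF boundary shapefile has been downloaded from geo.abds.is; the file names are placeholders.

import geopandas as gpd
import pandas as pd

# Occurrence records exported from GBIF/MUSIT (placeholder file name)
records = pd.read_csv("uio_collections.csv", sep="\t")
points = gpd.GeoDataFrame(
    records,
    geometry=gpd.points_from_xy(records["decimalLongitude"], records["decimalLatitude"]),
    crs="EPSG:4326",
)

# CAFF boundary shapefile (placeholder file name)
caff = gpd.read_file("CAFF_Boundary.shp").to_crs("EPSG:4326")

# Keep only the points falling inside the boundary polygon(s)
arctic = gpd.sjoin(points, caff, predicate="within")
print("Records inside the CAFF boundary:", len(arctic))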

dagendresen commented 2 years ago

Sounds good :-) A GBIF API query might still be useful for tracking the number of species from the Arctic over time - repeated queries... however, I'm not sure if that is of interest here...

rukayaj commented 2 years ago

Ok then I replied to BP and I will close this.

dagendresen commented 2 years ago

I think it is still an interesting topic (and a recurring helpdesk question) to explore how to search the GBIF API with a polygon query for the Arctic! I suggest keeping this open - or starting a new helpdesk issue?

And I do suspect that Bjørn Petter is more interested in WHAT species and specimens we have from the Arctic than in only the naked number :-D

MichalTorma commented 2 years ago

I condensed the issue here: gbif/gbif-api#44

rukayaj commented 2 years ago

Somewhat related, polar projections are now available on the map: https://twitter.com/timrobertson100/status/1539585887520137217

rukayaj commented 2 years ago

We've had an email from one of our staff asking about this again - not the number of specimens, but a way to have a DOI citing all of the specimens in our collections falling within the Arctic Circle. He suggests republishing them as a new dataset - but if they are already published on GBIF this is not ideal!

From my email: The ideal way to do this would be to have a GBIF API query which you can use to get a citeable DOI - then you get citation tracking and you don't have to bother with a whole new dataset. As you'll see in the GitHub issue, GBIF's search can't actually handle polygons circling the pole, and I don't think we've got any further with this (@Michal?). I wonder whether we could work out a way to break up the search and have two or three DOIs that you guys can cite - it's not very neat, but maybe that would work?

Another option, maybe: tag all of these records across all collections (or just the bird collection, if that is the only one of interest) using the datasetName field.

Thoughts?

dagendresen commented 2 years ago

GBIF has a tool for creating "derived datasets", e.g. if you want to combine data records from different search filters into one DOI to be cited. In practice, you need a list of the GBIF occurrenceKey record identifiers. Also, in practice, citing multiple GBIF download DOIs in your data paper would not cause any problem for the GBIF data citation metrics - but it might clutter the reference list of your data paper with more GBIF data citation records than you might prefer.
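
As a sketch of how the derived-dataset registration could look programmatically: the endpoint and payload shape below are from memory of GBIF's derived-dataset documentation and should be verified; all keys, counts and the Zenodo DOI are placeholders.

import requests

GBIF_USER = "your-gbif-user-name"      # placeholder
GBIF_PASSWORD = "your-gbif-password"   # placeholder

payload = {
    "title": "Arctic specimens from the UiO Natural History Museum collections",
    "description": "Records falling within the CAFF Arctic boundary.",
    "sourceUrl": "https://doi.org/10.5281/zenodo.xxxxxxx",  # e.g. the Zenodo deposit (placeholder)
    # Map of source dataset key (or DOI) -> number of records used from it
    "relatedDatasets": {
        "<uio-dataset-key-1>": 60000,   # placeholder keys and counts
        "<uio-dataset-key-2>": 37937,
    },
}

resp = requests.post(
    "https://api.gbif.org/v1/derivedDataset",
    json=payload,
    auth=(GBIF_USER, GBIF_PASSWORD),
)
resp.raise_for_status()
print(resp.json())  # response includes the DOI minted for the derived dataset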

However, you of course cannot update or change data points when you simply cite them via the GBIF portal. It would thus be useful ALSO to cite the cleaned dataset that you actually use. A best practice could be to publish the final data points used for the data paper in Zenodo (or similar). If the data publication in Zenodo is also formatted as a Darwin Core Archive, it might be useful for others wanting to reuse it! The Zenodo dataset should also cite the respective GBIF download (or "derived dataset") DOI!

rukayaj commented 2 years ago

Ah, good idea to publish it on Zenodo! I think they aren't cleaning the dataset, it's just the selected records - but we can check... If they are cleaning it, it would be great to get those corrections into MUSIT/Corema as well.

dagendresen commented 2 years ago

"I think they aren't cleaning the dataset"

Zenodo is of course not cleaning or touching the data in any way! GBIF is of course not cleaning or modifying the records in any way based on external sources such as Zenodo.

However, somebody wanting to reuse the new derived dataset can download it from Zenodo and, if they are diligent, merge those improved records with records that have since been further improved at the source and made available in GBIF. I know it is a tedious data-fusion process!!! Real data annotation services, together with REAL and trustworthy persistent identifiers, could make this a dance on roses!!!

MichalTorma commented 2 years ago

After further investigation, this seems to be connected to an issue in the underlying spatial4j library: https://github.com/locationtech/spatial4j/issues/5. The issue is quite old, but I have inquired about it anyway.