gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0

Show IUCN species ranges on species pages #1322

Open timrobertson100 opened 4 years ago

timrobertson100 commented 4 years ago

Explore the IUCN Geo APIs to see if we can put overlays on the maps from their servers or if we need to host a copy for this use.

(@andrewrodrigues is our liaison with IUCN for this)

timrobertson100 commented 4 years ago

If the API does not support it, refer to https://github.com/gbif/geocode/issues/5

MortenHofft commented 4 years ago

It is worth mentioning that we have stopped using their APIs to show the threat status, because their API couldn't handle the load. Instead we now use Wikidata to get the IUCN threat status. But let us see what happens if we add the maps, it might work better as the ranges won't be shown at all times.

timrobertson100 commented 4 years ago

> But let us see what happens if we add the maps, it might work better as the ranges won't be shown at all times.

Perhaps we should consider running both through caching proxies. If they have a typical PNG slippy / tile-map service, Thumbor could work well, although selectively expunging that is not trivial (CC @MattBlissett). Varnish is another option, perhaps with custom retention policies.

MortenHofft commented 4 years ago

I've asked for a token and access to the APIs (some of them seem to require a login). I also took the chance to ask about expected performance limitations.

It is not slippy tiles but WMS. In their examples you ask for a bounding box for the map extent. But they might have WMTS as well.

Update: WMS only, so caching probably makes less sense. They also have WFS if we want to get the underlying data that way.
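For reference, a WMS GetMap request carries the bounding box of the map extent as a query parameter. Below is a minimal Python sketch of building such a URL with the standard WMS 1.3.0 parameter names. The endpoint `https://example.org/wms` and the layer name `species_range` are placeholders for illustration, not IUCN's actual service.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=256, height=256):
    """Build a WMS 1.3.0 GetMap URL for a bounding box in EPSG:3857.

    bbox is (xmin, ymin, xmax, ymax) in Web Mercator metres.
    base_url and layer are supplied by the caller (hypothetical here).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "CRS": "EPSG:3857",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",  # so the range can be overlaid on a base map
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/wms", "species_range", (-10, -10, 10, 10))
```

Because every distinct bounding box produces a distinct URL, a cache in front of such a service only helps if clients request a fixed grid of extents.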

MattBlissett commented 4 years ago

Varnish is probably better -- e.g. for rs.tdwg.org I set the expunge time to something like a month, with a flush on redeploy.
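The retention policy described above (a long expiry plus a flush on redeploy) can be illustrated with a small in-process cache. This is only a Python sketch of the idea, not Varnish configuration; the class name `TTLCache` and the `fetch` callback are made up for this example.

```python
import time


class TTLCache:
    """Cache values for a fixed time-to-live, with an explicit flush."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key, fetch):
        """Return the cached value, calling fetch(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def flush(self):
        """Drop everything, e.g. on redeploy or when upstream data changes."""
        self._entries.clear()


# Roughly "a month" of retention, as an illustrative value.
cache = TTLCache(ttl_seconds=30 * 24 * 3600)
```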

If we need to make a WMTS, it's very easy (particularly for only one or two layers) directly from PostGIS to produce MVTs (two SQL queries, a few lines of code to convert z/x/y into a bounding box + buffered bounding box).
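The "few lines of code" for turning z/x/y into a bounding box and a buffered bounding box might look like the Python sketch below. It assumes the usual XYZ tile scheme in Web Mercator (EPSG:3857); the function name and the `buffer_fraction` parameter are illustrative. In PostGIS the two boxes would typically feed `ST_AsMVTGeom` and a spatial filter, with `ST_AsMVT` assembling the tile.

```python
import math

# Half the width of the Web Mercator (EPSG:3857) world square, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def tile_bounds(z, x, y, buffer_fraction=0.0):
    """Return (xmin, ymin, xmax, ymax) in EPSG:3857 for XYZ tile z/x/y.

    buffer_fraction widens the box on every side by that fraction of the
    tile size, so features just outside the tile are still selected.
    """
    n = 2 ** z  # number of tiles along each axis at this zoom
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError(f"tile {z}/{x}/{y} is outside the tile grid")
    size = 2 * ORIGIN_SHIFT / n
    xmin = -ORIGIN_SHIFT + x * size
    ymax = ORIGIN_SHIFT - y * size  # y counts downwards from the top
    xmax = xmin + size
    ymin = ymax - size
    pad = size * buffer_fraction
    return (xmin - pad, ymin - pad, xmax + pad, ymax + pad)


exact = tile_bounds(3, 4, 2)
buffered = tile_bounds(3, 4, 2, buffer_fraction=0.0625)  # 1/16 of a tile
```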

timrobertson100 commented 4 years ago

@andrewrodrigues - this discussion suggests we may need a copy of the ranges to proceed and will need to put up a map service. We could limit that to images though, so the geometries aren't exposed if that is their concern.

Getting a copy of the dataset will also enable https://github.com/gbif/geocode/issues/5 and https://github.com/gbif/pipelines/issues/258 to proceed, and allow GBIF to report to IUCN with evidence-based data showing where the ranges may be worth reviewing.

andrewrodrigues commented 4 years ago

Previous Red List committee minutes show that they had some concerns about reposting maps on GBIF and needed an internal discussion on how the maps could be reposted. I have contacted them to open up that internal discussion again. In the meantime, if we show that we are moving on our side, that should spur them to start taking some action on theirs. My feeling is that we will need the dataset too. Shapefiles can be directly downloaded from their website for all species that have range maps.

MortenHofft commented 4 years ago

I won't do anything until we have

andrewrodrigues commented 4 years ago

IUCN are happy for us to make mock ups of range maps on species pages but would prefer that we use the API directly in the first instance. Do we have an estimate of how often we will be making calls on their API?

andrewrodrigues commented 4 years ago

After speaking to @ahahn-gbif , can you confirm that we need to host spatial data in order to implement outlier detection? This would help clarify whether we use API or hosted data for both range visualisations and outlier detection.

jhnwllr commented 4 years ago

My intuition about the issue is that we will need to have them stored locally in order to use them productively, since many of the shapes for species can be several MB in size.

MattBlissett commented 4 years ago

By counting hits to our own map tiles for taxa, it's 250-1500 tile requests per minute, or up to 700,000 per day.

But that's for all taxa. I don't know how many of those taxa are species with an IUCN rating.

My query on the logs was: request:(v2 AND taxonkey) -request:(capabilities) -request:(inaturalist)

MattBlissett commented 4 years ago

> After speaking to @ahahn-gbif , can you confirm that we need to host spatial data in order to implement outlier detection? This would help clarify whether we use API or hosted data for both range visualisations and outlier detection.

Essentially yes.

For is-in-range / outlier detection of occurrences, we need an API that can be given a latitude, longitude, uncertainty and species, and respond with the range(s) it's within (if any). That needs to cope with many thousands of requests per second at low latency for use during indexing, and be reliable -- we wouldn't be able to index new or updated occurrence data without it. We also need to know when the underlying data changes, so we can clear our cache of the API responses.

So far, all the APIs we use for indexing like this are our own.

The result is an annotation on every occurrence, and enables search filters like "show me vulnerable occurrences outside/inside their known range" or "show me occurrences of Aus bus in its breeding range".
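To make the is-in-range lookup above concrete, here is a rough Python sketch. It is not an existing GBIF or IUCN API: the function names, the `ranges` data structure and the sample polygon are all invented for illustration. It assumes simple polygons without holes, no antimeridian crossing, and a flat-earth approximation around the query point, which is only reasonable for small uncertainties.

```python
import math

EARTH_RADIUS_M = 6371000.0


def _to_local_m(lon, lat, lon0, lat0):
    """Project lon/lat to metres on a plane centred on (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def _origin_in_ring(ring):
    """Ray casting: is the origin (0, 0) inside the ring of (x, y) vertices?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > 0) != (y2 > 0):  # edge crosses the x axis
            x_cross = x1 - y1 * (x2 - x1) / (y2 - y1)
            if x_cross > 0:
                inside = not inside
    return inside


def _origin_distance_to_ring(ring):
    """Shortest distance in metres from the origin to any edge of the ring."""
    best = math.inf
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        dx, dy = x2 - x1, y2 - y1
        length_sq = dx * dx + dy * dy
        t = 0.0 if length_sq == 0 else max(0.0, min(1.0, -(x1 * dx + y1 * dy) / length_sq))
        best = min(best, math.hypot(x1 + t * dx, y1 + t * dy))
    return best


def ranges_containing(ranges, species, lat, lon, uncertainty_m=0.0):
    """Names of the species' ranges the point is in, or within uncertainty_m of."""
    hits = []
    for name, ring in ranges.get(species, []):
        local = [_to_local_m(vlon, vlat, lon, lat) for vlon, vlat in ring]
        if _origin_in_ring(local) or _origin_distance_to_ring(local) <= uncertainty_m:
            hits.append(name)
    return hits


# Invented sample data: one square "breeding" range given as (lon, lat) vertices.
RANGES = {"Aus bus": [("breeding", [(8.0, 54.0), (12.0, 54.0), (12.0, 58.0), (8.0, 58.0)])]}

inside = ranges_containing(RANGES, "Aus bus", lat=56.0, lon=10.0)
near_edge = ranges_containing(RANGES, "Aus bus", lat=56.0, lon=12.01, uncertainty_m=1000)
```

A production service would use a spatial index (for example in PostGIS) rather than looping over polygons, but the inputs and output have the same shape.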


For indexing just based on the species, we only need the red list status of each taxon. This is most easily provided by publishing a checklist. ("Aus bus is vulnerable", and/or "Aus bus is invasive in WGSRPD region DNK")

That enables search filters like "show me vulnerable species" (as a species search) or "show me all occurrences of vulnerable species X"


For showing an overlay of ranges on species page maps, we'd like an API that can handle the level of queries above "fast enough" (our map tiles are usually produced within 25ms). We can either use theirs, or host one ourselves -- given the data, and future updates to the data.

MortenHofft commented 4 years ago

When talking about API stability and request frequency: we currently get the IUCN status from Wikidata. We used to get it from the IUCN Red List API, but we got blocked repeatedly because the traffic was too high. That makes me a bit anxious about using it for maps, since maps typically require more resources than a name lookup.

@andrewrodrigues does "happy for us to make mock ups of range maps" mean that you want a picture for further dialog or is it something that we should look into implementing?