AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

Prevent large traffic spikes hitting biocache origin #910

Open joe-lipson opened 1 month ago

joe-lipson commented 1 month ago

Species pages on australian.museum include an embedded interactive species distribution map that comes from biocache. We've seen on occasion these pages generate over 100,000 requests to biocache in under 30 minutes, sometimes from only a small number of hosts. We actually seem to handle it pretty well, but it is a lot of requests to service and significantly higher than our baseline.

Example museum page: https://australian.museum/learn/animals/mammals/bare-nosed-wombat/

It's easy to generate large numbers of requests by zooming and panning the map. They are all of the form: https://biocache.ala.org.au/ws/ogc/wms/reflect?BBOX=17112110.396258976,-3355891.289832378,17121894.33587948,-3346107.3502118755&q=lsid%3Aurn%3Alsid%3Abiodiversity.org.au%3Aafd.taxon%3A66d42847-c556-4fa3-902c-a91d9f517286&SERVICE=WMS&REQUEST=GetMap&VERSION=1.1.1&SRS=EPSG%3A3857&ATTRIBUTION=Atlas+of+Living+Australia&FORMAT=image%2Fpng&BGCOLOR=0x000000&TRANSPARENT=true&ENV=color%3Ae6704c%3Bname%3Acircle%3Bsize%3A4%3Bopacity%3A0.8&OUTLINE=false&WIDTH=256&HEIGHT=256

Each request returns a PNG map tile.
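As a rough check on how cacheable these requests might be, the sketch below tests whether a BBOX lines up with the standard Web Mercator (EPSG:3857) tile grid. The function name `bbox_to_tile` is hypothetical, and the code assumes the usual 256-pixel tile pyramid with its origin at the top-left corner of the world extent. If tiles are grid-aligned, different users viewing the same area should produce identical URLs, which is what makes a shared cache effective.

```python
import math

# Half the width of the EPSG:3857 world extent, in metres.
HALF_EXTENT = math.pi * 6378137.0


def bbox_to_tile(bbox):
    """Map a (minx, miny, maxx, maxy) BBOX to the nearest (zoom, x, y) tile.

    Assumes the BBOX covers exactly one square tile of the standard grid;
    the values are rounded, so a non-aligned BBOX gives only an approximation.
    """
    minx, _miny, maxx, maxy = bbox
    world = 2 * HALF_EXTENT
    # Each zoom level halves the tile width, so zoom is log2(world / width).
    zoom = round(math.log2(world / (maxx - minx)))
    tile_size = world / 2 ** zoom
    # Columns count east from the left edge, rows count south from the top.
    x = round((minx + HALF_EXTENT) / tile_size)
    y = round((HALF_EXTENT - maxy) / tile_size)
    return zoom, x, y


example_bbox = (
    17112110.396258976,
    -3355891.289832378,
    17121894.33587948,
    -3346107.3502118755,
)
print(bbox_to_tile(example_bbox))
```

For the BBOX in the example URL this works out to zoom 12, column 3797, row 2390, so that request does appear to sit on the standard grid.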

The response headers aggressively prevent any caching of the response:

cache-control: no-cache, no-store, max-age=0, must-revalidate
pragma: no-cache

I'm not sure if there's a reason behind these settings or if it's a default. If we allowed caching of the tiles, we could reduce the load on our infrastructure, get better performance for users, and save on traffic and serving costs. It seems reasonable to cache for at least a day. Browsers and any other intermediate caches can then keep a copy. I'll put in another ticket to get CloudFront in front of Biocache, which will help with this and other cacheable queries.

https://github.com/AtlasOfLivingAustralia/ala-infrastructure/issues/1197
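To make the one-day proposal concrete, here is a minimal sketch of the header that could replace the current `no-cache, no-store` response. The helper name `tile_cache_headers` is hypothetical and not part of biocache-service; the one-day default simply mirrors the suggestion above. `public` and `max-age` are standard Cache-Control directives that allow both browsers and shared caches such as a CDN to store the response.

```python
ONE_DAY_SECONDS = 24 * 60 * 60


def tile_cache_headers(max_age_seconds=ONE_DAY_SECONDS):
    """Build response headers that permit caching of a map tile.

    Hypothetical helper for illustration only. "public" allows shared
    (proxy/CDN) caches as well as browsers; "max-age" sets the freshness
    lifetime in seconds.
    """
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must be non-negative")
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


print(tile_cache_headers())
```

A shorter `max_age_seconds` could be chosen if stale tiles turn out to be a concern.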

adam-collins commented 1 month ago

Some info:

There are known issues with caching that would need to be resolved. As a current example, the hubs data quality caching is occasionally reported because it produces inconsistent responses. These issues are more of a problem for long-term (1hr) client caching than for proxy caching, which can be cleared on a trigger.

A targeted proxy cache is probably the most appropriate option. I expect the triggered cache clearing will be the most complex component.
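To illustrate what a targeted cache with triggered clearing could look like, here is a small in-memory sketch. Everything in it is an assumption made for illustration: the `TileCache` class, the `purge_query` trigger, and the choice to key entries on the sorted query string are hypothetical, and a real deployment would more likely use CloudFront or another proxy's own invalidation mechanism.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url):
    """Normalise a tile URL so parameter order does not split the cache."""
    parts = urlsplit(url)
    return parts.path + "?" + urlencode(sorted(parse_qsl(parts.query)))


class TileCache:
    """Hypothetical TTL cache that can be purged per `q` value on a trigger."""

    def __init__(self, ttl_seconds=86400, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, q_value, body)

    def put(self, url, body):
        q_value = dict(parse_qsl(urlsplit(url).query)).get("q", "")
        expires_at = self._clock() + self._ttl
        self._entries[cache_key(url)] = (expires_at, q_value, body)

    def get(self, url):
        key = cache_key(url)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, _q_value, body = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return body

    def purge_query(self, q_value):
        """Triggered clear: drop every tile cached for one `q` value."""
        stale = [k for k, e in self._entries.items() if e[1] == q_value]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

The trigger here is deliberately narrow (one `q` value at a time). Deciding which cached entries a given data change actually affects is the hard part, and this sketch does not attempt it.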