developmentseed / tipg

Simple and Fast Geospatial OGC Features and Tiles API for PostGIS.
https://developmentseed.org/tipg/
MIT License
153 stars 23 forks source link

approach using background tasks #68

Closed bitner closed 1 year ago

bitner commented 1 year ago

A different approach to updating the Catalog - this approach minimizes any wait in a request by refreshing the catalog as a background task so that it is updated for the next request.

This approach uses middleware to check the TTL setting as well as look for a refresh=true query parameter.

If the refresh=true parameter is set, it will refresh the catalog and then proceed with the request.

If the TTL is expired, it will still use the cached catalog for this request, but it will submit a background task to update the catalog.

ranchodeluxe commented 1 year ago

I'm naive here and might have this wrong -- but there's still a race condition here. While the background task is refreshing the catalog all concurrent requests will fail and 5xx right? I'm over here thinking of ways to make the refresh logic out-of-band with the request/response lifecycle during blue/green deploys (k8s and ECS) 🤔

If I have ☝️ this wrong then isn't a BackgroundTask actually blocking the event loop? And doesn't that mean on large catalogs we'll have requests timing out for this period?

ranchodeluxe commented 1 year ago

I'm naive here and might have this wrong -- but there's still a race condition here. While the background task is refreshing the catalog all concurrent requests will fail and 5xx right? I'm over here thinking of ways to make the refresh logic out-of-band with the request/response lifecycle during blue/green deploys (k8s and ECS) 🤔

If I have ☝️ this wrong then isn't a BackgroundTask actually blocking the event loop? And doesn't that mean on large catalogs we'll have requests timing out for this period?

Given the way we are using BackgroundTask here and based on the source code we are not blocking the main event loop but that does (at least for large catalogs) open us up to my first assumption about race conditions i believe

vincentsarago commented 1 year ago

@ranchodeluxe as long as we are using a static catalog_collection we cannot be 💯 sure that a request will fetch a refreshed database.

If you control the data ingestion, you could always request /refresh to make sure next requests will see the changes but again you could still have users seeing old catalog.

if you have indexes for the geometry and datetime columns, fetching the collections catalog should be relatively quick.

vincentsarago commented 1 year ago

@bitner I merged my PR with yours. are we 👍 to merge this to main ?

I'll start another PR to update the documentation later

ranchodeluxe commented 1 year ago

@ranchodeluxe as long as we are using a static catalog_collection we cannot be 💯 sure that a request will fetch a refreshed database.

That's not the guarantee I'm really looking for. Are you both sure this isn't gonna cause more 5xx's?

Also, what's the opt out option for this?

ranchodeluxe commented 1 year ago

@ranchodeluxe as long as we are using a static catalog_collection we cannot be 💯 sure that a request will fetch a refreshed database.

That's not the guarantee I'm really looking for. Are you both sure this isn't gonna cause more 5xx's?

Also, what's the opt out option for this?

Oh, i see this is already merged 😞

vincentsarago commented 1 year ago

Also, what's the opt out option for this?

TIPG_CATALOG_TTL=0

I think we should set TTL to 0 anyway

vincentsarago commented 1 year ago

Are you both sure this isn't gonna cause more 5xx's?

well if you have multiple worker in your service (uvicorn/gunicorn) you shouldn't have any 5xx! the service should always be able to accept requests 🤷