NASA-IMPACT / covid-api

MIT License
14 stars 3 forks source link

Investigate enabling date range queries for the `/timelapse` endpoint #127

Closed leothomas closed 3 years ago

leothomas commented 3 years ago

This issue builds off of PR #113

The /timelapse endpoint returns zonal stats for a single time unit (month or day), which can be cumbersome for API consumers (eg: 2 year timelapse of a monthly dataset requires 24 API queries).

While it's very straightforward to implement a date_range parameter in the query which causes the API to return an array of zonal stats. ie:

# current behaviour: 
GET /timelapse -d {"date": "YYYY_MM_DD", ... } --> {"mean": ..., "median" } 

# additional behaviour to implement: 
GET /timelapse -d {"date_range": ["YYYY_MM_DD_start", "YYYY_MM_DD_end"], ... } --> {
    "YYYY_MM_DD-1": {"mean": ..., "median": ... }
    "YYYY_MM_DD-2": {"mean": ..., "median": ... }
    ...
}

NOTE: calculating zonal stats requires loading the COG into memory. For a big enough AOI this calculation can already be quite time consuming, repeating this calculation in a loop for a large time range (eg: daily dataset for 2 years is over 700 zonal_stats calculations) might break things, or at best be very slow.

TODO:

leothomas commented 3 years ago

Update:

I've implemented a threaded stats calculation that processes the requested date range in batches of 15. I ran some tests to get an idea of run time for this endpoint with different date ranges and areas. Here are the results:

Dataset: CO2 (daily COGs) Los Angles (106 994 km^2) California (999 714 km^2) Continental U.S. (14 376 697 km^2)
2 days 0.6 s 0.7 s 0.8 s
1 week 0.8 s 1.1 s 1.5 s
1 month 2.4 s 2.9 s 4.9 s
6 months 11.6 s 13.9 s 26.5 s
1 year (~330 COGs to load) 21.8 s 27.0 s breaks API 30s timeout

I've deployed this work into a test stack accessible at this endpoint: https://l47o73bjpk.execute-api.us-east-1.amazonaws.com/v1/timelapse