globaldothealth / list

Repository for Global.health: a data science initiative to enable rapid sharing of trusted and open public health data to advance the response to infectious diseases.
MIT License

Data API for powering visualisations like the map #2124

Open rkassa opened 3 years ago

rkassa commented 3 years ago

Current Behavior

There are scheduled "cron jobs" on Data that export country and regional case (and VOC) data as a JSON object. This JSON object is dumped into an S3 bucket and is then used by the Map visualization. There are a few issues here:

New/Desired Behavior

We would have API endpoints for Data instead of running scheduled JSON dumps from the database. These endpoints with parameters would look something like this:

This way, the line list data (with all demographic data) would be usable, with filters, on the Map visualization. It would also scale significantly better than continuing to accommodate large JSON data dumps. We will need to create more granular issue tickets to scope this out better.
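To make the idea of a filtered endpoint concrete, here is a minimal sketch of how such a request could be answered. Everything in it is an assumption for illustration only: the `/cases/count` path, the `country` and `variant` parameters, the record fields, and the `handle_request` helper are hypothetical and are not taken from the Global.health codebase or from the endpoint list in this issue.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical line-list records; the field names are assumptions.
CASES = [
    {"country": "DE", "variant": "B.1.1.7", "age": 34},
    {"country": "DE", "variant": None, "age": 61},
    {"country": "BR", "variant": "P.1", "age": 45},
]


def handle_request(url, cases=CASES):
    """Return case counts per country for a hypothetical /cases/count URL.

    Optional query parameters (assumed names): country, variant.
    A parameter that is absent applies no filter.
    """
    query = parse_qs(urlparse(url).query)
    country = query.get("country", [None])[0]
    variant = query.get("variant", [None])[0]

    # Keep only the records that match every filter that was supplied.
    matched = [
        c for c in cases
        if (country is None or c["country"] == country)
        and (variant is None or c["variant"] == variant)
    ]
    return dict(Counter(c["country"] for c in matched))


if __name__ == "__main__":
    print(handle_request("/cases/count?country=DE"))
    print(handle_request("/cases/count?variant=P.1"))
```

In a real service the filtering would presumably be pushed down into the database query rather than done in memory; the point here is only that the map asks for the slice it needs instead of downloading a full dump.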

abhidg commented 3 years ago

@rkassa The database is updated twice daily, so the JSON dumps could actually run less often (currently they run 4 times a day).

Would adding granular JSON dumps fix the map performance issue? The map could then request only the ones it needs.
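As a rough sketch of what "granular" could mean here, the export job might split one combined dump into one small JSON document per country. The `split_dump_by_country` helper, the `country` field, and the `<code>.json` naming are all assumptions made for illustration, not the existing export code.

```python
import json


def split_dump_by_country(dump):
    """Group a combined dump (a list of dicts) into one JSON string per country.

    Returns a mapping of hypothetical file names ("<country>.json") to
    serialized JSON, which an export job could then upload individually.
    """
    per_country = {}
    for entry in dump:
        per_country.setdefault(entry["country"], []).append(entry)
    return {
        f"{code}.json": json.dumps(rows, sort_keys=True)
        for code, rows in per_country.items()
    }


if __name__ == "__main__":
    combined = [
        {"country": "DE", "cases": 10},
        {"country": "BR", "cases": 7},
        {"country": "DE", "cases": 3},
    ]
    for name, body in split_dump_by_country(combined).items():
        print(name, body)
```

The map would then fetch only the files for the countries in view, at the cost of more objects to generate and keep consistent on each export run.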

joe-brilliant commented 3 years ago

Re: #2142 (cc @iamleeg @rkassa @jim-sheldon )

iamleeg commented 3 years ago

@rkassa I'd be very interested in getting this API built out. Not only would it improve the maps application (it'd definitely help with showing dev data in dev and prod data in prod, as well as improving the liveness of the data and shrinking the size of the browser download), it would also improve the lives of integrators who want to mix our data with other sources.

I think you're right that we need one ticket per API endpoint. Then we can take an incremental approach where each use of the JSON files is replaced, until eventually we don't need to distribute the JSON at all and can turn the dumps off. For each of the endpoints you list it would be great to know:

For whichever one of these we build first there will be cross-cutting concerns that we need to think about to apply to all the others: