devinit / datahub

Datahub v2
http://data.devinit.org

Determine dev needs for Datahub API #487

Closed akmiller01 closed 5 years ago

akmiller01 commented 5 years ago

With most of the user needs for the API incorporated, we need to take a look at a few dev/technical needs before we go live with it.

  1. Is it secure? Will it allow access to our DB in ways that could be malicious?
  2. Is it robust? Will repeated use or high traffic lead to too much strain on our server?

edwinmp commented 5 years ago
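On the security question, one common risk for an API that serves warehouse tables by name is SQL injection through the table parameter. Table names can't be bound as query parameters, so the usual defence is a whitelist. This is a minimal Python sketch, not the Datahub API's actual code; the table names and `build_query` helper are hypothetical.

```python
# Hypothetical sketch: validate a user-supplied table name against a
# whitelist before interpolating it into SQL. Only values (not
# identifiers) can be bound as query parameters, so whitelisting is
# the standard defence for table names.
ALLOWED_TABLES = {"oda", "population", "poverty"}  # assumed names

def build_query(table: str, limit: int = 100) -> str:
    """Return a SELECT for a known table, rejecting anything else."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    # int() ensures the limit is a number, so interpolation is safe here
    return f"SELECT * FROM {table} LIMIT {int(limit)}"
```

Anything a client supplies that is not on the list, including injection attempts like `users; DROP TABLE x`, is rejected before it reaches the database.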

@akmiller01 it's also prudent to plan for future changes, some major ones that may be breaking. For that, we should version the API.

Also, I think some of these tables contain quite a lot of data, so the default (providing just the table & response format) should return a limited number of rows, say 100. We can add some minor metadata to the results, e.g. http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL http://api.worldbank.org/v2/countries/all/indicators/SP.POP.TOTL?format=json

Yeah, taking a leaf from the World Bank API... or several
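The World Bank style described above (a default page size plus a small metadata envelope) can be sketched as a plain function. This is an illustrative Python sketch under assumed field names (`page`, `per_page`, `total`, `pages`), not the Datahub API's actual response shape.

```python
def paginate(rows, page=1, per_page=100):
    """Return (metadata, page_of_rows) in a World-Bank-style envelope.

    Defaults to 100 rows per page, as proposed above.
    """
    total = len(rows)
    start = (page - 1) * per_page
    meta = {
        "page": page,
        "per_page": per_page,
        "total": total,
        # ceiling division without importing math
        "pages": -(-total // per_page) if total else 0,
    }
    return meta, rows[start:start + per_page]
```

A client asking for just the table then gets the first 100 rows plus enough metadata to iterate through the rest, which also caps the cost of a naive request against a large table.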

edwinmp commented 5 years ago

@akmiller01 For load testing, https://loader.io looks good.

edwinmp commented 5 years ago

@akmiller01 I've done some load tests on loader.io, and it seems the API can handle about 30 client requests per second on average.

edwinmp commented 5 years ago

@akmiller01 also noticed that unlike the DDH API which weeds out data with null values, this one doesn't. Is that something we should continue with, or should we exclude such filters and let the users decide? @Duncan-Knox

Duncan-Knox commented 5 years ago

Does the World Bank one include all data entries (within their country/year template) which are null? From my perspective it sounds fine for our one to exclude this filter too and/or let the users decide @edwinmp. My issue around null and 0 was understanding how they were being treated and making sure we had the option of showing the latter for certain visuals on the front-end.
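The "let the users decide" option discussed here, while keeping Duncan's distinction between null and 0, could look like an opt-in query flag. A minimal Python sketch, assuming a hypothetical `exclude_nulls` parameter and a `value` field name:

```python
def filter_rows(rows, exclude_nulls=False):
    """Optionally drop rows whose value is null.

    By default all rows pass through (the current API behaviour);
    clients opt in to null-filtering. Note that 0 is kept either way,
    since `0 is not None` — this preserves the null-vs-zero distinction
    that matters for front-end visuals.
    """
    if not exclude_nulls:
        return rows
    return [r for r in rows if r.get("value") is not None]
```

This keeps the default faithful to the warehouse while giving DDH-style consumers the weeded-out view on request.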

On this point I'd be interested to know which data download the user would end up getting: would it be the original data series straight from the warehouse, or data which is tied to the front-end visual in some way (for example, a download from the global picture page would have entries for all countries)?

k8hughes commented 5 years ago

@edwinmp is it the case that the user is getting single-indicator data, depending on which indicator is currently highlighted on the global picture when they click to download the data?

Could you review Duncan's question above as well? :)

edwinmp commented 5 years ago

@k8hughes yes, the downloads on the maps are per indicator. @Duncan-Knox it is pulled straight from the warehouse but only as it relates to a particular visual. As far as I can tell, there's rarely any joins, so we're pulling the original data series. For example, on the Global Picture and Spotlights, each indicator corresponds to a specific warehouse table.

akmiller01 commented 5 years ago

@edwinmp the only join performed by the API is joining entity names to entity ids!
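That entity-id-to-name join is simple enough to sketch as an in-memory lookup. This is an illustrative Python version with assumed field names (`id`, `name`, `entity_id`, `entity_name`), not the API's actual implementation, which presumably does this in SQL:

```python
def attach_entity_names(rows, entities):
    """Join human-readable entity names onto data rows by entity id.

    `entities` is a reference list like [{"id": ..., "name": ...}];
    rows whose id has no match get entity_name = None rather than
    being dropped, mirroring a LEFT JOIN.
    """
    names = {e["id"]: e["name"] for e in entities}
    return [{**r, "entity_name": names.get(r["entity_id"])} for r in rows]
```

Because this is the only join, the series returned really is the warehouse data, just with ids resolved to names.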

akmiller01 commented 5 years ago

Load testing the CSV exporter now @edwinmp , I think it's doing much better now:

[screenshot: load test results for the CSV exporter]

akmiller01 commented 5 years ago

Multitable has a bit more latency, but otherwise not bad:

[screenshot: load test results for the multitable endpoint]

Are we satisfied with this enough to not worry about caching @edwinmp ? Caching would introduce some headaches when it comes time for the data to update.
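For context on the headache being avoided: any cache over warehouse queries needs an explicit invalidation step tied to data updates, or it serves stale results. A toy Python sketch of that coupling (hypothetical `QueryCache` class, not anything in the codebase):

```python
class QueryCache:
    """Toy query cache illustrating the invalidation burden.

    Results are memoised per key; a data refresh must call
    invalidate(), and forgetting to do so means stale responses —
    the operational cost weighed against caching here.
    """

    def __init__(self):
        self._store = {}

    def get(self, key, compute):
        """Return the cached result for key, computing it on a miss."""
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]

    def invalidate(self):
        """Drop everything; must run on every warehouse data update."""
        self._store.clear()
```

Given the latencies shown above, skipping this machinery (and its failure mode) until the data volume demands it seems a reasonable trade.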

edwinmp commented 5 years ago

@akmiller01 I think it's more than adequate. We'll have to keep track of it as the data grows but for now this should be fine.