enram / data-repository

Data quality assessment
https://enram.github.io/data-repository/
MIT License

Offer a webservice for data availability #9

Closed (peterdesmet closed 7 years ago)

peterdesmet commented 8 years ago

Do we really need to build this? And if so, how do we keep it simple?

We mainly need this for the web app to browse the data availability (#12), but it would probably also be useful to query it from the command line. I can picture what the REST API would look like, but I have no idea what backend is required. For example, do we need a database holding this kind of information, or can a script just run over the file structure to retrieve it on the fly?
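A minimal sketch of the "script over the file structure" option, in Python. The layout `<root>/<radar>/<yyyy>/<mm>/<dd>/` is only an assumption for illustration (the real structure is the one discussed in #2), and `scan_coverage` is a hypothetical name. Radar codes are returned as strings because they are read from directory names.

```python
from collections import defaultdict
from pathlib import Path


def scan_coverage(root):
    """Map ISO dates to the sorted radar codes that have data for that day.

    Assumes the hypothetical layout <root>/<radar>/<yyyy>/<mm>/<dd>/.
    """
    coverage = defaultdict(set)
    pattern = "*/[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]"
    for day_dir in Path(root).glob(pattern):
        # Skip stray files and day directories that contain nothing.
        if not day_dir.is_dir() or not any(day_dir.iterdir()):
            continue
        radar, year, month, day = day_dir.parts[-4:]
        coverage[f"{year}-{month}-{day}"].add(radar)
    # Sort dates and radar codes so the output is stable between runs.
    return {date: sorted(radars) for date, radars in sorted(coverage.items())}
```

Something like this needs no database, but it walks the whole tree on every call, so whether it is fast enough "on the fly" would depend on how many files the repository ends up holding.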

REST API

The REST API could basically mimic the file structure (see #2):

Request for one month: http://lifewatch.eu/enram/api/v1/coverage/yyyy/mm/

Response (paginated for large result sets):

{
  "2016-02-01": [6451, 6477, 6410],
  "2016-02-02": [6451, 6477, 6410],
  "2016-02-03": [6451, 6477],
  "2016-02-04": [6451, 6477, 6410],
  ...
  "2016-02-29": [6451, 6477, 6410]
}

peterdesmet commented 8 years ago

We could also expose this information as one big JSON file (so no filter on year, month or day). The app (#12) would then just load that file into memory once, and any filtering can happen locally. Rather than querying the file structure on the fly, the file would be rebuilt periodically (see #1).
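A rough sketch of what "filtering locally" could look like once the single file is in memory. It is written in Python for brevity (the web app itself would presumably do the equivalent in JavaScript), `filter_month` is a hypothetical helper, and the inline JSON is made-up sample data in the date-to-radars shape proposed for the API response.

```python
import json


def filter_month(coverage, year, month):
    """Return the entries of a date-keyed coverage mapping for one month."""
    prefix = f"{year:04d}-{month:02d}-"
    return {
        date: radars
        for date, radars in coverage.items()
        if date.startswith(prefix)
    }


# Illustrative stand-in for the one big coverage file, loaded once.
coverage = json.loads(
    '{"2016-01-31": [6451],'
    ' "2016-02-01": [6451, 6477, 6410],'
    ' "2016-02-03": [6451, 6477]}'
)

# Select February 2016 without any further request to a server.
february = filter_month(coverage, 2016, 2)
```

Because the keys are ISO dates, a plain string-prefix match is enough to select a year or a month; no date parsing is needed.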

adokter commented 8 years ago

Nice to have, but not top priority in my opinion. Also, the presence of a file says little about whether the data it contains is useful and/or can be trusted, as we will be in an expanding phase with a lot of testing and development in the coming years. Therefore, a static website with general information on which radars can be trusted and which cannot (yet) may be a more urgent need.

peterdesmet commented 8 years ago

Wouldn't it be helpful to still see the coverage of the data, also marking those radars that are "trusted" vs "not trusted"?

adokter commented 8 years ago

Yes, that would be helpful.

peterdesmet commented 7 years ago

A first step towards this has been taken, with this coverage file: https://lw-enram.s3-eu-west-1.amazonaws.com/coverage.csv (which powers the calendar-heatmap). It is generated each time data is added.

We've had positive feedback on this type of visualization, but as you mention @adokter, the presence of a data file might not say a lot, so this can definitely be improved upon.