Safecast / safecastapi

The app that powers api.safecast.org

More flexible data access #726

Open matschaffer opened 4 years ago

matschaffer commented 4 years ago

We get occasional requests for data formatted differently than our bulk exports. For example:

This has also uncovered some lingering data quality issues:

My original thought was to have more people use elasticsearch directly (https://github.com/Safecast/safecastapi/wiki/Data-Sets#kibana--elasticsearch-access).

But the CSV export support is not great, and folks asking for the data seem to be much more familiar with postgres/postgis.

I had also hoped to do more with S3 & Athena in this space, but as far as I can tell it has no support for linear distance queries (distance in meters), only cartesian distance over raw coordinates. A query for radiation within 100 units would cover a different number of meters depending on how far north/south you are.
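
To illustrate the cartesian vs. linear distance problem, here is a minimal sketch in Python. It is not code from safecastapi; the function names and the spherical Earth radius are assumptions made for the example. It compares a plain cartesian distance in degrees with a haversine (great-circle) distance in meters for the same one-degree longitude offset at two latitudes.

```python
import math

# Assumption: a spherical Earth with the commonly used mean radius.
EARTH_RADIUS_M = 6_371_000


def cartesian_deg(lat1, lon1, lat2, lon2):
    """Plain Euclidean distance over raw lat/lon values, in degrees."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The same 1-degree longitude offset, at the equator and at 60 degrees north.
equator = (0.0, 0.0, 0.0, 1.0)
north = (60.0, 0.0, 60.0, 1.0)

same_in_degrees = cartesian_deg(*equator) == cartesian_deg(*north)
meters_equator = haversine_m(*equator)
meters_north = haversine_m(*north)

print("cartesian distances equal:", same_in_degrees)
print("meters at equator:", round(meters_equator))
print("meters at 60N:", round(meters_north))
```

Both pairs are exactly 1.0 apart in cartesian terms, but the pair at 60 degrees north is only about half as far apart in meters. This is the gap that a meter-based query, such as PostGIS `ST_DWithin` on the `geography` type, avoids.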

And finally there was hope that postgres replicas could help us here, but (0) they don't support temp tables, (1) they can't be made public, and (2) hard queries cause replication lag and ultimately fail out.

Opening this to brainstorm ideas about how we could more easily provide a clean data set in a flexible format people are generally familiar with.

Some ideas:

matschaffer commented 4 years ago

cc recent requesters for feedback: @sakshamg94 @shmcminn @julovi @dobrych

matschaffer commented 4 years ago

Also just noting that this came up in our sync meeting today https://s3-us-west-2.amazonaws.com/safecastdata-us-west-2/meetings/api/2020-08-19-api-sync.mp4 (~26:50 mark) in light of @jamoross mentioning that the 10-year anniversary is approaching in ~6 months.

matschaffer commented 3 years ago

Noting that OpenAQ seems to lead with S3+Athena https://docs.openaq.org/

Happy to see I'm not the only one excited about this avenue for cheap data access :)

matschaffer commented 3 years ago

https://github.com/openaq/oh-snap might help if we want to do a public RDS snapshot (though it sounds like no one is really using the one OpenAQ provides)

https://github.com/openaq/fetches-optimizer/pulls also has info on how they're building their parquet tables which might be useful for us as well.