Open matschaffer opened 4 years ago
cc recent requesters for feedback: @sakshamg94 @shmcminn @julovi @dobrych
Also just noting that this came up in our sync meeting today https://s3-us-west-2.amazonaws.com/safecastdata-us-west-2/meetings/api/2020-08-19-api-sync.mp4 (~26:50 mark) in light of @jamoross mentioning that the 10-year anniversary is approaching in ~6 months.
Noting that OpenAQ seems to lead with S3+Athena https://docs.openaq.org/
Happy to see I'm not the only one excited about this avenue for cheap data access :)
https://github.com/openaq/oh-snap might help if we want to do a public RDS snapshot (though it sounds like no one is really using the one OpenAQ provides)
https://github.com/openaq/fetches-optimizer/pulls also has info on how they're building their parquet tables which might be useful for us as well.
We get occasional requests for data formatted differently than our bulk exports. For example:
This has also uncovered some lingering data quality issues:
My original thought was to have more people use elasticsearch directly (https://github.com/Safecast/safecastapi/wiki/Data-Sets#kibana--elasticsearch-access).
But the CSV export support is not great, and folks asking for the data seem to be much more familiar with postgres/postgis.
I had also hoped to do more with S3 & Athena in this space, but as far as I can tell it has no support for linear distance queries, only cartesian distance (a query for radiation within 100 units would cover a different number of meters depending on how far north/south you are).
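To make the cartesian vs. linear distance problem concrete, here is a minimal sketch using the haversine formula on a spherical Earth. The function names and the mean-radius constant are illustrative assumptions, not anything from the Safecast codebase or from Athena:

```python
import math

# Assumption: spherical Earth with the commonly used mean radius.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def one_degree_east_m(lat):
    """Meters covered by moving one degree of longitude east at a given latitude."""
    return haversine_m(lat, 0.0, lat, 1.0)


if __name__ == "__main__":
    # The same "1 unit" of cartesian distance in degrees shrinks toward the poles.
    for lat in (0, 37, 60):
        print(f"lat {lat:>2}: 1 degree of longitude = {one_degree_east_m(lat):,.0f} m")
```

One degree of longitude comes out at roughly 111 km at the equator but only about half that at 60° north, which is why a fixed cartesian radius in degrees doesn't map to a fixed number of meters.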
And finally there was hope that postgres replicas could help us here, but (0) they don't support temp tables, (1) they can't be made public, and (2) hard queries cause replication lag and ultimately fail out.
Opening this to brainstorm ideas about how we could more easily provide a clean data set in a flexible format people are generally familiar with.
Some ideas: