Closed sasharevzin closed 4 years ago
CSV is kind of important for researchers. It would be good to have some sort of CSV data available, even if it has to be periodic exports to S3 rather than something queryable.
If we already have the CSV dump somewhere on S3, then we can just redirect to it. Currently, I don't see any exports of measurements to a CSV file, so I guess no one is using it. Possible?
Could be. How did you come across the code?
In the same controller as this issue: https://github.com/Safecast/safecastapi/issues/529
@matschaffer @auspicacious I think it makes sense to remove this option. It just puts pressure on the DB.
Is the daily export listed on https://github.com/Safecast/safecastapi/wiki/Data-Sets in CSV format? If so, then I agree with @sasharevzin
(P.S. the page should document the format)
Just to reiterate @matschaffer's comment: the CSV file option is really important to researchers. If this is redundant because it's already happening somewhere else, then it's OK to remove, but if this is what generates the CSV we publish every day, then it's really important to keep.
@seanbonner @matschaffer Yes, the daily export generates a CSV file: https://github.com/Safecast/safecastapi/blob/master/cron/dump_measurements#L13
Okay, after taking a closer look at what you mean: it's specifically https://api.safecast.org/en-US/measurements?format=csv that exists but doesn't work, since it doesn't include any filtering or pagination.
I'll re-word the description to add that.
To be fair, it's broken and no one has mentioned anything, so it probably doesn't see much use. But to @seanbonner's point, CSVs are important, so we shouldn't just drop the support; we should try to improve it.
Heh, in its current form it basically just kills the DB, so on second thought I'll open two tickets: one to remove the current support, and another to add it back in a way that doesn't try to export everything.
I'm inclined to leave this as is.
It'd be great to have some sort of "slow query" option for generating large CSVs asynchronously, but in the meantime I don't think we should just remove what's there.
Folks who know how it can be used can use it. Folks who don't will get an error. Probably good enough until we have a better story on providing large CSV blobs ad hoc.
@matschaffer But if we add pagination to the CSV export, then everything will be fine. Just saying :)
Yeah, or some reasonable limit could be worth trying. Though sometimes Postgres does weird things with LIMIT queries.
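For illustration only, here is a minimal sketch of what "pagination plus a reasonable limit" could look like for a CSV export. It is plain Python over an in-memory list, not the Rails controller code; the column names, the `MAX_PER_PAGE` cap, and the `export_csv_page` helper are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical column names; the real measurements schema may differ.
FIELDS = ["id", "captured_at", "value", "unit"]
MAX_PER_PAGE = 1000  # assumed cap so one request can never export everything


def export_csv_page(rows, page=1, per_page=100, unit=None):
    """Return one page of rows as CSV text, applying an optional filter first."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # Clamp the page size to the range 1..MAX_PER_PAGE.
    per_page = max(1, min(per_page, MAX_PER_PAGE))
    if unit is not None:
        rows = [r for r in rows if r["unit"] == unit]
    start = (page - 1) * per_page
    page_rows = rows[start:start + per_page]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(page_rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"id": i, "captured_at": "2016-01-01T00:00:00Z", "value": 30 + i, "unit": "cpm"}
        for i in range(1, 6)
    ]
    # Prints the header plus the rows with id 3 and 4.
    print(export_csv_page(sample, page=2, per_page=2))
```

In a real database-backed endpoint the slicing would presumably become a `LIMIT`/`OFFSET` (or keyset) clause in the query itself, so the cap protects the DB rather than just the response size.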
@matschaffer Added back filters and pagination.
PR closed
I found that it exports all measurements to CSV: https://github.com/Safecast/safecastapi/blob/master/app/controllers/measurements_controller.rb#L58. Of course, the query always times out.