Safecast / safecastapi

The app that powers api.safecast.org
44 stars 25 forks source link

Provide CSV measurements export with filtering/pagination #676

Closed sasharevzin closed 4 years ago

sasharevzin commented 4 years ago

I found that it exports all measurements into CSV https://github.com/Safecast/safecastapi/blob/master/app/controllers/measurements_controller.rb#L58 Of course, the query always timeout.

matschaffer commented 4 years ago

Csv is kind of important for researchers. Would be good to have some sort of csv data available. Even if it has to be periodic exports to s3 rather than queryable.

sasharevzin commented 4 years ago

If we have somewhere the CSV dump at s3, then we can just redirect to it. Currently, I don't see any export measurements to CSV file so I guess no one is using. Possible?

matschaffer commented 4 years ago

Could be. How did you come across the code?

sasharevzin commented 4 years ago

In a same controller for this issue https://github.com/Safecast/safecastapi/issues/529

sasharevzin commented 4 years ago

@matschaffer @auspicacious. I think it makes sense to remove this option. It just adds pressure to db.

auspicacious commented 4 years ago

Is the daily export listed on https://github.com/Safecast/safecastapi/wiki/Data-Sets in CSV format? If so, then I agree with @sasharevzin

(P.S. the page should document the format)

seanbonner commented 4 years ago

Just to reiterate @matschaffer's comment, the CSV file option is really important to researchers, so if this is redundant because it's already happening somewhere else then it's OK to remove, but if this is what is generating the CSV we publish everyday then it's really very important to keep.

sasharevzin commented 4 years ago

@seanbonner @matschaffer Yes, they generate CSV file: https://github.com/Safecast/safecastapi/blob/master/cron/dump_measurements#L13

matschaffer commented 4 years ago

Okay, after taking a closer look at what you mean, it's specifically https://api.safecast.org/en-US/measurements?format=csv that exists, but doesn't work since it doesn't include any filtering or pagination.

I'll re-word the description to add that.

To be fair, it's broken and no-one has mentioned anything so it probably doesn't see much use, but to @seanbonner 's point CSVs are important so we shouldn't just drop the support, we should try to improve it.

matschaffer commented 4 years ago

Heh, in it's current form it basically just kills the DB, so on second thought I'll open two tickets, one to remove the current support, and another add it back in a way that doesn't try to export everything.

matschaffer commented 4 years ago

Oh, nm. It does have usefulness if filtered:

https://api.safecast.org/en-US/measurements?captured_after=2011-03-10T00%3A00%3A00Z&captured_before=2020-06-15T02%3A02%3A48.492Z&distance=30&format=csv&latitude=40.538928&longitude=141.526812

matschaffer commented 4 years ago

I'm inclined to leave this as is.

It'd be great to have some sort of "slow query" option for generating large csvs asynchronously, but in the mean time I don't think we should just remove what's there.

Folks who know how it can be used, can use it. Folks who don't will get an error. Probably good enough until we have a better story on providing large CSV blobs ad-hoc.

sasharevzin commented 4 years ago

@matschaffer but if we will add pagination to CSV export then everything will be fine. Just saying :)

matschaffer commented 4 years ago

yeah, or some reasonable limit could be worth trying. Though sometimes psql does weird things with limit queries.

sasharevzin commented 4 years ago

@matschaffer Added back filters and pagination

sasharevzin commented 4 years ago

PR closed