CMU-CREATE-Lab / esdr

Environmental Sensor Data Repository (ESDR)

Include feed name, lat, and lon in feed export? #60

Closed chrisbartley closed 3 years ago

chrisbartley commented 3 years ago

When we export data from ESDR, especially in the new multi-export setup, it would be great to have a way to get the corresponding lat, lon of each column without having to learn the ESDR API and write code.

Could we for example insert lat, lon rows in the exported CSV, before the time series measurements? CSV is important since a lot of our users aren't coders, and they're using spreadsheets.

My vote would be to either insert lat, lon, name as three new rows in the data download CSV, or to have an alternate CSV download with the same column headers, and just the three rows name, lat, lon. The latter might be easier from Node, I'm not sure?

chrisbartley commented 3 years ago

The latter (alternate download) is definitely easier, and is also what I'd suggest should be done by environmentaldata.org (and I'm 99% sure can be done totally client side). When a user requests a data export on environmentaldata.org, in addition to providing the new single link to download data from multiple feeds, environmentaldata.org could also simply call ESDR's feed metadata API, for example like this:

http://esdr.cmucreatelab.org/api/v1/feeds/?fields=id,name,latitude,longitude&whereOr=id=26,id=28

And get something like this:

{
  "code": 200,
  "status": "success",
  "data": {
    "totalCount": 2,
    "rows": [
      {
        "id": 26,
        "name": "Lawrenceville ACHD",
        "latitude": 40.46542,
        "longitude": -79.960757
      },
      {
        "id": 28,
        "name": "Liberty ACHD",
        "latitude": 40.323768,
        "longitude": -79.868062
      }
    ],
    "offset": 0,
    "limit": 1000
  }
}

And it could then present the above data to the user as CSV or JSON, as text-on-screen or as a click-this-button-to-copy-to-your-clipboard interface, or as a file download.
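As a rough sketch of that client-side conversion, assuming the response has exactly the shape shown above, something like the following would turn the rows into CSV text. It's in Python only for illustration; the function name `feeds_to_csv` and the column order are hypothetical choices, not part of ESDR or environmentaldata.org.

```python
import csv
import io


def feeds_to_csv(response, fields=("id", "name", "latitude", "longitude")):
    """Convert a feeds-metadata response (shape as in the JSON above) to CSV text.

    `feeds_to_csv` is a hypothetical helper, not an ESDR function.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in response["data"]["rows"]:
        # Missing fields become empty cells rather than raising.
        writer.writerow([row.get(field, "") for field in fields])
    return out.getvalue()


# The sample response from above, trimmed to the parts the helper reads.
sample_response = {
    "code": 200,
    "status": "success",
    "data": {
        "totalCount": 2,
        "rows": [
            {"id": 26, "name": "Lawrenceville ACHD",
             "latitude": 40.46542, "longitude": -79.960757},
            {"id": 28, "name": "Liberty ACHD",
             "latitude": 40.323768, "longitude": -79.868062},
        ],
    },
}

metadata_csv = feeds_to_csv(sample_response)
print(metadata_csv)
```

The same text could then be shown on screen, copied to the clipboard, or offered as a file download.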

Your former suggestion, of including the data in the export file itself, is what I'm having a harder time with. Exporting of multiple feeds is live now, so we can work with real examples. Calling this

https://esdr.cmucreatelab.org/api/v1/feeds/export/26.OUT_T_DEGC,28.SO2_PPM?from=1609563600&to=1609576200&format=csv

Gets you this:

EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
1609572600,10.5,0.001
1609576200,10.1,0

So what I'm not understanding is where you want the name/lat/long to go? None of the following three options--the only ones I can think of--seem ideal to me. In order from best-to-worst:

Option 1: Some sort of metadata header

Actual format doesn't matter to me at all. The point is that it's some sort of "commented-out" header that we either hope CSV parsers ignore, or that users need to manually tell their parser to ignore:

# Feed 26: Lawrenceville ACHD (40.46542, -79.960757)
# Feed 28: Liberty ACHD (40.323768, -79.868062)
EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
1609572600,10.5,0.001
1609576200,10.1,0
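To make the "users need to manually tell their parser to ignore" part concrete, here's a minimal sketch of what a coder would have to do with Option 1. It assumes the header lines start with `#`, as in the example above. CSV as described in RFC 4180 has no comment syntax, so the filtering has to be done by hand. The helper name `read_export_skipping_comments` is hypothetical.

```python
import csv


def read_export_skipping_comments(text, comment_prefix="#"):
    """Parse export CSV text, ignoring lines that start with comment_prefix.

    Hypothetical helper: the '#' convention is an assumption, not an ESDR format.
    """
    data_lines = [line for line in text.splitlines()
                  if not line.startswith(comment_prefix)]
    return list(csv.DictReader(data_lines))


option_1_text = """\
# Feed 26: Lawrenceville ACHD (40.46542, -79.960757)
# Feed 28: Liberty ACHD (40.323768, -79.868062)
EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
1609572600,10.5,0.001
1609576200,10.1,0
"""

option_1_rows = read_export_skipping_comments(option_1_text)
print(option_1_rows[0])
```

Spreadsheet users don't get that escape hatch, and would most likely see the two `#` lines as ordinary cells in the first column.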

Option 2: Essentially two CSVs in one

FeedId,FeedName,Latitude,Longitude
26,Lawrenceville ACHD,40.46542,-79.960757
28,Liberty ACHD,40.323768,-79.868062
EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
1609572600,10.5,0.001
1609576200,10.1,0
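For comparison, reading Option 2 back in means finding where the first table ends and the second begins. A sketch, assuming the data table always starts at the line beginning with `EpochTime,` as in the example above (the helper name `split_combined_export` is hypothetical):

```python
import csv


def split_combined_export(text, data_header_prefix="EpochTime,"):
    """Split a two-tables-in-one export into (metadata rows, sample rows).

    Hypothetical helper; assumes the data header line starts with 'EpochTime,'.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(data_header_prefix):
            break
    else:
        raise ValueError("no data header line found")
    metadata = list(csv.DictReader(lines[:index]))
    samples = list(csv.DictReader(lines[index:]))
    return metadata, samples


option_2_text = """\
FeedId,FeedName,Latitude,Longitude
26,Lawrenceville ACHD,40.46542,-79.960757
28,Liberty ACHD,40.323768,-79.868062
EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
1609572600,10.5,0.001
1609576200,10.1,0
"""

option_2_metadata, option_2_samples = split_combined_export(option_2_text)
print(option_2_metadata)
```

A generic CSV reader won't do this split on its own, since the two tables have different column counts and headers.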

Option 3: Include as extra fields

Maybe the values only actually appear on line 1 to reduce data size, but that's just an implementation detail. This one is worst partially because of file size bloat, but mostly due to the need for ESDR to parse and modify the datastore's response rather than just piping the output to the browser.

EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM,26.name,26.lat,26.lng,28.name,28.lat,28.lng
1609565400,4.7,0.003,Lawrenceville ACHD,40.46542,-79.960757,Liberty ACHD,40.323768,-79.868062
1609569000,6.7,0.004,,,,,,
1609572600,10.5,0.001,,,,,,
1609576200,10.1,0,,,,,,
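To illustrate the parse-and-modify cost, here's a sketch of the extra work Option 3 implies: every row of the export has to be read and rewritten instead of being streamed through untouched. This is illustrative Python, not ESDR's actual code, and the helper name `append_feed_metadata` is hypothetical.

```python
import csv
import io


def append_feed_metadata(export_csv, feeds):
    """Append name/lat/lng columns per feed, with values only on the first data row.

    Hypothetical helper showing the rewrite Option 3 would require.
    """
    rows = list(csv.reader(export_csv.splitlines()))
    header, data = rows[0], rows[1:]
    extra_header, extra_values = [], []
    for feed in feeds:
        feed_id = feed["id"]
        extra_header += [f"{feed_id}.name", f"{feed_id}.lat", f"{feed_id}.lng"]
        extra_values += [feed["name"], feed["latitude"], feed["longitude"]]
    blanks = [""] * len(extra_values)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + extra_header)
    for index, row in enumerate(data):
        writer.writerow(row + (extra_values if index == 0 else blanks))
    return out.getvalue()


export_text = """\
EpochTime,3.feed_26.OUT_T_DEGC,3.feed_28.SO2_PPM
1609565400,4.7,0.003
1609569000,6.7,0.004
"""
feed_rows = [
    {"id": 26, "name": "Lawrenceville ACHD",
     "latitude": 40.46542, "longitude": -79.960757},
    {"id": 28, "name": "Liberty ACHD",
     "latitude": 40.323768, "longitude": -79.868062},
]

option_3_csv = append_feed_metadata(export_text, feed_rows)
print(option_3_csv)
```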

Summary

There are similar but different issues for JSON output. Regardless, none seems like a perfect solution, when environmentaldata.org could simply provide the info with no further changes to ESDR required.

All that said, I'm not saying no-always-and-forever, I'm just unsure how high to prioritize this when there are other, faster-to-production viable options. And it's not clear to me how users will typically use this. If they're regularly exporting the same(ish) set of feeds, then they only need the metadata once since feed ID, name, lat/long will never change.

chrisbartley commented 3 years ago

Resolution: add a format query string option to the feeds metadata API. Defaults to JSON, but will accept csv (case insensitive) for CSV output of the metadata. Similar in spirit to the format option for export.
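A sketch of how a client might build such a request, assuming the option lands as described here (a `format` query parameter on the same feeds endpoint used in the earlier example). The helper name `feeds_metadata_url` is hypothetical, and the final behaviour is whatever the replacement issue settles on.

```python
from urllib.parse import urlencode

# Same endpoint as the metadata example earlier in this thread.
ESDR_FEEDS_URL = "https://esdr.cmucreatelab.org/api/v1/feeds/"


def feeds_metadata_url(feed_ids, fmt="csv"):
    """Build a feeds-metadata URL for the given feed IDs.

    Assumes a `format` query parameter as proposed in the resolution above.
    """
    params = {
        "fields": "id,name,latitude,longitude",
        "whereOr": ",".join(f"id={feed_id}" for feed_id in feed_ids),
        "format": fmt,
    }
    # Keep ',' and '=' unescaped so the URL reads like the earlier example.
    return ESDR_FEEDS_URL + "?" + urlencode(params, safe=",=")


metadata_url = feeds_metadata_url([26, 28])
print(metadata_url)
```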

chrisbartley commented 3 years ago

Closing here, replacing with issue #61