Open rschwant opened 1 year ago
Filter to include only monitor data or only sensor data
Not sure how to do this with the OpenAQ data source we currently use for MONETIO. @bbakernoaa @rschwant ?
Can we not just do df[[columns_to_keep]]?
Sure, but I don't know how to determine which to keep. That is, I don't see a column that tells us whether a measurement comes from a "monitor" or just a normal sensor. There is the "sourceType" column, but that always seems to be set to "government".
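To make the distinction concrete: `df[[columns_to_keep]]` selects columns, while separating monitors from sensors needs a row filter on a column that actually tells the two apart. Below is a minimal sketch of checking whether a column can do that. The frame is a made-up stand-in, not real MONETIO output, and only the column names are taken from the current reader.

```python
import pandas as pd

# Toy stand-in for the frame monetio returns; the values are invented for illustration.
df = pd.DataFrame(
    {
        "siteid": ["A", "B", "C"],
        "sourceType": ["government", "government", "government"],
        "pm25_ugm3": [5.0, 12.0, 8.5],
    }
)

# Column selection keeps every row, so it cannot separate monitors from sensors.
subset = df[["siteid", "pm25_ugm3"]]

# A column is only useful for row filtering if it takes more than one value.
counts = df["sourceType"].value_counts()
can_discriminate = counts.size > 1

print(counts)
print(can_discriminate)
```

With a single value in "sourceType", as in the data we currently get, the check comes back negative, which is the problem described above.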
I guess the original JSON files do have more info that we aren't currently propagating through in our processing [^1]. Do you know which fields we need? Example entry:
{"date":{"utc":"2019-07-31T16:00:00.000Z","local":"2019-07-31T22:00:00+06:00"},
"parameter":"pm25","value":5,"unit":"µg/m³",
"averagingPeriod":{"value":1,"unit":"hours"},
"location":"US Diplomatic Post: Astana",
"city":"Astana","country":"KZ",
"coordinates":{"latitude":51.125286,"longitude":71.46722},
"attribution":[{"name":"EPA AirNow DOS","url":"http://airnow.gov/index.cfm?action=airnow.global_summary"}],
"sourceName":"StateAir_Astana",
"sourceType":"government",
"mobile":false}
[^1]: Currently monetio returns a df with the columns ['time', 'latitude', 'longitude', 'sourceName', 'sourceType', 'city', 'country', 'utcoffset', 'bc_umg3', 'co_ppm', 'no2_ppm', 'o3_ppm', 'pm10_ugm3', 'pm25_ugm3', 'so2_ppm', 'siteid', 'time_local']
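For comparing what the raw files carry against the columns in the footnote, here is a rough sketch that flattens the example entry with `pandas.json_normalize`. This only inspects one record and is not how the monetio reader processes the files.

```python
import json

import pandas as pd

# The example entry from this comment, copied verbatim.
raw = """{"date":{"utc":"2019-07-31T16:00:00.000Z","local":"2019-07-31T22:00:00+06:00"},
"parameter":"pm25","value":5,"unit":"µg/m³",
"averagingPeriod":{"value":1,"unit":"hours"},
"location":"US Diplomatic Post: Astana",
"city":"Astana","country":"KZ",
"coordinates":{"latitude":51.125286,"longitude":71.46722},
"attribution":[{"name":"EPA AirNow DOS","url":"http://airnow.gov/index.cfm?action=airnow.global_summary"}],
"sourceName":"StateAir_Astana",
"sourceType":"government",
"mobile":false}"""

entry = json.loads(raw)

# Nested objects become dotted column names (e.g. "coordinates.latitude");
# the "attribution" list is kept as a single object-valued column.
flat = pd.json_normalize(entry)

print(sorted(flat.columns))
```

Fields such as "averagingPeriod", "attribution", and "mobile" show up here but have no obvious counterpart in the current df columns.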
Update: as noted in the last dev meeting, our current OpenAQ reader in MONETIO only fetches OpenAQ v1 data, which doesn't include the low-cost sensors. But I am working on an OpenAQ v2 reader, which does.
Zach is working on adding the latest version of OpenAQ to MONETIO. Once this is complete, let's add a converter for OpenAQ data into MELODIES MONET too. Zach, this could just be part of your CLI tool. Barry mentioned we should be able to use the same format as for the AirNow files, so it should be pretty seamless to pull into the tool.
Then, as part of this, let's create an example that shows users how to use OpenAQ data: specifically, how to use all the data and how to filter to include only monitor data or only sensor data. Let's also include a description of this on the ReadTheDocs page so that people understand that not all data from OpenAQ is monitor data, but that there is an easy way to select the appropriate measurement technique based on your science question.
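As a starting point for that example, here is a rough sketch of what the filtering could look like once the v2 reader exists. The column name `sensor_type`, the two labels, and the helper `filter_by_sensor_type` are all hypothetical placeholders; the actual reader may expose this information differently.

```python
import pandas as pd

# Hypothetical column name and labels; the real v2 reader may use different ones.
SENSOR_TYPE_COL = "sensor_type"
MONITOR = "reference grade"
LOW_COST = "low-cost sensor"


def filter_by_sensor_type(df, kind=None):
    """Return all rows (kind=None) or only the rows whose sensor type equals `kind`."""
    if kind is None:
        return df.copy()
    return df[df[SENSOR_TYPE_COL] == kind].copy()


# Invented data for illustration only.
df = pd.DataFrame(
    {
        "siteid": ["A", "B", "C", "D"],
        SENSOR_TYPE_COL: [MONITOR, LOW_COST, MONITOR, LOW_COST],
        "pm25_ugm3": [5.0, 12.0, 8.5, 20.1],
    }
)

all_data = filter_by_sensor_type(df)            # use everything
monitors = filter_by_sensor_type(df, MONITOR)   # monitor data only
sensors = filter_by_sensor_type(df, LOW_COST)   # low-cost sensor data only

print(len(all_data), len(monitors), len(sensors))
```

Returning copies keeps later edits to a subset from touching the original frame.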