ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License
CMAQ Exposures Data - expanded temporal and geospatial reach #123

Closed lstillwe closed 6 years ago

lstillwe commented 6 years ago

The Exposures CMAQ API has already been expanded to cover the entire US for 2010 and 2011. This issue can also cover the additional quality metrics to be provided by @arunacs and installed by @lstillwe.

lstillwe commented 6 years ago

@arunacs provided new quality metrics here:

DQ metrics, in a format similar to the previously provided sample for O3, are now available for both years for several gas-phase and aerosol pollutants. /proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2010/Evaluation/.csv and /proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2011/Evaluation.csv

@mjstealey warned that we are already at the maximum column width for the PostgreSQL quality-metrics table, so it may need a redesign.
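
One possible direction for such a redesign, sketched here only as an illustration, is to store one row per metric (a "long" layout) instead of one column per metric, so that adding pollutants adds rows rather than columns. The key fields (`col`, `row`, `year`) and the metric names below are hypothetical and are not taken from the actual table.

```python
def wide_to_long(wide_row, key_fields=("col", "row", "year")):
    """Turn one wide record into a list of (key..., metric, value) tuples."""
    key = tuple(wide_row[k] for k in key_fields)
    return [
        key + (name, value)
        for name, value in wide_row.items()
        if name not in key_fields
    ]


# Hypothetical wide record; the metric names are made up for illustration.
example = {"col": 12, "row": 34, "year": 2010, "o3_bias": 0.1, "pm25_bias": -0.2}
long_rows = wide_to_long(example)
for r in long_rows:
    print(r)
```

In this layout the table keeps a fixed set of columns (key, metric name, value) no matter how many pollutants or metrics are added later.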

karafecho commented 6 years ago

Current CMAQ data needs:

  1. Sarav to provide expanded ontology/chemical names for union of two years of data. Variables common to both years can be found here: CMAQ_Species_Defn_CheBI_Links.xlsx. PER SARAV/LISA: 99% complete.
  2. Sarav to explore why data points are missing for at least several days in Hao's cleaned CMAQ files. PER SARAV: Took a look at the missing CMAQ data after 2011-07-28 18Z. It appears to be data corruption that goes back to the raw data we got from the EPA, and it affects the period 2011-07-28 18Z to 2011-08-01 00Z. The rest of the data seems okay from what I see.

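
A gap like the one described in item 2 can be located by comparing the hourly timestamps that are present against the full expected hourly range. The sketch below is a minimal illustration on synthetic timestamps, not the check that was actually run; treating both endpoints of the affected period as missing is an assumption.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def find_missing_hours(timestamps, start, end):
    """Return every hour in [start, end) that is absent from timestamps."""
    present = set(timestamps)
    missing = []
    current = start
    while current < end:
        if current not in present:
            missing.append(current)
        current += HOUR
    return missing


def group_gaps(missing):
    """Collapse consecutive missing hours into (first, last) pairs."""
    gaps = []
    for ts in missing:
        if gaps and ts - gaps[-1][1] == HOUR:
            gaps[-1] = (gaps[-1][0], ts)
        else:
            gaps.append((ts, ts))
    return gaps


# Synthetic example: five days of hourly data with the reported period removed.
start, end = datetime(2011, 7, 28, 0), datetime(2011, 8, 2, 0)
gap_start, gap_end = datetime(2011, 7, 28, 18), datetime(2011, 8, 1, 0)
all_hours = [start + i * HOUR for i in range(5 * 24)]
observed = [t for t in all_hours if not (gap_start <= t <= gap_end)]

missing = find_missing_hours(observed, start, end)
gaps = group_gaps(missing)
print(gaps)
```

Run against the timestamps of a real hourly file, the same two functions would list each contiguous block of missing hours.
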
karafecho commented 6 years ago

Hao's 'cleaned' CMAQ output:

File location:

/opt/RENCI/output/

The 2010 data is under the cmaq2010 directory.
The 2011 data is under the cmaq2011 directory.

Hourly files:

C<col>R<row>.csv

Header: start_date, ozone, pm2.5

Daily files:

C<col>R<row>Daily.csv

Header: start_time, o3_avg, pm25_avg, o3_max, pm25_max, o3_min, pm25_min, o3_stddev, pm25_stddev
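
To illustrate how the daily columns relate to the hourly ones, here is a minimal sketch that aggregates hourly rows into the daily layout listed above. It is not Hao's actual processing: the timestamp format (`YYYY-MM-DD HH:MM:SS`), the use of the population standard deviation, and the sample values are all assumptions.

```python
import csv
import io
import statistics
from collections import defaultdict


def daily_summary(hourly_csv):
    """Aggregate hourly rows (start_date, ozone, pm2.5) into daily rows."""
    by_day = defaultdict(lambda: {"o3": [], "pm25": []})
    # skipinitialspace tolerates "a, b, c" style headers and values.
    for rec in csv.DictReader(hourly_csv, skipinitialspace=True):
        day = rec["start_date"][:10]  # assumes timestamps start with YYYY-MM-DD
        by_day[day]["o3"].append(float(rec["ozone"]))
        by_day[day]["pm25"].append(float(rec["pm2.5"]))

    rows = []
    for day in sorted(by_day):
        o3, pm25 = by_day[day]["o3"], by_day[day]["pm25"]
        rows.append({
            "start_time": day,
            "o3_avg": statistics.mean(o3),
            "pm25_avg": statistics.mean(pm25),
            "o3_max": max(o3),
            "pm25_max": max(pm25),
            "o3_min": min(o3),
            "pm25_min": min(pm25),
            # Population standard deviation; the real files may use another definition.
            "o3_stddev": statistics.pstdev(o3),
            "pm25_stddev": statistics.pstdev(pm25),
        })
    return rows


# Made-up hourly values, for illustration only.
sample = (
    "start_date, ozone, pm2.5\n"
    "2010-01-01 00:00:00, 30.0, 10.0\n"
    "2010-01-01 01:00:00, 40.0, 14.0\n"
    "2010-01-02 00:00:00, 20.0, 8.0\n"
)
daily = daily_summary(io.StringIO(sample))
for row in daily:
    print(row)
```

For a real grid cell, `io.StringIO(sample)` would be replaced by an open handle on the corresponding `C<col>R<row>.csv` file.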

karafecho commented 6 years ago

A couple of articles that may be of interest to some of you: Reference #1; Reference #2. (I can provide additional references, if desired.)

Caveat: The exposure estimates in these articles are lower than the US EPA AQI breakpoints. This is because the AQI breakpoints provide benchmarks for all persons (children plus adults, healthy and non-healthy), are skewed toward extreme weather events, and are not as granular as we propose for pediatric patients with asthma, who are more sensitive than adults to airborne pollutants. US EPA AQI breakpoints: 24-h average: 0-12, 12.1-35.4, 35.5-55.4, 55.5-150.4, 150.5-250.4, 250.5-350.4, 350.5-500.4 μg/m3. US EPA overall guidelines are maximum PM2.5: 12.0 μg/m3 over 1 year, 35 μg/m3 over 24 hours.
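
For reference, the breakpoints quoted above can be applied to a 24-h average PM2.5 value as in the following sketch. The category names are the standard US EPA AQI labels and are not given in this thread; truncating the concentration to one decimal place before the lookup is likewise an assumption about the EPA convention.

```python
import math

# (low, high, category) in tenths of a μg/m3, from the breakpoints quoted above.
# Category names are the usual EPA AQI labels, added here for illustration.
PM25_BREAKPOINTS_TENTHS = [
    (0, 120, "Good"),
    (121, 354, "Moderate"),
    (355, 554, "Unhealthy for Sensitive Groups"),
    (555, 1504, "Unhealthy"),
    (1505, 2504, "Very Unhealthy"),
    (2505, 3504, "Hazardous"),
    (3505, 5004, "Hazardous"),
]


def pm25_category(concentration):
    """Map a 24-h average PM2.5 concentration (μg/m3) to an AQI category."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # Truncate to 0.1 μg/m3; the small epsilon guards against float error.
    tenths = math.floor(concentration * 10 + 1e-9)
    for low, high, name in PM25_BREAKPOINTS_TENTHS:
        if low <= tenths <= high:
            return name
    return None  # above the highest quoted breakpoint


for value in (8.0, 12.05, 35.5, 600.0):
    print(value, pm25_category(value))
```

Working in integer tenths avoids values falling into the gaps between ranges (for example 12.05, which truncates to 12.0).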

Caveat: We're not as much interested in a case-control study between patients with asthma-like conditions and patients without asthma-like conditions (or "healthy", whatever that means), but rather we're trying to delineate or regroup subpopulations of patients with asthma-like conditions. Short-term (24 hours/one week/two weeks) and long-term (e.g., average annual) exposures are equally informative, but they address somewhat different questions.

karafecho commented 6 years ago

@arunacs, @lstillwe: We failed to discuss the following issue earlier today.

Sarav to provide expanded ontology/chemical names for union of two years of data. Variables common to both years can be found here: CMAQ_Species_Defn_CheBI_Links.xlsx.

arunacs commented 6 years ago

@karafecho See issue #100, where I provided an updated file; it looks like @lstillwe has taken care of it.

karafecho commented 6 years ago

Issue 1 is resolved (99% complete).

karafecho commented 6 years ago

As per discussion on 4/11/18, plans are as follows:

  1. Temporal expansion of existing Exposures Service/API - Explore whether we can obtain additional years of data on CMAQ exposures (beyond 2010, 2011) from the US EPA [Sarav]
  2. Spatial expansion of existing Exposures Service/API - Generate higher-resolution CMAQ estimates - one year likely by end of feasibility assessment [Sarav's student]
  3. Pollutant expansion of existing Exposures Service/API - Incorporate additional exposure estimates for 2010 and 2011 (not just PM2.5, ozone) [Lisa]
  4. Data quality metrics for Exposures Service/API - Stick with current metrics for PM2.5 and ozone (2010, 2011), at least for the time being [Lisa]
  5. R-LINE - hold off on this effort, at least for now [Sarav]