Closed rayi113 closed 6 years ago
Good question. Worth a quick discussion on this before implementation.
@arunacs - Added an overview of the CMAQ variables to discuss.
Google: CMAQ - variables all
Format is CSV:
var_name,data_type,long_name,display_name,units,var_desc,notes
Where display_name and notes are presently blank and should define what we want to do, or how the exposure variables may interplay (quality metrics, etc.)
The proposal would be pretty straightforward: instead of adding CMAQ data piecemeal as we have been, we just serve it all.
To do so we’d need a better understanding of the variable names and how they interplay with each other. By interplay I mean: are some variables best served in combination with each other, based on their meaning? For example, if a PM25 request was made, and we also have a PM25 quality metric, we return these pieces of information together.
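A minimal sketch of the "serve related variables together" idea, assuming a hand-maintained mapping from a variable to its companion quality metrics. The names `RELATED`, `build_response`, and `PM25_quality`, and all the numbers, are hypothetical and only illustrate the shape of such a lookup; they do not come from the CMAQ files.

```python
# Hypothetical mapping from a requested variable to the quality metrics
# that should accompany it. These names are illustrative only.
RELATED = {
    "PM25": ["PM25_quality"],
}


def build_response(requested, values):
    """Return the requested variable plus any related variables that have values."""
    names = [requested] + RELATED.get(requested, [])
    return {name: values[name] for name in names if name in values}


# Made-up values for one grid cell and time step.
sample_values = {"PM25": 8.4, "PM25_quality": 0.9, "O3": 31.0}

print(build_response("PM25", sample_values))  # PM25 together with its quality metric
print(build_response("O3", sample_values))    # O3 alone, since nothing is mapped to it
```

In practice the mapping could be derived from the notes column of the CSV once it is filled in, rather than hard-coded.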
What would be needed is the display_name and notes section.
We can then formulate what it would take to create a dedicated CMAQ RESTful service.
Extend a step further, and the Exposures API would use the CMAQ service for those components.
@arunacs - is this list of variables consistent across the 2010 and 2011 CMAQ datasets? Do the 2010 and 2011 contain the same number of variables and do they all have the same names?
Thanks - Lisa
is this list of variables consistent across the 2010 and 2011 CMAQ datasets?
@lstillwe - Nope, they are different.
Will produce something to show this better.
@lstillwe - Better illustration of 2010 vs 2011:
XYL and PM25RD are in 2010 but not in 2011.
2010 NVARS: 72
2010 Vars: ALD2 ALDX CO ETH ETHA FORM H2O2 HNO3 HNO3_UGM3 HONO HOX IOLE ISOP N2O5 NH3 NH3_UGM3 NHX NO NO2 ANO3_PPB NOY NTR O3 OLE PAR PAN PANX SO2 SO2_UGM3 SULF TERP TOL VOC XYL AFEJ AALJ ASIJ ATIJ ACAJ AMGJ AKJ AMNJ ASOILJ ANAK AMGK AKK ACAK ACLIJ AECIJ ANAIJ ANO3IJ ANO3K ANH4IJ ANH4K AOCIJ AOMIJ AORGAJ AORGBJ AORGCJ APOCIJ APOAIJ ASO4IJ ASO4K ATOTI ATOTJ ATOTK PMIJ PM10 AUNSPEC1IJ ANCOMIJ AUNSPEC2IJ PM25RD
2011 NVARS: 122
2011 Vars: ALD2 ALDX BENZENE CO ETH ETHA FORM H2O2 HNO3 HNO3_UGM3 HONO CLNO2 HOX OH IOLE ISOP N2O5 NH3 NH3_UGM3 NHX NO NO2 ANO3_PPB NTR PANS NOY O3 OLE PAR PAN PANX SO2 SO2_UGM3 SULF TERP TOL VOC XYLMN AFEJ AALJ ASIJ ATIJ ACAJ AMGJ AKJ AMNJ ASOILJ AHPLUSIJ ANAK AMGK AKK ACAK ACLIJ AECIJ ANAIJ ANO3IJ ANO3K TNO3 ANH4IJ ANH4K AOCIJ AOMIJ AORGAJ AORGBJ AORGCJ APOCIJ APOAIJ ASO4IJ ASO4K ATOTI ATOTJ ATOTK PMIJ PM10 AUNSPEC1IJ ANCOMIJ AUNSPEC2IJ AOMOCRAT_PRI AOMOCRAT_TOT PM25_HP PM25_CL PM25_EC PM25_NA PM25_MG PM25_K PM25_CA PM25_NH4 PM25_NO3 PM25_OC PM25_SOIL PM25_SO4 PM25_TOT PM25_UNSPEC1 PMC_CL PMC_NA PMC_NH4 PMC_NO3 PMC_SO4 PMC_TOT DCV_Recon AIR_DENS RH SFC_TMP PBLH SOL_RAD precip WSPD10 WDIR10 K P1 P2 P3 a K_prime sqrt_Ki max_NO3_loss PM25_NO3_loss ANO3IJ_loss PM25_NH4_loss ANH4IJ_loss PMIJ_FRM PM25_FRM
VAR: XYL in 2010, not in 2011
<class 'netCDF4._netCDF4.Variable'>
float32 XYL(TSTEP, LAY, ROW, COL)
long_name: XYL
units: ppbV
var_desc: 1000.0*XYL[1]
unlimited dimensions: TSTEP
current shape = (24, 1, 112, 148)
filling off
VAR: PM25RD in 2010, not in 2011
<class 'netCDF4._netCDF4.Variable'>
float32 PM25RD(TSTEP, LAY, ROW, COL)
long_name: PM25RD
units: ug/m3
var_desc: AECIJ[0]+APOCIJ[0]+0.01*ASO4IJ[0]
unlimited dimensions: TSTEP
current shape = (24, 1, 112, 148)
filling off
Generated with:
from netCDF4 import Dataset
data2010 = 'CMAQ/2010/raw/CCTM_v502_with_CDC2010_Linux2_x86_64intel.ACONC.20100702.combine_base'
data2011 = 'CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_ed_emis_combine.aconc.01'
ds2010 = Dataset(data2010, 'r')
ds2011 = Dataset(data2011, 'r')
keys2010 = ds2010.variables.keys()
keys2011 = ds2011.variables.keys()
print('2010 NVARS: ', getattr(ds2010, 'NVARS'))
print('2010 Vars: ', getattr(ds2010, 'VAR-LIST'))
print('2011 NVARS: ', getattr(ds2011, 'NVARS'))
print('2011 Vars: ', getattr(ds2011, 'VAR-LIST'))
for key in keys2010:
    if key not in keys2011:
        print('VAR:', key, 'in 2010, not in 2011')
        print(ds2010.variables[key])
@mjstealey @arunacs Yes - been looking too: found following list in common between the two. Do we want just these? AALJ ACAJ ACAK ACLIJ AECIJ AFEJ AKJ AKK ALD2 ALDX AMGJ AMGK AMNJ ANAIJ ANAK ANCOMIJ ANH4IJ ANH4K ANO3IJ ANO3K ANO3_PPB AOCIJ AOMIJ AORGAJ AORGBJ AORGCJ APOAIJ APOCIJ ASIJ ASO4IJ ASO4K ASOILJ ATIJ ATOTI ATOTJ ATOTK AUNSPEC1IJ AUNSPEC2IJ CO ETH ETHA FORM H2O2 HNO3 HNO3_UGM3 HONO HOX IOLE ISOP N2O5 NH3 NH3_UGM3 NHX NO NO2 NOY NTR O3 OLE PAN PANX PAR PM10 PMIJ SO2 SO2_UGM3 SULF TERP TOL VOC
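The common, 2010-only, and 2011-only groupings discussed here can be sketched with plain set operations. The two lists below are shortened subsets of the real VAR-LIST attributes, used only to keep the example small; with the actual files they would come from splitting `VAR-LIST` on whitespace.

```python
# Shortened subsets of the 2010 and 2011 VAR-LIST values (illustrative only).
vars2010 = "ALD2 CO O3 XYL PM25RD".split()
vars2011 = "ALD2 BENZENE CO O3 XYLMN".split()

set2010, set2011 = set(vars2010), set(vars2011)

common = sorted(set2010 & set2011)    # present in both years
only2010 = sorted(set2010 - set2011)  # dropped or renamed after 2010
only2011 = sorted(set2011 - set2010)  # new in 2011
union = sorted(set2010 | set2011)     # everything seen in either year

print("common:  ", common)
print("2010 only:", only2010)
print("2011 only:", only2011)
print("union:   ", union)
```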
Do we want just these?
Would rather get them all. The enforcement of what is available at what time would be done via the API.
The 2010-only variables are XYL (suspect this is XYLMN in 2011) and PM25RD.
Full list:
['AALJ', 'ACAJ', 'ACAK', 'ACLIJ', 'AECIJ', 'AFEJ', 'AHPLUSIJ', 'AIR_DENS', 'AKJ', 'AKK', 'ALD2', 'ALDX',
'AMGJ', 'AMGK', 'AMNJ', 'ANAIJ', 'ANAK', 'ANCOMIJ', 'ANH4IJ', 'ANH4IJ_loss', 'ANH4K', 'ANO3IJ',
'ANO3IJ_loss', 'ANO3K', 'ANO3_PPB', 'AOCIJ', 'AOMIJ', 'AOMOCRAT_PRI', 'AOMOCRAT_TOT', 'AORGAJ',
'AORGBJ', 'AORGCJ', 'APOAIJ', 'APOCIJ', 'ASIJ', 'ASO4IJ', 'ASO4K', 'ASOILJ', 'ATIJ', 'ATOTI', 'ATOTJ',
'ATOTK', 'AUNSPEC1IJ', 'AUNSPEC2IJ', 'BENZENE', 'CLNO2', 'CO', 'DCV_Recon', 'ETH', 'ETHA', 'FORM',
'H2O2', 'HNO3', 'HNO3_UGM3', 'HONO', 'HOX', 'IOLE', 'ISOP', 'K', 'K_prime', 'N2O5', 'NH3',
'NH3_UGM3', 'NHX', 'NO', 'NO2', 'NOY', 'NTR', 'O3', 'OH', 'OLE', 'P1', 'P2', 'P3', 'PAN', 'PANS', 'PANX',
'PAR', 'PBLH', 'PM10', 'PM25RD', 'PM25_CA', 'PM25_CL', 'PM25_EC', 'PM25_FRM', 'PM25_HP',
'PM25_K', 'PM25_MG', 'PM25_NA', 'PM25_NH4', 'PM25_NH4_loss', 'PM25_NO3', 'PM25_NO3_loss',
'PM25_OC', 'PM25_SO4', 'PM25_SOIL', 'PM25_TOT', 'PM25_UNSPEC1', 'PMC_CL', 'PMC_NA',
'PMC_NH4', 'PMC_NO3', 'PMC_SO4', 'PMC_TOT', 'PMIJ', 'PMIJ_FRM', 'RH', 'SFC_TMP', 'SO2',
'SO2_UGM3', 'SOL_RAD', 'SULF', 'TERP', 'TNO3', 'TOL', 'VOC', 'WDIR10', 'WSPD10', 'XYL', 'XYLMN', 'a',
'max_NO3_loss', 'precip', 'sqrt_Ki']
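One way the API could enforce "what is available at what time" is a per-variable set of years, checked before querying. This is only a sketch: `AVAILABILITY`, `check_request`, and the status codes are assumptions for illustration, and the table lists just a handful of variables from the thread.

```python
# Years in which each variable exists, for a few variables named in this thread.
AVAILABILITY = {
    "O3": {2010, 2011},
    "XYL": {2010},
    "PM25RD": {2010},
    "XYLMN": {2011},
    "BENZENE": {2011},
}


def check_request(variable, year):
    """Return a (status, message) pair describing whether the request can be served."""
    years = AVAILABILITY.get(variable)
    if years is None:
        return 404, f"unknown variable: {variable}"
    if year not in years:
        available = ", ".join(str(y) for y in sorted(years))
        return 400, f"{variable} is not available for {year} (available: {available})"
    return 200, "ok"


print(check_request("O3", 2011))
print(check_request("XYL", 2011))
print(check_request("NOPE", 2010))
```

With the union table, a variable missing in a given year would simply be NULL in the database; a check like this lets the API say so explicitly instead of returning empty values.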
@mjstealey Okay - Sounds good - I will just query both 2010 and 2011 datasets for their list of vars and get a union of that, in order to create the db table programmatically.
The 2010 and 2011 datasets were produced somewhat independently, hence the inconsistency in the species list. The union approach proposed by @lstillwe sounds reasonable for now. When we meet next week, we can explore an option to condense this even further based upon the project needs.
query both 2010 and 2011 datasets for their list of vars and get a union of that, in order to create the db table programmatically
@lstillwe - We were apparently thinking the same thing!
from netCDF4 import Dataset
data2010 = 'CMAQ/2010/raw/CCTM_v502_with_CDC2010_Linux2_x86_64intel.ACONC.20100702.combine_base'
data2011 = 'CMAQ/2011/raw/CCTM_CMAQ_v51_Release_Oct23_NoDust_ed_emis_combine.aconc.01'
ds2010 = Dataset(data2010, 'r')
ds2011 = Dataset(data2011, 'r')
list2010 = str(getattr(ds2010, 'VAR-LIST')).split()
list2011 = str(getattr(ds2011, 'VAR-LIST')).split()
listall = list(set().union(list2010, list2011))
listall.sort()
sql = 'CREATE TABLE IF NOT EXISTS cmaq_exposures_data (\nid SERIAL UNIQUE PRIMARY KEY,\n' \
'col INT,\nrow INT,\nutc_date_time TIMESTAMP'
for item in listall:
    sql += ',\n' + str(item) + ' FLOAT'
sql += '\n);'
print(sql)
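A variation on the script above, sketched under the assumption that the target database is PostgreSQL (suggested by SERIAL, but not stated in the thread). PostgreSQL folds unquoted identifiers to lower case, so mixed-case names such as K_prime or DCV_Recon would lose their original spelling unless they are double-quoted. The helper names `quote_ident` and `build_create_table` are hypothetical.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quote characters."""
    return '"' + name.replace('"', '""') + '"'


def build_create_table(table, variables):
    """Build a CREATE TABLE statement with one FLOAT column per unique variable."""
    columns = [
        "id SERIAL UNIQUE PRIMARY KEY",
        "col INT",
        "row INT",
        "utc_date_time TIMESTAMP",
    ]
    columns += [f"{quote_ident(name)} FLOAT" for name in sorted(set(variables))]
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(columns) + "\n);"


# A few variable names from the union list, with one deliberate duplicate.
ddl = build_create_table("cmaq_exposures_data", ["XYL", "a", "K_prime", "XYL"])
print(ddl)
```

The trade-off is that quoted columns become case-sensitive, so every query would have to quote them the same way; lower-casing all names on ingest is the other common option.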
Additional columns for statistical data can either be added at the same time, or after the fact. Want to get the proper use/definition of each variable from @arunacs first as the data quality ones may not require a full spectrum of aggregate pre-calculation.
Based on a brainstorming session with the Exposure API team, IE will create Bias and Error statistics for select pollutants, as the average over all sites that have observations in the entire domain, at hourly resolution for each of 2010 and 2011, for adding to the API.
Issue: cmaq-exposure-api/issues/5
@arunacs, @lstillwe - Updated exposure list based on ingest of cmaq data files is here
We want to update the common_name column with whatever makes sense from a domain terminology perspective. If more than one word or phrase fits, then separate them by ;
Domain-wide hourly Bias and error for O3 from the 2010 and 2011 simulations are available at:
/proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2010/Evaluation/CMAQ_2010_36k_base_O3_1_timeseries.csv
/proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2011/Evaluation/CMAQ_2011_12k_O3_1_timeseries.csv
Here is a sample header and first row from 2010; units are in ppbV.
"Date","CMAQ_2010_36k_base_Obs_Average","CMAQ_2010_36k_base_Model_Average","CMAQ_2010_36k_base_Bias_Average","CMAQ_2010_36k_base_RMSE_Average","CMAQ_2010_36k_base_Corr_Average"
2010-01-01 00:00:00,15.7729,27.5666,11.7936,18.6757,0.313
@arunacs, @lstillwe - Proposal for integration can be found at RENCI/cmaq-exposure-api/issues/5
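As a starting point for that integration, the sample O3 row quoted earlier could be parsed roughly as follows. This is a sketch, not the ingest code: it assumes the last column is named Corr_Average (the quoted header has a stray space in it), strips the run-specific prefix so both years share short column names, and makes no assumption about the time zone of the Date column. `parse_timeseries` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Header and first row as quoted in the thread, with the last column name repaired.
SAMPLE = (
    '"Date","CMAQ_2010_36k_base_Obs_Average","CMAQ_2010_36k_base_Model_Average",'
    '"CMAQ_2010_36k_base_Bias_Average","CMAQ_2010_36k_base_RMSE_Average",'
    '"CMAQ_2010_36k_base_Corr_Average"\n'
    "2010-01-01 00:00:00,15.7729,27.5666,11.7936,18.6757,0.313\n"
)
PREFIX = "CMAQ_2010_36k_base_"


def parse_timeseries(text, prefix):
    """Parse evaluation CSV text into dicts with short column names and float values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"Date": datetime.strptime(record["Date"], "%Y-%m-%d %H:%M:%S")}
        for key, value in record.items():
            if key == "Date":
                continue
            short = key[len(prefix):] if key.startswith(prefix) else key
            row[short] = float(value)
        rows.append(row)
    return rows


rows = parse_timeseries(SAMPLE, PREFIX)
first = rows[0]
# In the sample row, the bias matches model minus observed to within rounding.
print(first["Bias_Average"], first["Model_Average"] - first["Obs_Average"])
```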
Additional data quality (model performance evaluation) metrics will include the following:
VARNAME | COMMON NAME
Num_Obs | Number of Paired Observations
Obs_mean | Mean Observed Value
Mod_mean | Mean Modeled Value
Obs_median | Median of Observed Values
Mod_median | Median of Modeled Values
Coverage | Coverage based upon completeness criteria
MB | Mean Bias
ME | Mean Error
NMB | Normalized Mean Bias
NME | Normalized Mean Error
NMdnB | Normalized Median Bias
NMdnE | Normalized Median Error
FB | Fractional Bias
FE | Fractional Error
COR | Pearson Correlation Coefficient
R_Squared | R-Squared
Stand_Dev_obs | Standard Deviation of Observed Values
Stand_Dev_mod | Standard Deviation of Modeled Values
Coeff_of_Var_obs | Coefficient of Variation for Observed Values
Coeff_of_Var_mod | Coefficient of Variation for Modeled Values
Index_of_Agree | Index of Agreement
RMSE | Root Mean Squared Error
RMSE_systematic | Systematic Root Mean Squared Error
RMSE_unsystematic | Unsystematic Root Mean Squared Error
Skew_Obs | Skewness of Observed Values
Skew_Mod | Skewness of Modeled Values
Median_Diff | Median of differences
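For readers less familiar with these metrics, here is a small sketch of how a few of them are commonly defined for paired observed and modeled values. The formulas are the usual textbook ones; whether the IE evaluation files use exactly these definitions (for example, percent versus fraction for NMB and NME) is an assumption, and the numbers are made up.

```python
import math


def evaluation_metrics(obs, mod):
    """Compute a few common model-evaluation metrics for paired values."""
    n = len(obs)
    diffs = [m - o for o, m in zip(obs, mod)]
    obs_mean = sum(obs) / n
    mod_mean = sum(mod) / n
    cov = sum((o - obs_mean) * (m - mod_mean) for o, m in zip(obs, mod))
    var_obs = sum((o - obs_mean) ** 2 for o in obs)
    var_mod = sum((m - mod_mean) ** 2 for m in mod)
    return {
        "Num_Obs": n,
        "Obs_mean": obs_mean,
        "Mod_mean": mod_mean,
        "MB": sum(diffs) / n,                                  # mean bias
        "ME": sum(abs(d) for d in diffs) / n,                  # mean error
        "NMB": 100.0 * sum(diffs) / sum(obs),                  # percent
        "NME": 100.0 * sum(abs(d) for d in diffs) / sum(obs),  # percent
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "COR": cov / math.sqrt(var_obs * var_mod),             # Pearson r
    }


# Made-up paired values, e.g. O3 in ppbV at three sites.
metrics = evaluation_metrics(obs=[10.0, 20.0, 30.0], mod=[12.0, 18.0, 33.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```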
Additional bias/error metrics have been developed for multiple pollutants and are going through final QA before submission to DT.
Sarav, When will the additional bias/error metrics be available? I would like to know when I can schedule this work. Thanks, Lisa
Hi Lisa,
I estimate them to be available by the end of the week.
Sarav
From: Lisa Stillwell notifications@github.com Sent: Wednesday, March 14, 2018 1:45:53 PM To: ResearchSoftwareInstitute/greendatatranslator Cc: Arunachalam, Sarav; Mention Subject: Re: [ResearchSoftwareInstitute/greendatatranslator] Add CMAQ model bias/error info to Exposures API (#93)
DQ metrics in similar format to previously provided sample for O3 are now available for both years for several gas-phase and aerosol pollutants.
/proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2010/Evaluation/*.csv
and
/proj/ie/proj/NIH-DataTranslator/for_RENCI/CMAQ/2011/Evaluation*.csv
The pollutants include:
Gas-phase: CO, O3, NO, NO2, NOx, NOy, SO2
Aerosols: PMIJ, ANH4IJ, ASO4IJ, AECIJ, AOCIJ and PM10
Note that some metrics are at daily resolution while others are at hourly resolution, depending on the measurement frequency of the observation network.
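If the API ever needs hourly and daily metrics on a common footing, one option is to roll the hourly series up to daily means. The sketch below does that for a bias-like value with made-up numbers; `daily_means` is a hypothetical helper. A plain average is only reasonable for metrics like mean bias when each hour has a similar number of paired observations, and it is not appropriate for RMSE or correlation.

```python
from collections import defaultdict
from datetime import datetime

# Made-up hourly bias values keyed by timestamp string.
hourly = [
    ("2010-01-01 00:00:00", 11.0),
    ("2010-01-01 01:00:00", 13.0),
    ("2010-01-02 00:00:00", 9.0),
]


def daily_means(records):
    """Group (timestamp, value) pairs by calendar day and average each group."""
    buckets = defaultdict(list)
    for stamp, value in records:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        buckets[day].append(value)
    return {day: sum(values) / len(values) for day, values in sorted(buckets.items())}


daily = daily_means(hourly)
for day, value in daily.items():
    print(day.isoformat(), value)
```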
Per Oct 2017 Data Quality hackathon breakout group, add CMAQ model bias/error info to Exposures API as an indicator of data quality of CMAQ model.