DanielAdriaansen closed this issue 1 year ago.
@JohnHalleyGotway I wasn't sure what milestone, project, or requestor to select but I did my best. Same with priority.
@DanielAdriaansen I'm looking into this now. A few initial questions.
Is the current way it is done (BUFR) in some other app that is part of MET? If so does it write netCDF? Should I use that (if yes) to figure out how to convert these ascii files to netCDF? (I'm wondering if the BUFR content is equivalent to the ascii content I'm looking at in your examples).
I'm sure I'll have more questions, still have a pretty big learning curve for work in MET.
@JohnHalleyGotway are you aware of how Perry is processing the EPA BUFR data now, or do we need to reach out to him for a sample/example of their current workflow and datasets using BUFR?
I can help answer that. Currently, we use PB2NC to read in the prepbufr and write out a netCDF file to be used in point_stat. We actually do this several ways - directly convert the hourly data to hourly data, but we also need to write out a) the daily max, and b) the daily average. And these maxes and averages need to be summed over a period of 5Z to 5Z. I believe that both/either @JohnHalleyGotway and/or @georgemccabe assisted with that several years ago.
Thanks for the quick response. As I'm pretty green on all of this, an example would be really useful so I could run it in the debugger and figure out how PB2NC does what you want, and how that might be added to ascii2nc. What that would mean is a data example, config file, and command line interface. Maybe start with the hourly data?
I could gather all that up and send that to you after the weekend. The hourly part is pretty straightforward.
The issue that we here at EMC have somewhat been struggling with is if we provided you enough ASCII data from the EPA for you to do this. I am assuming that we have, so I suggest continuing as if we did and then if there are any issues see if we need additional data from the EPA. From what I had seen from the ASCII examples and documentation that we had, I think we sent you everything you need to do this work.
It also has been suggested to us that the DTC is using code from the MONET interface to be able to read this and convert to ASCII. I'm curious if you were aware of that. Perhaps we can lift from that code to assist you in this development?
@davidalbo please take a look at this MONET repo: https://github.com/noaa-oar-arl/monet and ask Barry Baker @bbakernoaa and/or David Fillmore @davidfillmore to see if support for this format already exists there.
@PerryShafran-NOAA - any progress on getting some sample data for us?
@davidalbo please look at https://github.com/noaa-oar-arl/monetio/blob/stable/monetio/obs/airnow.py
@zmoon
Hi,
(1) Both HourlyAQObs and HourlyData are hourly datasets, but HourlyAQObs will not change after its initial posted time, while HourlyData will be updated again after its initial post. My assumption is that the update will include additional obs that arrived late. Thus, the HourlyData* files are the preferred hourly dataset to be used for this new app.
You can see the time stamp changed from this example https://files.airnowtech.org/?prefix=airnow/2022/20220708/
(2) However, Perry found there is no (lat,lon) information in HourlyData. As you can tell from the link above, both hourly files are not small. To pull in both and simply read the site locations from HourlyAQObs seems to waste time in the daily fetch of the data from the EPA to the NCO. We later found two location files, monitoring_site_locations.dat and Monitoring_Site_Locations_V2.dat, and wonder whether one or both are sufficient to provide the site locations of the stations reported in HourlyData. That is, use the site id reported in HourlyData to find the (lat,lon) in the two location files described above.
EMC wants to work with the DTC and provide what we know to speed up this development.
There is a monitor_location file of the static lat lons for all available sites
Barry Baker
National Oceanic and Atmospheric Administration Air Resources Laboratory Physical Research Scientist Chemical Modeling and Emissions Group Leader NCWCP, R/ARL, Rm. 4204 5830 University Research Court College Park, Maryland 20740 Phone: (301) 683-1395
Barry:
Can you tell us more specifics about the files in your email? Can they be obtained from the same EPA ftp site or elsewhere? Will they be updated with time? As we know, some old sites are dropped and new sites added all the time.
@Ho-ChunHuang-NOAA It is available in the same FTP directory. The title of the file is Monitoring_Site_Locations_v2.dat
Great, that is one of the location files I found.
Add monitor suite information fact sheet MonitoringSiteFactSheet.pdf
Support for PrepBUFR format of this data ending at the end of 2022, per @PerryShafran-NOAA on 8/22/2022.
Met on 8/26/22 to discuss the development steps:

Documentation:
- Update docs/Users_Guide/reformat_point.rst to list the newly supported -format airnow option.

Testing:
- Update internal/test_unit/xml/unit_ascii2nc.xml to exercise the new -format airnow option.
- Add sample input data to https://dtcenter.ucar.edu/dfiles/code/METplus/MET/MET_unit_test/develop/unit_test/. Recommend coordinating with @jprestop on the details of that.

Code changes in src/tools/other/ascii2nc:
- Copy aeronet_handler.h and aeronet_handler.cc as airnow_handler.h and airnow_handler.cc.
- Update Makefile.am for these new source files, then run ./bootstrap to regenerate Makefile.in from your modified Makefile.am file.
- In ascii2nc.cc, include airnow_handler.h, add a line to update the revision history at the top, and mimic the existing AeronetHandler logic throughout. Add an ASCIIFormat_AirNow entry to the ASCIIFormat enumeration, and update the set_format() function so that -format airnow defines ascii_format = ASCIIFormat_AirNow.
- Update airnow_handler.h and airnow_handler.cc to add support for these 2 formats that include lat/lon info. Recommend calling ConcatString::split() to tokenize the input lines into a StringArray object.
Consider adding error checking as needed or interpreting bad data flags.
Make sure the expected number of columns are actually present.
See airnow.py for any details about valid data ranges.
For reference, see other sample ascii observation formats here: seneca:/d1/projects/MET/MET_test_data/unit_test/obs_data
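The tokenize-and-validate logic recommended above can be sketched as follows. This is an illustrative Python version only (the actual ascii2nc code is C++ and would use ConcatString::split() into a StringArray); the "|" delimiter and 13-column count come from the daily format 2 sample later in this thread, and the bad-data flags shown are assumptions:

```python
def parse_line(line, expected_columns, bad_data_flags=("-9999", "N/A", "")):
    """Split a delimited AIRNOW record, verify the expected number of
    columns is actually present, and map bad-data flags to None."""
    tokens = [t.strip() for t in line.split("|")]
    if len(tokens) != expected_columns:
        raise ValueError(f"expected {expected_columns} columns, found {len(tokens)}")
    return [None if t in bad_data_flags else t for t in tokens]

# Sample daily format 2 record (13 "|"-delimited columns):
rec = ("05/28/19|060410001|San Rafael|OZONE-8HR|PPB|31|8|"
       "San Francisco Bay Area AQMD|29|0|37.972200|-122.518900|840060410001")
tokens = parse_line(rec, 13)
```

A line with a missing or extra column raises an error rather than being silently mis-parsed, which matches the "make sure the expected number of columns are actually present" guidance above.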
Recommend running 2 different commands during development:
plot_point_obs
to plot the location of the point data.
plot_point_obs out/ascii2nc/sample_ascii.nc plot.ps
pntnc2ascii.R
to dump NC obs to ascii:
Rscript scripts/Rscripts/pntnc2ascii.R out/ascii2nc/sample_ascii.nc > sample.txt
That makes it easier to check if the obs are actually being stored in the NC file as you expect.
I don't expect you'll need to modify the default config file for Ascii2NC, but it's located here: data/config/Ascii2NcConfig_default
@JohnHalleyGotway Moving along. Some confusion about putting the data into Observation objects, which seems to be the design. Observation might not fit the airnow data? It seems too limited.
The airnow data has columns that are ascii, columns that are status flags, columns with numeric values. The 'station' in our case is kind of a complicated thing based on several different columns: AQSID, SiteName, EPA Region, Country Code, State name, data source. Does all that go into the output?
Also, each line in the hourly data file has multiple columns for each of OZONE, PM10, PM25, NO2, CO, SO2: the AQindex, the value, the units, whether it was measured yes/no. Does all that go into the output?
The daily data is a little simpler, with only one kind of data per line, but still has a bunch of extra columns that maybe don't fit the Observation class as is.
Thoughts/feedback?
Hi, John:
I cannot find this topic in the discussion issues. This is my suggestion for your consideration. Maybe you can show me how to join the chat/discussion stream.
(1) I think you are looking at the filename starting with "HourlyAQObs". We prefer to use the filename starting with "HourlyData", because HourlyData may have a later update with new observations while HourlyAQObs will not.
(2) The "HourlyData" files should be similar to the daily files, but they do not have the lat/lon information associated with each report (line). You need the AQSID (column 3) to look up the lat/lon from either
monitoring_site_locations.dat or Monitoring_Site_Locations_V2.dat
You also need to sort the chemical obs into each species, e.g., OZCON/A1 and COPOPM/A1. Please consult with Perry whether we still need OZCON/A8, and the variable name for NO2.
(3) You need the date/time information (epa_report_time, column1+column2) from the line (or simply from the filename) to make the valid time, which valid_time = epa_report_time +1hour. This is due to the forward average (EPA) and backward average (model) difference of averaging period.
(4) A unit is needed either in PPB or in PPM or a different unit for each chemical species. The CO may be much more abundant than O3 and NO2 so PPM may be ideal. Traditionally we use PPB for Ozone. But again, it can be adjusted in met-config with a conversion factor. Please consult with Perry whether we include all reported species or gradually add new species in this new ascii2nc. If you want more institutions to use METPlus, I would suggest including all reported species to be more universal.
(5) As described in your email, "HourlyAQObs" has more information reported. Someone may want to use the EPA region or State name as an areal mask for verification, like in Melody-MONET. It can be helpful, but I do not know how it fits in the point_stat algorithm. As for the others, SiteName, Country Code, and data source, I do not think they are needed.
Again, this is my personal opinion for your consideration.
Ho-Chun Huang, Ph.D.
IMSG at NOAA/NWS/NCEP/EMC
5830 University Research Ct., Rm. 2792
College Park, MD 20740
301-683-3958
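The lookup and time-shift described in points (2) and (3) above can be sketched as follows. This is an illustrative Python sketch only (the actual tool is C++), and the site dictionary shown is a hypothetical stand-in for a parsed monitoring_site_locations.dat file:

```python
from datetime import datetime, timedelta

def lookup_latlon(aqsid, site_locations):
    """Return (lat, lon) for a station id, or None if the site is unknown.
    site_locations is a dict built from one of the two location files,
    keyed on AQSID."""
    return site_locations.get(aqsid)

def valid_time(epa_report_time):
    """EPA averages forward in time and the model averages backward,
    so valid_time = epa_report_time + 1 hour."""
    return epa_report_time + timedelta(hours=1)

# Hypothetical parsed location file with one station:
sites = {"060410001": (37.9722, -122.5189)}
```

Stations reported in HourlyData but missing from the location file would come back as None and need to be skipped or warned about.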
I would disagree that the metadata for sitename and county code is not needed. Although we typically plot averages, making this metadata seem redundant, it is nice to plot a single site or filter within a region. What we do in monetio is concatenate the state + county + site code. It is also helpful to filter based on state MSA (metropolitan statistical area), as this gives us another layer of filtering, leading to further understanding of regional factors.
Thanks for the feedback @bbakernoaa @Ho-ChunHuang-NOAA. I'll see what I can do to satisfy your requirements. I decided to go with HourlyAqObs first, because it's a simpler software design without an external lookup. Once I get that working, the HourlyData format can be added. Any time an external lookup is part of the design, we need to be careful to keep the lookup lat/lons consistent with the AQSIDs in the HourlyData files. Either could change independently of the other.
As an (unrelated) example of (maybe) something changing, the ascii columns as described in [HourlyAQObsFactSheet.pdf](https://github.com/dtcenter/MET/files/8514707/HourlyAQObsFactSheet.1.pdf) are in a different order than those in the sample data I am reading. This is revealed by reading the first header line of the ascii data files.
The columns in DailyDataFactSheet.pdf need to be fixed in the order shown, because there is no header line in the data files; alternatively, the software needs to be very smart.
(Me the engineer thinking about things that could go wrong.)
@PerryShafran-NOAA, I'm working with @davidalbo on this and have a specific question for you. Here's a sample AIRNOW daily format 2 record, as described by this DailyDataFactSheet.pdf:
05/28/19|060410001|San Rafael|OZONE-8HR|PPB|31|8|San Francisco Bay Area AQMD|29|0|37.972200|-122.518900|840060410001
We can extract up to 3 different "observations" from this line:
The question is whether or not these derived AQI values or categories are of any use to you. Should ascii2nc just dump the raw values or also include the AQI info?
@JohnHalleyGotway Including @Ho-ChunHuang-NOAA on the answer.
The answer for #1 is YES, we need the 8-hr ozone value from this record.
I don't think we need either #2 or #3 from your list, though, but I am checking with Ho-Chun for confirmation.
Now, please confirm that this is the daily file, not the hourly file (we would also need hourly 1-hr ozone and 8-hr ozone records).
Hi,
I agree with Perry that we do not need AQI related variables. NCEP focuses on the verification of the concentration, i.e., " OZONE-8HR value".
Different groups may have different requirements. Maybe Barry will have a different opinion.
@JohnHalleyGotway did you say there is an existing bit of code to handle lines like this:
"261250001","OAK PARK","Active","R5","42.4631","-83.183296","390.4","-5","US","MI","03/12/2022","00:00","Michigan Department of Environment, Great Lakes, and Energy", "Detroit","36","","","","1","0","0","0","","","","","","","","","40.0","PPB","",""
The columns are comma-separated strings, each surrounded by double quotes, but the string "Michigan..." has commas in it. Code that breaks the line into tokens based on commas alone doesn't work because of this. I could write my own code to handle it, but... is something out there?
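For what it's worth, any quote-aware CSV parser handles this case; Python's standard csv module is one example (shown here for illustration only; the C++ code in ascii2nc would need equivalent quote-aware tokenizing):

```python
import csv

def split_quoted_csv(line):
    """Split one comma-separated record, treating double-quoted fields
    (which may contain embedded commas) as single tokens.
    skipinitialspace handles the stray space before "Detroit"
    in the sample record above."""
    return next(csv.reader([line], skipinitialspace=True))

line = ('"261250001","OAK PARK","Active","R5","42.4631","-83.183296",'
        '"390.4","-5","US","MI","03/12/2022","00:00",'
        '"Michigan Department of Environment, Great Lakes, and Energy", "Detroit"')
fields = split_quoted_csv(line)
```

The key behavior: the long agency name stays a single token despite its embedded commas, rather than being split into three.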
Technical notes related to design choices that have been made or need clarification for adding the airnow support, most can be commented on/resolved quickly by @JohnHalleyGotway
Documentation: In the Users Guide, should we link to the pdf files that describe the supported formats? Currently these are found in places like this: https://github.com/dtcenter/MET/files/8514707/HourlyAQObsFactSheet.1.pdf
From John: Yes, providing links to specifics is a great idea. But No, please do not link to files posted in the issue. Assuming this is the correct source of the data, you could just link to this site: https://www.epa.gov/outdoor-air-quality-data
Questions about observations output values:
Questions specific to the HourlyAqObs format:
Daily_V2 lookup file
Design choices
@PerryShafran-NOAA @DanielAdriaansen I'm working with multiple Airnow formats, some of which have a column called 'elevation'. The ascii2nc program has two potential output slots for this: 'elevation' (of the sensor) and 'height_m' (of the observation). We can store elevation in one or both of these output variables. What do you recommend? Other Airnow formats do not have elevation, in which case I'm going to set both of the output variables, elevation and height_m, to missing data (-9999).
@Ho-ChunHuang-NOAA Including Ho-Chun who might have more insight into this data.
Thanks!
Perry
@JohnHalleyGotway what about users running the app, not a unit test? Can they get to MET_BASE/table_files? And is MET_BASE an environment variable?
Dave, yes, MET_BASE is the name we use for the installed location. So these files are available at runtime. For example, on RAL machines, MET version 10.1.0 is installed in /usr/local/met-10.1.0 and the runtime data is found in /usr/local/met-10.1.0/share/met, including the table_files directory.
Hi, @JohnHalleyGotway @davidalbo @DanielAdriaansen,
I understand that this is now available in the METplus 5.0 beta4 version that is now installed on Hera. I'd like to test this new capability, but I just realized that I don't have the use case to allow me to do this. Can you provide some assistance on how I can run the new capability?
Thanks!
Perry
Perry, I see you have questions about HOW to test out this new support for ASCII AIRNOW data.
The purpose of adding this support was to enable you to transition your use case from using the PrepBUFR-based AIRNOW data over to using the ASCII input instead.
If I were in your position, I think I would do some of the following:
Rscript scripts/Rscripts/pntnc2ascii.R ascii2nc_output_file.nc > airnow.txt
to dump those obs to the MET 11-column format and inspect the values to make sure they look good.
Hope that helps.
John has covered it. I'd just add you can either run it in a simple way with an input and output file specified:
ascii2nc inputasciifile outputncfile
in which case the program should figure out what format the inputasciifile is, or you can specify which airnow format your file uses via one of these additional command line options:
-format airnowhourly
-format airnowdaily_v2
-format airnowhourlyaqobs
For the airnowhourly format, there is an external file used to look up latitudes/longitudes, which defaults to a file that is part of the met 5.0 beta 4 package. It is here:
MET_BASE/table_files/airnow_monitoring_site_locations_v2.txt
If you need to change that to some other file, you can do so via an environment variable:
MET_AIRNOW_STATIONS
which you would set to the full path to your file
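That fallback behavior can be sketched as follows (illustrative Python only; the actual tool is C++, and MET_BASE is resolved at install time):

```python
import os

def airnow_stations_file(met_base):
    """Use MET_AIRNOW_STATIONS when set; otherwise fall back to the
    station-location table shipped with MET under MET_BASE."""
    default = os.path.join(met_base, "table_files",
                           "airnow_monitoring_site_locations_v2.txt")
    return os.environ.get("MET_AIRNOW_STATIONS", default)
```

Setting the environment variable to the full path of an updated station file overrides the packaged table without rebuilding anything.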
I did run ascii2nc on hera using a sample ASCII file, and it produced a netcdf file. That was the first step. I learned that it was already configured to run on various AIRNOW ASCII files by just doing an ascii2nc. I suppose it will be relatively simple to translate this to METplus from here on out.
It will probably take a bit to figure out whether the results will be comparable to prepbufr, but the functionality seems to work at least.
Reopening this issue based on the following email feedback from @LoganDawson-NOAA received on 12/6/22:
As you know, one of the MET development issues was adding support for EPA AirNow data in ASCII format. You can see, via the latest comment on the issue, that a beta version of ASCII2NC was used to read the AirNow ASCII data and produce a netcdf output file. However there was no confirmation at the time that the resultant output could then be read into the observation field for use in PointStat.
Additional testing of the EPA AirNow data in the past week has found that the ASCII2NC output has notable differences from the PB2NC output, and thus far, those differences have prevented the ASCII2NC output from being read into point stat. Some of the obstacles are discussed in the forwarded email below, but I've also highlighted the key issues that I've been made aware of in the last week.
1) Incorrect obs_lvl data
PB2NC output includes: obs_lvl = 3600, 3600, 28800, 28800, 3600, 3600, 28800, 28800, 3600, 3600, ... ASCII2NC output includes: obs_lvl = _, _, _, _, _, _, ... (all missing)
This seems to be preventing observation data from being accessed via the OBS_VAR1_LEVELS = A1 | A8 | A23 settings that have been used for the PB2NC output.
2) ASCII2NC output is missing the "obs_unit" variable. (This may not be directly affecting the ASCII2NC output's use in PointStat, but it's worth sharing to note differences between the ASCII2NC and PB2NC outputs.)
3) Difficulty with matching the fcst and obs valid times. The daily max and average fields (OZMAX1, OZMAX8, and PMTF/A23) require setting some valid time information because of intricacies with the valid time of the forecast information. Setting valid time information for the ASCII observations appears necessary as well. Matching these two settings has not worked yet (see below).
I have tried several options to run point_stat using the ascii2nc airnow netcdf files, and all tests failed to produce any statistics. If "OBS_VAR1_LEVELS" is not used at all, or is left as a null assignment "OBS_VAR1_LEVELS=", then error messages occurred.
<<<for example, if "OBS_VAR1_LEVELS" is not used; "ERROR: If FCST_VAR1_LEVELS is set, you must either set OBS_VAR1_LEVELS or change FCST_VAR1_LEVELS to BOTH_VAR1_LEVELS">>>
I have tried OBS_VAR1_LEVELS=A1/L1/L0, and the results are similar, i.e. "Number of matched pairs = 0", but the reasons for observations being rejected differ among the verified variables (see below). The matching criteria for each variable are described below and in the log files (location at the end of the email). Please check and see if you want me to try different criteria.
The reasons for observation rejection are:
(1) Hourly O3: initial reason is "level mismatch" and later is "obs var name"
(2) Hourly PM25: initial reason is "obs var name" and later is "level mismatch"
(3) Daily 1HR AVG O3: rejected due to "valid time"
(4) Daily 8HR AVG O3: rejected due to "obs var name"
(5) Daily 24HR AVG PM25: rejected due to "obs var name"
=== Matching criteria ===
1HR AVG O3:
FCST_VAR1_NAME = OZCON
FCST_VAR1_LEVELS = A1
FCST_VAR1_OPTIONS = set_attr_name = "OZCON1";
OBS_VAR1_NAME = OZONE
OBS_VAR1_LEVELS = A1
OBS_VAR1_OPTIONS = message_type = "AIRNOW_HOURLY_AQOBS";

1HR AVG PM25:
FCST_VAR1_NAME = PMTF
FCST_VAR1_LEVELS = L1
OBS_VAR1_NAME = PM25
OBS_VAR1_LEVELS = A1
OBS_VAR1_OPTIONS = message_type = "AIRNOW_HOURLY_AQOBS";

=== Note the valid_times are different ===
Daily Max 1HR AVG O3:
FCST_VAR1_NAME = OZMAX1
FCST_VAR1_LEVELS = L1
FCST_VAR1_OPTIONS = valid_time = "{valid?fmt=%Y%m%d?shift=1d}_04";
OBS_VAR1_NAME = OZONE-1HR
OBS_VAR1_LEVELS = A1
OBS_VAR1_OPTIONS = message_type = "AIRNOW_DAILY_V2"; valid_time = "{valid?fmt=%Y%m%d}_000000";

Daily Max 8HR AVG O3:
FCST_VAR1_NAME = OZMAX8
FCST_VAR1_LEVELS = L1
FCST_VAR1_OPTIONS = valid_time = "{valid?fmt=%Y%m%d?shift=1d}_11";
OBS_VAR1_NAME = OZONE-8HR
OBS_VAR1_LEVELS = A1
OBS_VAR1_OPTIONS = message_type = "AIRNOW_DAILY_V2"; valid_time = "{valid?fmt=%Y%m%d}_000000";

24-HR AVG PM25:
FCST_VAR1_NAME = PMTF
FCST_VAR1_LEVELS = A23
FCST_VAR1_OPTIONS = valid_time = "{valid?fmt=%Y%m%d?shift=1d}_04"; set_attr_name = "PMAVE";
OBS_VAR1_NAME = PM2.5-24hr
OBS_VAR1_LEVELS = A1
OBS_VAR1_OPTIONS = message_type = "AIRNOW_DAILY_V2"; valid_time = "{valid?fmt=%Y%m%d}_000000";
Log file location: /lfs/h2/emc/ptmp/ho-chun.huang/metplus_aq/prod_ascii/20221115/logs /lfs/h2/emc/ptmp/ho-chun.huang/metplus_aqmax/prod_ascii/20221115/logs
Run_scripts: /lfs/h2/emc/ptmp/ho-chun.huang/VERF_script run_aq_prod_ascii_b148_20221115.sh run_aqmax_prod_ascii_b148_20221115.sh
Model output location: /lfs/h2/emc/physics/noscrub/ho-chun.huang/verification/aqm/prod_ascii
ASCII2NC AirNOW netcdf file location: /lfs/h2/emc/vpppg/noscrub/ho-chun.huang/dcom_ascii2nc_airnow
Prepbufr PB2NC netcdf file location: /lfs/h2/emc/physics/noscrub/ho-chun.huang/metplus_aq/aqm/conus_sfc/prod/aq /lfs/h2/emc/physics/noscrub/ho-chun.huang/metplus_aq/aqm/conus_sfc/prod/pm /lfs/h2/emc/physics/noscrub/ho-chun.huang/metplus_aq/aqmmax/aqmax1/prod /lfs/h2/emc/physics/noscrub/ho-chun.huang/metplus_aq/aqmmax/aqmax8/prod /lfs/h2/emc/physics/noscrub/ho-chun.huang/metplus_aq/pmmax/pmave/prod
I have uploaded the test data and logs here: https://www.emc.ncep.noaa.gov/mmb/hchuang/ftp/
@Ho-ChunHuang-NOAA and @LoganDawson-NOAA, thanks for the direction. This commit fixes a bad bug for AIRNOW inputs. All the obs were being reported as OZONE-1HR, but they're now being correctly stored as one of the following: CO-8hr, OZONE-1HR, OZONE-8HR, PM10-24hr, PM2.5-24hr, SO2-24HR
For Daily V2 inputs, it also now extracts the temporal averaging period from column 7 (as described in DailyDataFactSheet.pdf). It converts the hours to seconds and reports them in the obs level slot (i.e. 3600, 28800, or 86400 for 1, 8, or 24-hour intervals). I made this change to match the handling in pb2nc.
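The averaging-period encoding described above is just an hours-to-seconds conversion; a minimal sketch (Python for illustration; the actual code is C++):

```python
def averaging_period_seconds(hours):
    """Convert the Daily V2 averaging period (column 7, in hours) to the
    seconds value stored in the obs level slot, matching pb2nc:
    1 -> 3600, 8 -> 28800, 24 -> 86400."""
    return int(hours) * 3600
```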
The Daily V2 input defines the averaging period, but the "HourlyAQObs" (Docs: HourlyAQObsFactSheet.pdf) and "HourlyData" (Docs: HourlyDataFactSheet.pdf) inputs do NOT. So I assume the observations of OZONE, PM2.5, and PM10, for example, in those files are instantaneous measurements for which no averaging time period should be defined. Is that correct? Or should I encode them all as having a 1-hour (i.e. 3600 seconds) averaging time period?
Hi, John:
Yes, they are 1-HR average (hourly-mean) product, both HourlyAQObs and HourlyData.
Ho-Chun Huang, Ph.D.
Hi, John:
Another issue is the unit. In both HourlyAQObs and HourlyData, CO is in PPM, PM1.5 and PM10 are in UG/M3, and the rest are in PPB. Unless someone is familiar with the AirNOW ascii files, they may use it incorrectly.
Is there a way to include the unit info in the ascii2nc generated netcdf files?
Correct PM2.5
@Ho-ChunHuang-NOAA, thanks for the direction. I will:
Once these changes are ready, I'll compile an updated version on Hera for your testing. I'll compile it in: /contrib/met/feature_2142_airnow_take2
And you'll be able to access it by running:
module load intel/2022.1.2
module load anaconda/latest
module use -a /contrib/met/modulefiles/
module load met/feature_2142
Should be ready for your testing tomorrow morning.
@Ho-ChunHuang-NOAA, please test this out on Hera using this version of MET:
module load intel/2022.1.2
module load anaconda/latest
module use -a /contrib/met/modulefiles/
module load met/feature_2142
Note that:
- The observation variable names are now correct.
- The units are also now included.
- The accumulation intervals (1, 8, or 24 hours) are included in seconds.
- The observation heights are simply written as bad data.
Please pay careful attention to the message type and variable names present in these files. You'll need to use this information to setup the METplus conf files for Point-Stat.
These names are just passed from the input to the output:
- Daily V2 files:
  - message type = AIRNOW_DAILY_V2
  - variable names = CO-8hr, OZONE-1HR, OZONE-8HR, PM10-24hr, PM2.5-24hr, SO2-24HR
  - accumulations = 1, 8, or 24-hour (i.e. A1, A8, or A24)
- Hourly files:
  - message type = AIRNOW_HOURLY
  - variable names = BARPR, BC, CO, NH3, NO, NO2, NO2Y, NOX, NOY, OZONE, PM10, PM2.5, PMC, PRECIP, RHUM, RWD, RWS, SO2, SRAD, TEMP, UV-AETH, WD, WS
  - accumulations = 1-hour (i.e. A1)
- Hourly AQ files:
  - message type = AIRNOW_HOURLY_AQOBS
  - variable names = CO, NO2, OZONE, PM10, PM25, SO2
  - accumulations = 1-hour (i.e. A1)
John:
Thanks.
Regarding the site info: daily reporting stations may differ. I expect the info can be found in the daily monitoring_site_locations.dat or Monitoring_Site_Locations_V2.dat.
Which date are you looking at for the 3 missing stations' info?
Ho-Chun Huang, Ph.D.
@davidalbo I'm wondering how this airnow_monitoring_site_locations_v2.txt table file was created.
I see warnings about 3 missing stations when I run the unit tests in unit_ascii2nc.xml:
WARNING: AirnowHandler::_parseObservationLineStandard() -> Skipping line number 1265 since StationId 010499991 not found in locations file (MET_BASE/table_files/airnow_monitoring_site_locations_v2.txt)! Set the MET_AIRNOW_STATIONS environment variable to define an updated version.
WARNING: AirnowHandler::_parseObservationLineStandard() -> Skipping line number 1280 since StationId 010732059 not found in locations file (MET_BASE/table_files/airnow_monitoring_site_locations_v2.txt)! Set the MET_AIRNOW_STATIONS environment variable to define an updated version.
WARNING: AirnowHandler::_parseObservationLineStandard() -> Skipping line number 8496 since StationId 124000041601 not found in locations file (MET_BASE/table_files/airnow_monitoring_site_locations_v2.txt)! Set the MET_AIRNOW_STATIONS environment variable to define an updated version.
I see AirNow site-specific info at this URL. And I see location information for the 1st and 3rd stations listed above, but not the second. Did you get this static file from somewhere or construct it yourself from those .json site files? I'd like to do 2 things:
- Make sure that site information is as correct as possible.
- Update this section of the documentation to tell users where to look for updated site location info in the future.
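As a quick sanity check for warnings like the ones above, a station lookup against a locations file can be sketched like this. It assumes the StationID is the first pipe-delimited field, per MonitoringSiteFactSheet.pdf; verify against an actual Monitoring_Site_Locations_V2.dat:

```python
# Sketch: report which of the given station IDs are absent from a
# pipe-delimited AirNow locations file. Assumes StationID is the first
# '|'-delimited field (an assumption based on MonitoringSiteFactSheet.pdf).
def missing_stations(locations_path, station_ids):
    found = set()
    with open(locations_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("|")
            if fields and fields[0] in station_ids:
                found.add(fields[0])
    return set(station_ids) - found
```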
@JohnHalleyGotway I pulled down the following single ascii file and made it the default file that we have within MET.
https://s3-us-west-1.amazonaws.com//files.airnowtech.org/airnow/today/Monitoring_Site_Locations_V2.dat
with documentation about the format here: https://github.com/dtcenter/MET/files/9086344/MonitoringSiteFactSheet.pdf
which I used to create software to read it and look up locations for particular stations. The files you refer to are a different starting point, so something would need to be done to deal with multiple files and maybe a different format.
I don't know which source is better.
John:
Please let me know the date that you processed and found the 3 missing stations' info. I would like to confirm they can be found in the daily monitoring_site_locations.dat and/or Monitoring_Site_Locations_V2.dat.
I cannot find the three stations in the location files of 2022/11/15.
Ho-Chun,
The sample data we're using that results in those warnings can be found here:
https://dtcenter.ucar.edu/dfiles/code/METplus/MET/MET_unit_test/develop/unit_test/obs_data/airnow/
Looks like the data is from March 12, 2022. I'd like to add more info to the users guide to tell users where/how to get updated site definition files as needed in the future.
John:
Back on 20220312, I did not get the location files monitoring_site_locations.dat and Monitoring_Site_Locations_V2.dat, so I do not know how this airnow_monitoring_site_locations_v2.txt table file was created.
I went back to the AirNow website and downloaded monitoring_site_locations.dat and Monitoring_Site_Locations_V2.dat. They can be found at https://www.emc.ncep.noaa.gov/mmb/hchuang/ftp.
I can find obs station info for sid=010732059 and 124000041601 in monitoring_site_locations.dat and Monitoring_Site_Locations_V2.dat.
But I cannot find the information for sid=010499991 in either file.
Things get tricky because sid=010499991 can be found in HourlyData, daily_data.dat, and daily_data_v2.dat, but not in HourlyAQObs.
Note that monitoring_site_locations.dat and Monitoring_Site_Locations_V2.dat are dynamic, not static (they may change day by day).
Here is my suggestion for your consideration: in general, all station information should be read from either monitoring_site_locations.dat or Monitoring_Site_Locations_V2.dat of the same day. If a station cannot be found there, then remove its obs from the output.
I have sent a tar file that EMC ingests daily, i.e., 20221203.tar, to the same ftp directory. Can you use it for a new sample dataset?
An additional step can be added if the method above fails: if a station cannot be found in monitoring_site_locations.dat or Monitoring_Site_Locations_V2.dat but can be found in HourlyAQObs* or daily_data_v2.dat, you can still retrieve the lat/lon information from the same record, e.g.,
daily_data_v2.dat:03/12/22|010499991|Sand Mountain|OZONE-8HR|PPB|42|8|EPA Office of Atmospheric Programs|39|0|34.289001|-85.970065|840010499991
HourlyAQObs_2022031223.dat:"010732059","Arkadelphia","Active","R4","33.521422","-86.844077","0.0","-6","US","AL","03/12/2022","23:00","Jefferson County Department of Health","","","","","","0","0","0","0","","","","","","","","","","","",""
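The fallback described above (recovering lat/lon from the observation record itself) might look like this sketch. The field positions are inferred from the two sample records and should be checked against the AirNow fact sheets:

```python
import csv

# Sketch of the suggested fallback: pull lat/lon directly from the
# observation record when a station is absent from the locations file.
# Field positions are inferred from the sample records in this thread.

def latlon_from_daily_v2(line):
    # e.g. 03/12/22|010499991|Sand Mountain|...|34.289001|-85.970065|840010499991
    # Fields 10 and 11 (0-based) hold latitude and longitude.
    f = line.rstrip("\n").split("|")
    return float(f[10]), float(f[11])

def latlon_from_hourly_aqobs(line):
    # Quoted CSV; fields 4 and 5 (0-based) hold latitude and longitude.
    f = next(csv.reader([line]))
    return float(f[4]), float(f[5])
```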
If a sid's obs can only be found in HourlyData* and daily_data.dat, then remove it from the list.
I may not have explained things clearly; please let me know, and maybe we can clear it up over Google Meet.
@Ho-ChunHuang-NOAA thanks for this information! The ascii2nc tool supports 3 input types related to AirNow:
- airnowdaily_v2 contains lat/lon info
- airnowhourlyaqobs contains lat/lon info
- airnowhourly DOES NOT contain lat/lon info
So fortunately, it's only 1 of the 3 data sources to which this table lookup complication applies. Please read through this updated section in the MET User's Guide about the MET_AIRNOW_STATIONS environment variable:
Are there any changes you'd recommend to that information? And have you been able to test ascii2nc/point-stat to confirm that the AirNow vx works as you'd expect?
There are formatting diffs between the 2 files: Monitoring_Site_Locations_V2.dat vs monitoring_site_locations.dat. The current logic handles the first one, not the second. If we need to update that logic or allow MET_AIRNOW_STATIONS to be set to a list of multiple filenames, we can add that to a future release based on your input.
We're creating the MET-11.0.0 release today. So we are out of time.
Hi, John:
I will read it and let you know if I have any comments. I have not started the testing yet. In addition to other assignments, I am waiting for you to resolve the location issue. I do think at least it should be changed to use the current-day location file.
The static table you previously used will lead to future problems as some stations may be decommissioned and some can be put online.
I will use what you have currently on the Hera (module load met/feature_2142) and repeat the test.
Ho-Chun Huang, Ph.D.
Hi, John:
I read it, and the new feature provides the option of using current-day location files. Just to be sure, is this what I should add to my script?
If I process /lfs/h2/emc/physics/noscrub/ho-chun.huang/epa_airnow_acsii/2022/20221201/HourlyAQObs_2022120119.dat or /lfs/h2/emc/physics/noscrub/ho-chun.huang/epa_airnow_acsii/2022/20221201/daily_data_v2.dat
I set
export MET_AIRNOW_STATIONS=/lfs/h2/emc/physics/noscrub/ho-chun.huang/epa_airnow_acsii/2022/20221201/Monitoring_Site_Locations_V2.dat
Is this correct?
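If that is the intended usage, the full sequence might look like the sketch below. The -format name comes from the ascii2nc documentation and should be verified against your MET version, and note that shell assignments take no space around the equals sign:

```shell
# Sketch of the usage described above (paths from this thread; verify
# the -format value against your MET version's ascii2nc documentation).
# Note: no spaces around '=' in a shell variable assignment.
export MET_AIRNOW_STATIONS=/lfs/h2/emc/physics/noscrub/ho-chun.huang/epa_airnow_acsii/2022/20221201/Monitoring_Site_Locations_V2.dat

ascii2nc \
  /lfs/h2/emc/physics/noscrub/ho-chun.huang/epa_airnow_acsii/2022/20221201/HourlyAQObs_2022120119.dat \
  hourly_aqobs_2022120119.nc \
  -format airnowhourlyaqobs
```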
Ho-Chun Huang, Ph.D.
Describe the New Feature
Per dtcenter/METplus#1515 and @PerryShafran-NOAA, the EPA will switch from providing BUFR to providing ASCII data to NOAA. The new feature is to add support for this new dataset to ASCII2NC.
Acceptance Testing
Sample files exist on seneca here: /home/dadriaan/projects/airnow/shafran_data/
From dtcenter/METplus#1515:
There are four file types:
1) "HourlyAQObs" (Docs: HourlyAQObsFactSheet.pdf)
2) "HourlyData" (Docs: HourlyDataFactSheet.pdf)
3) "daily_data" (Docs: unknown)
4) "daily_data_v2" (Docs: DailyDataFactSheet.pdf)
The available documentation only describes "daily_data_v2" (file type 4); the user provided no information about the "daily_data" file. @JohnHalleyGotway notes:
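As an illustration of dispatching on these file types, a name-based mapping to ascii2nc format strings might look like the following sketch. The format names follow the MET User's Guide for ascii2nc and should be double-checked; "daily_data" is deliberately left unhandled since its documentation is unknown:

```python
# Sketch: choose an ascii2nc -format value from an AirNow file name.
# Format names are assumed from the MET User's Guide for ascii2nc;
# verify against your MET version.
def airnow_format(filename: str) -> str:
    name = filename.lower()
    if name.startswith("hourlyaqobs"):
        return "airnowhourlyaqobs"
    if name.startswith("hourlydata"):
        return "airnowhourly"
    if "daily_data_v2" in name:
        return "airnowdaily_v2"
    # "daily_data" (without v2) is undocumented, so it is not mapped here.
    raise ValueError(f"unrecognized AirNow file: {filename}")
```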
Time Estimate
1 day of work
Sub-Issues
Relevant Deadlines
NONE.
Funding Source
2792541
Define Related Issue(s)
Consider the impact to the other METplus components.
New Feature Checklist
See the METplus Workflow for details.
feature_<Issue Number>_<Description>
feature <Issue Number> <Description>