NOAA-EMC / GSI

Gridpoint Statistical Interpolation
GNU Lesser General Public License v3.0

ConMon -- add NetCDF Support #7

Closed. EdwardSafford-NOAA closed this issue 3 years ago.

EdwardSafford-NOAA commented 4 years ago

Modify the Conventional data monitor (ConMon): change the package-level build script to use the cmake utility and the $target-specific module files. Utilize the latest version of read_diag.f90 to enable support for NetCDF-formatted cnvstat diagnostic files.

This issue began as VLab #63754. Development will continue in repository EdwardSafford-NOAA/GSI in branch esafford_ConMon_63754.

EdwardSafford-NOAA commented 4 years ago

Note that the original description of "utilize latest version of read_diag.f90..." is in error. That file in the GSI source code provides NetCDF support only for radiance diagnostic files. This ticket will include writing a conmon_read_diag.F90 module to enable reading conventional diag files. That will be implemented by data type (ps, q, t, uv, gpsro).

EdwardSafford-NOAA commented 4 years ago

Complication:

I believe I've found another missing data element in the NetCDF-formatted uv diag files. I'm using diag files from v16rt2, and when I do

ncdump -h diag_conv_uv_ges.2020051806.nc4

The variable list does not include Nonlinear_QC_Var_Jb. This variable is found in diag files for ps, t, and q types.
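
A quick way to confirm this across types (a hypothetical one-liner, not from the thread; file names follow the example above):

```bash
# Check each conventional diag type for the variable; a count of 0 means
# it is absent from that file's header (file names assumed from above).
for t in ps q t uv; do
   echo -n "${t}: "
   ncdump -h diag_conv_${t}_ges.2020051806.nc4 | grep -c Nonlinear_QC_Var_Jb
done
```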

Further, in the GSI code in setupw.f90, I find the assignment of Nonlinear_QC_Var_Jb at line 1633, which is within subroutine contents_binarydiag, but I don't find a similar assignment in subroutine contents_netcdfdiag. I suspect setting the Nonlinear_QC_Var_Jb variable in the NetCDF diag file was overlooked.

EdwardSafford-NOAA commented 4 years ago

Using code from release/gfsda.v16.0.0, I added Nonlinear_QC_Var_Jb to the NetCDF diag file in setupw.f90. I used that branch so Russ could test it using the v16rt2 parallel. The var is now included in the diagnostic file, and is 0 in all cases. Per Su, that's as it should be. I'll include that code change in this release.

EdwardSafford-NOAA commented 4 years ago

Added support for reading gps diag files to the common module conmon_read_diag.F90. This is working in the first executable, conmon_grads_sfctime.x. Installing it in conmon_grads_sfc.x yielded a seg fault, which I'm investigating. But now it appears I've been hamstrung by a change in libraries or modules (cmake was updated this AM) -- the nc* references suddenly produce a raft of compiler failures.

EdwardSafford-NOAA commented 4 years ago

Update: the compile issue persists. I updated the hera module file to correctly find the python and cmake modules. That fixed the GSI/ush/build_all_cmake.sh script. At the ConMon level, though, I'm running into a raft of undefined references to MPI resources. That I don't get; I'm not explicitly using MPI resources in my code. It must come in through the nc* libraries somewhere, but what changed that I'm now having problems? Last Thursday this compiled. Diagnosing issues with cmake is part logic and part dart throwing.

CoryMartin-NOAA commented 4 years ago

@EdwardSafford-NOAA GSI and GFS now use netCDF compiled with MPI parallel IO. Take a look at https://github.com/NOAA-EMC/GSI/blob/master/util/netcdf_io/calc_analysis.fd/CMakeLists.txt, you probably just need to add MPI Fortran include/libs flags at compiling.
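
For example (a hedged sketch, not the actual CMakeLists change; the wrapper and config-tool usage here are assumptions), one way to confirm the parallel-I/O dependency and link accordingly from the shell:

```bash
# If netCDF was built with parallel I/O, its libraries drag in MPI symbols,
# so the link step must supply MPI libs too (names/paths assumed).
nc-config --has-parallel        # "yes" => MPI required at link time
nf-config --flibs               # Fortran link flags for this netCDF build
# Linking through the MPI compiler wrapper resolves the MPI references:
mpiifort -o conmon_grads_sfc.x *.o `nf-config --flibs`
```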

EdwardSafford-NOAA commented 4 years ago

Update: the compile issue described above persists. I've added the MPI Fortran include/libs flags as suggested, but the link errors (undefined references to MPI resources) continue. I'm at a loss.

RussTreadon-NOAA commented 4 years ago

Used the CMakeLists.txt in Radiance_Monitor to guide edits to the CMakeLists.txt in Conventional_Monitor. After incremental updates the Conventional_Monitor executables built on Hera using ush/build_all_cmake.sh. rsync'd the working copy to Venus; ush/build_all_cmake.sh works on Venus as well. Will provide details to Ed via a separate email.

EdwardSafford-NOAA commented 4 years ago

The ConMon build_all script is now working. The issues all involved the load order of the NetCDF, HDF5, and MPI modules.
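
For the record, a minimal sketch of the working order (module names are illustrative, not the exact Hera/Venus ones):

```bash
# Load MPI first, then HDF5 built against it, then NetCDF built against
# that HDF5, so each library's dependencies resolve at link time.
module purge
module load impi        # MPI
module load hdf5        # parallel HDF5 (built with this MPI)
module load netcdf      # NetCDF (built with this HDF5)
```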

EdwardSafford-NOAA commented 4 years ago

Have all of the conmon_grads_*.x executables working now. In reviewing the conmon_read_diag.F90 module I discovered that I missed adding a read function for sst (sea surface temp) files. SST isn't used by the ConMon, but I should include it, along with gps (which is mostly there now), in the module. Comparing the contents of my sst diag files with what I see in the GSI source file setupsst.f90, which writes both the binary and netcdf diag files, I find some mismatches. The issues are:

Not sure what "Tr" means here. But if "FoundationTempBG" == background open water temp then the NetCDF file has no equivalent variable for the "Tr" data.

EdwardSafford-NOAA commented 4 years ago

Cleaned up the conmon_read_diag.F90 module. The latest version is now in all the conmon_grads_* executables and compiles at the ConMon-level build. The GSI/ush/build_all_cmake.sh still doesn't work with the ConMon executables. Have yet to find the issue, which must be in the GSI/CMakeLists.txt, but I will.

EdwardSafford-NOAA commented 4 years ago

Figured out the top-level build problem. What was missing was an include directory for each executable, where compiled modules apparently are placed. Without this, the GSI files that use kinds.F90 couldn't find it. Not sure I fully understand how the build is working, but these two lines are what fixed the issues with ConMon builds (at the GSI level):

EdwardSafford-NOAA commented 4 years ago

All horiz_hist executables are working. Now adapting the time_vert executable to use the conmon_read_diag.F90 module. Running into run-time issues with the t data. The ps data ran slowly, but not prohibitively so; still, I'll have to do some performance engineering on the ps and uv data.

EdwardSafford-NOAA commented 4 years ago

Have begun testing on wcoss_d with live v16rt2 data. As anticipated, memory use for submitted jobs is many times larger than it is for binary-formatted diag files, though it's still under 900 MB. The run time is longer too, but only about 2x that of the binary files. This is a single-node, downstream process, so that shouldn't be an issue.

EdwardSafford-NOAA commented 4 years ago

Horizontal plots seem to be working. Working on hist plots, I found an issue with missing and 0-sized scater data files. I traced that back to errors in the diag2grads* scripts, which resulted in the scater files not getting transferred to $TANKDIR. Fixed that, so hist plots have a greater chance of working now.

EdwardSafford-NOAA commented 4 years ago

Making headway on the hist plots. The mechanism used is fairly complex, probably unnecessarily so, but I'm learning as I go. The procedure is (a rough shell sketch follows the list):

  1. Extract specific information from each type-specific scater data file (i.e. ps, q, etc.) and write it to an out file, using a type-specific executable in ConMon/image_gen/sorc.
  2. Create a GrADS control file.
  3. Modify a GrADS plot script and run it.
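
Roughly, in shell terms (executable, template, and file names here are illustrative, not the exact ConMon ones):

```bash
dtype=ps

# 1. Extract stats from the scater data file into an "out" file.
./conmon_read_${dtype}.x < input.nml

# 2. Create a GrADS control file describing the extracted data.
cat > ${dtype}.ctl << EOF
dset ^${dtype}_hist.dat
xdef 100 linear 1 1
EOF

# 3. Fill in a GrADS plot script template and run it in batch mode.
sed -e "s/DTYPE/${dtype}/g" plot_hist.gs.tmpl > plot_hist.gs
grads -blc "run plot_hist.gs"
```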

I'm getting the out and control files, but so far no output in the resulting png files. Also noted that hist files are produced for both ges and anl, but the names are the same, so the last one overwrites the first. Have to check the web page for GFS/gdas/conmon data -- I'll bet the html needs to be updated to show hist plots for both ges and anl.

EdwardSafford-NOAA commented 4 years ago

Learned (the hard and long way) how the stats (number of obs, rejected by vqc, rejected by gross, std, and mean) are handed to the hist GrADS plotting scripts. The calling shell script runs the conmon_read_*.x executable, which produces a stdout file. The last 3 lines of the stdout file are copied and dumped to a 'fileout' file. The GrADS script then extracts these 5 stats from line 2 of fileout. GrADS is very unsophisticated in how it retrieves lines, strings, and substrings. So if you happened to, I dunno, stick some diagnostic output in the program that ended up in stdout, well, you just lost a few days. The ConMon is shot full of rubbish like this -- programming by side effect always goes wrong at some point. I'll fix this and either explicitly dump the needed stats to a helpfully named file, and/or add the stats to the input vars for the GrADS script. No one else needs to stumble into these landmines.
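
The fragile handoff, as I understand it (file-name pattern assumed):

```bash
# The executable's stdout is captured, then the last 3 lines are assumed
# to hold the 5 stats; GrADS later reads them from line 2 of fileout.
./conmon_read_${dtype}.x > stdout_${dtype}_${cycle}.${PDATE}
tail -3 stdout_${dtype}_${cycle}.${PDATE} > fileout
# Any extra print statement in the program shifts these lines and silently
# feeds GrADS garbage -- hence the lost days.
```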

So this gets me the header stats in the resulting plots but I've still got no data in the actual histograms. I continue the investigation.

EdwardSafford-NOAA commented 4 years ago

Ok, plot_hist.sh is working at least for type q data (the first type I've tried to get working). I had to detect and correct one more devious and rather subtle side effect:

  1. Program conmon_read_q.x creates a stdout file for each q type processed (e.g. stdout_q120_00_ges.2020072800).
  2. The plot_hist.sh script then performs these commands:

     nlev_str=`cat stdout_${dtype}_${cycle}.${PDATE} | grep nlev`
     nlev=`echo $nlev_str | gawk '{print $2}'`

  3. But, in my ignorance, I added debug output to dump the conmon_read_q.x namelist contents. One of those fields is a character string that is declared larger than the input string, so its unused tail contains non-printable garbage. The grep command sees this and decides the file is binary, so nlev_str ends up containing the string "Binary file (standard input) matches", and nlev becomes "file".
  4. The nlev value is used to edit the ges and anl control files:

     xdef="xdef $nlev linear 1 1 "
     sed -e "s/^xdef.*/${xdef}/" tmp.ctl > tmp1.ctl

     So the xdef line in the control files reads "xdef file linear 1 1".
  5. GrADS barfs.

Turns out one can add the -a (--text) option to a grep command, instructing it to evaluate a binary file as if it were text. This works and I get data in my histogram plots. So that's a workaround, but I'm not going to continue this madness. All data we need from the conmon_read_*.x executables needs to be explicitly written to a dedicated file. Creating multiple dependencies on the format and contents of stdout is a really, really stupid idea.
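
The workaround, for anyone who hits this before the real fix lands:

```bash
# Force grep to treat the binary-looking stdout file as text so the nlev
# line is still matched (instead of "Binary file ... matches").
nlev_str=`grep -a nlev stdout_${dtype}_${cycle}.${PDATE}`
nlev=`echo ${nlev_str} | gawk '{print $2}'`
```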

EdwardSafford-NOAA commented 4 years ago

An additional opportunity for sound, maintainable programming practice occurs to me. Every conmon_read_*.x executable contains a file with a list of enumerated index values into the rdiag array. These live in the read_*.f90 and read_*_mor.f90 files. This is dumb, because if the index values ever change it requires making changes to 2 files in all 5 executables. Instead these should all be in a single module that is then used by read_*.f90 and read_*_mor.f90.

EdwardSafford-NOAA commented 4 years ago

Committed changes which remove the direct dependency on the contents and order of the stdout file from the 5 conmon_read_*.x executables. The plot_hist.sh script used to do a tail -3 on the stdout file to get the stats (counts, sdv, mean, etc.) for the histogram plots. This change writes those values to a dedicated file as keyword-value pairs. The plot_hist.sh script then retrieves those values and loads them into the file that the GrADS script is expecting. Now stdout can be modified (say, by adding/removing debug) without consequence to the input file for the GrADS plot script.
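
A minimal sketch of the new handoff (the stats file name and keyword spellings here are illustrative):

```bash
# The executable now writes stats as keyword-value pairs, e.g.:
#   count 10432
#   vqc_rej 120
#   gross_rej 87
#   sdv 1.42
#   mean -0.03
statfile=${dtype}_${cycle}.${PDATE}.stats
count=`gawk '$1 == "count" {print $2}' ${statfile}`
sdv=`gawk  '$1 == "sdv"   {print $2}' ${statfile}`
mean=`gawk '$1 == "mean"  {print $2}' ${statfile}`
# plot_hist.sh then loads these into the file the GrADS script expects.
```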

Note that UV has not yet been updated to use these changes; types PS, Q, and T are good to go with respect to the histograms.

EdwardSafford-NOAA commented 4 years ago

Completed work on uv histogram plots. This included a general clean up on the common files rm_dups.f90 and convinfo.f90. Will now commence work on time and vert plots.

EdwardSafford-NOAA commented 4 years ago

While retesting on hera I discovered problems with the q* horiz plots. Apparently the RH2m term is not in the pared analysis and guess files, so that's an extraction issue. All other horiz plots seem to be ok. I've saved off the work from hera and will move back to wcoss to see about addressing the missing field.

EdwardSafford-NOAA commented 4 years ago

I think I've got it figured out. The issue is the available fields in the ges and analysis grib files. The humidity (q) plots use the plot_qallev_horz.gs script, except for the q18* types, which use plot_qsfc_horz.gs. It's qsfc_horz that references RH2m, a variable that does not exist in the original grib files. I thought it might be an issue with the selective paring of those grib files, but RH2m doesn't exist even in the starting files, so I assume it's a relic of a bygone configuration. qallev_horz uses the var RHprs and plots up to 10 levels using those values.

I modified qsfc_horz to plot RHprs instead of RH2m. This works. Next step is to check with Su to see if this makes sense.

EdwardSafford-NOAA commented 4 years ago

This is the result of the existing plot_qsfc_horz.gs script: [image: q187_00_region1_orig]

And the modified script: [image: q187_00_region1]

EdwardSafford-NOAA commented 4 years ago

Per Su:

"...there are no surface fields in pgb anl files. However the surface fields are in the files f000 and f006. So I suggest to copy gdas.t${cycle}z.pgrb2.0p25.f000 instead anl. f000 file is very close to anl files. If possible, could you copy 0p25* file with higher resolution.

"for RH: RH2m, Temperature: TMPsfc and wind: UGRD10m and VGRD10m"

So I'll try swapping in the f000 files and fields as suggested, and see how that works out both in terms of the resulting graphics and the impact on storage space.

Update: storage space may become an issue. This change produces pared grib files that are 89% larger. I may have to purge the stored data sooner than 3 months. That won't affect plotting though; time series only span 30 days and horz/hist are just the latest cycles. In fact, the horz/hist data could be safely purged at 1 week. Note to self -- add updating the saved data purge mechanism as part of this project.

EdwardSafford-NOAA commented 4 years ago

All q plots now seem to be working. Su has given her thumbs up as well to the q187 plots. I had a bit of a struggle getting the text on the color bars to show up -- it wasn't rocket science, just a factor of not working often with GrADS and missing that the text was plotting in the background color.

I've given additional thought to the storage issue and believe that won't be difficult to handle. I will take that piece up after I get all the plots working with this higher res data.

EdwardSafford-NOAA commented 4 years ago

Horizontal plots for ps, q, t, and uv are now working, including color bars (which haven't worked in forever). Histogram plots are working as well. Have now begun work on time and vertical plots.

EdwardSafford-NOAA commented 4 years ago

Time and vert plots appear to be working -- results are plausible.

The uv, u, and v horiz plots take quite a long time to complete. I'll explore ideas for improving that.

EdwardSafford-NOAA commented 4 years ago

Confirmed time and vertical plots are working. There are run time issues with the u, v, and uv time plots. I had initially planned on addressing runtime issues on this ticket, but after some exploration it looks like a larger task than I thought. The condition was preexisting, and this ticket has been open a long time. I'm going to address the image generation issues as a separate issue/ticket.

So that means this issue/ticket is almost complete. Only some general clean up and testing on wcoss_c remain.

EdwardSafford-NOAA commented 4 years ago

Oops. Forgot I also need to install a mechanism to remove the horz_hist subdirectories in C_TANKDIR after 7 days, and the time_vert subdirectories after 40 days. So 3 items remain.

This item is now done. I worked up a script to accomplish this as the last step of the data extraction. I'll put it on a switch that will default to off for (future) operational use and can be set in the parm/ConMon_config.sh file.
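
A minimal sketch of the purge step (the switch name and exact find arguments are assumptions; the real knob lives in parm/ConMon_config.sh):

```bash
# Run as the last step of data extraction; default off for operations.
CONMON_CLEAN_TANKDIR=${CONMON_CLEAN_TANKDIR:-0}

if [[ ${CONMON_CLEAN_TANKDIR} -eq 1 ]]; then
   # horz/hist plots only use the latest cycles -- purge after 7 days.
   find ${C_TANKDIR} -type d -name horz_hist -mtime +7  -prune -exec rm -rf {} \;
   # time_vert data feeds 30-day time series -- keep 40 days.
   find ${C_TANKDIR} -type d -name time_vert -mtime +40 -prune -exec rm -rf {} \;
fi
```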

EdwardSafford-NOAA commented 4 years ago

Had a chance to rethink a few issues with plotting this morning (dev system was unavailable this AM), and came up with this idea and implemented it on hera:

PDATE can be set in one of 3 ways. This is the order of priority:

  1. Specified via command line argument
  2. Read from ${C_IMGNDIR}/last_plot_time file and advanced one cycle.
  3. Using the last available cycle for which there is data in ${C_TANKDIR}.

If option 2 has been used, the ${C_IMGNDIR}/last_plot_time file will be updated with ${PDATE} if the plot is able to run. If not, then no last_plot_time file will be created.
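
In shell terms, roughly (the $NDATE utility advances a YYYYMMDDHH date; the directory scan in the else branch is an assumption about the ${C_TANKDIR} layout):

```bash
if [[ $# -ge 1 ]]; then
   PDATE=$1                                          # 1. command-line argument
elif [[ -s ${C_IMGNDIR}/last_plot_time ]]; then
   last_cycle=`cat ${C_IMGNDIR}/last_plot_time`
   PDATE=`${NDATE} +06 ${last_cycle}`                # 2. last plot time + 1 cycle
else
   PDATE=`ls -1 ${C_TANKDIR} | tail -1 | cut -d. -f2`   # 3. newest cycle with data
fi

# ... run the plots; on success, record PDATE for next time:
echo ${PDATE} > ${C_IMGNDIR}/last_plot_time
```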

This gets rid of the need for my "driver" scripts and the data_map.xml file I've been using to store the last plot date for each ${NET}/${RUN} value. My driver scripts were a rather inelegant hack to solve a problem that's unique to me -- I have to run a whole bunch of Mon scripts for a whole bunch of ${NET}/${RUN} values, so I need to figure out the last plotted cycle and then plot the next one. Once upon a time I had an idea that I could store what I needed for every ${NET}/${RUN} pair in the data_map.xml files and set up the driver scripts to query values and override the defaults as needed. Over time I've pretty much reduced the use of the data_map files to just store the last plotted date. Now I think I can do away with them entirely, while retaining the same functionality I need, reducing the confusion for developers. If this works as planned here in the ConMon I'll propagate the changes to all the other Mon packages.

EdwardSafford-NOAA commented 4 years ago

After discussion with Su I revisited the missing uv245, uv246 data. The expected subtypes for those sources are 00, 257, and 259, but those aren't in the uv diag file. Instead there are subtypes 270 and 271. So I swapped in the latest global_convinfo.txt file, which contains those new subtypes, and am re-running the data extraction (DE) for the past 4 cycles.

EdwardSafford-NOAA commented 4 years ago

Added a mechanism to establish average obs counts and compare each cycle's counts to them as the data is extracted. This will then be used to produce low-count reports. Am testing this with v16rt2 now. It looks like, as with the other *Mon packages, I'll need to report only those low counts that persist for 2 cycles in a row to avoid creating too many reports (the data is pretty choppy so far). Some study will be necessary to determine how best to handle this.
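
A hypothetical form of the check (the 2-cycles-in-a-row rule; the 0.75 threshold and base-file layout are assumptions the study would settle):

```bash
# Flag a source only when its count is below the bound for two consecutive
# cycles, to keep choppy data from flooding the reports.
bound=`gawk -v src=${type} '$1 == src {printf "%d", $2 * 0.75}' base_counts.txt`

if [[ ${cur_count} -lt ${bound} && ${prev_count} -lt ${bound} ]]; then
   echo "${type}: count ${cur_count} below bound ${bound}" >> warnings.${PDATE}
fi
```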

EdwardSafford-NOAA commented 4 years ago

Updated the base files and that has helped the low count reporting. Another 2 weeks worth of data should be sufficient. Updated the ConMon web site to reflect the currently available data.

Turned off generation of u and v time series plots. There was no difference between uv, u, and v time series plots (for a given type/subtype). Those are the most expensive plots to generate (in terms of wall time), so I reduced that to just uv plots. The web site has been updated accordingly.

EdwardSafford-NOAA commented 4 years ago

Su reported that some of the vert plots for GFS/gdas had not updated since 10/6. I traced that back to my update of the convinfo table, which had more entries than the previous version. The number of UV entries exceeded the specified array sizes, but the executable was (somehow) able to continue writing beyond the ends of the arrays. I would have expected a core dump, but instead it behaved like a C program and ended up corrupting the T data. I bumped up the array sizes to fix it. A better fix would be to re-dimension the arrays on the fly as needed, but I don't have time to implement that at the moment -- I'll add it to the ConMon to-do list.

It appears that only the GFS/gdas data was incorrectly read. The v16rt2/gdas data was correctly handled. That's a bit mystifying so I'll look into why that worked next.

EdwardSafford-NOAA commented 4 years ago

I've overhauled most of the ConMon web site and added a selector pulldown for data source. Right now those sources are GFS/gdas and v16rt2/gdas, and that's now easily modified. I kept the existing 00,06,12,18 cycle directories but I've made changes to the image generation to support more than 4 cycles, and have written a php script on the server side to generate a list of available cycles so they can be loaded into a cycle pulldown menu.

Perhaps related to my recent changes I found that some of the time series plots for UV are either missing or old (meaning they haven't been generated in some time). I'm looking into that now.

EdwardSafford-NOAA commented 4 years ago

The missing UV plots are a scripting issue. During image generation I'm trying to rename the files to add the full cycle time. That operation fails because the uv file list is too long, so I'll have to step through it by type (see the sketch below). I'm also going to add some cleanup logic to the image generation so old files, beyond a user-defined age, are removed, then update the rsync to actually mirror the image directories to the server so they get cleaned up automatically. This hasn't been an issue for the ConMon heretofore because the image files were stored in 4 bins (00, 06, 12, 18) and continually overwritten as new images were generated.
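
Something like this per-type loop should dodge the argument-list limit (the type list and file-name pattern are illustrative):

```bash
# A single "mv uv*.png ..." exceeds ARG_MAX; a per-type glob expanded inside
# a shell loop never passes the whole list to one exec'd command.
for type in uv220 uv221 uv223 uv224 uv228 uv229; do
   for f in ${type}_*.png; do
      [[ -e ${f} ]] || continue                 # skip types with no files
      mv ${f} ${f%.png}.${PDATE}.png            # append the full cycle time
   done
done
```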

EdwardSafford-NOAA commented 3 years ago

I got the warning messages working and sent Su a copy for her review. She pointed out that there is expected cyclical variation in the obs counts for many of the obs types, and suggested that the average count should be established by cycle (00, 06, 12, 18) rather than by combining all cycles. That was helpful, so I'm going to implement that and then adjust the bound relative to the average from there.

I should point out that the warnings contain a hyperlink for each entry. I've started work on processing the embedded vars in the hyperlink to enable navigation to the correct plot (e.g. time series plot, net, run, type/subtype, count). I'm going to finish that work, then create the historic average (base) files by cycle.

Additionally, the time and vert processing produces 3 types of plots, labeled "count", "bias", and "bias2", but the original web site did not include any way to access the "bias2" plots. I asked Su about that and she said we don't need the bias2 plots. So I've eliminated those, which saves considerable processing time and server space.

EdwardSafford-NOAA commented 3 years ago

Cycle specific checks have been implemented. This required changes to the base (historic count) files, averaging them by source (type/subtype) and hour, instead of just source.

I hit a problem while trying to get the ConMon's web site to correctly navigate the hyperlinks (provided by the warning reports). It took me a while to figure out, but what had happened is that I no longer owned any of my files or directories on emcrzdm under /home/people/emc/www/gmb/gdas. As part of a server cleanup the instructions were misunderstood: while the gdas directory remained John Derber's, Daryl was made owner of everything underneath it. Su and I had read permissions but couldn't make any changes to any of the *Mon files, so nothing has been updated since 11/25. I thought a fix was underway on 11/27, but as of the morning of 11/30 I still don't own my files.

EdwardSafford-NOAA commented 3 years ago

Web server files have been restored to correct state, so Su and I again own our files and directories.

I got the hyperlinks working on 12/4, both for the time series and surface time series plots. One issue remains -- the base (historic bounds) files for some of the uv28* sources are way out of whack. It's possible there is an issue with separating type/subtype combinations, or something similar, that produces averages an order of magnitude too large for about 5 sources. I'll have a look at that when we get the dev system back.

EdwardSafford-NOAA commented 3 years ago

I've begun sharing the warning reports with Su and Russ. Per input from Su I've added the average value to the warning reports (immediately following the bound value).

Per Iliana, the European center uses a rolling 30-day average as their historic base mechanism. That could be easily implemented, but I wonder if it would let slow-degradation situations slide. It's an idea worth some discussion.

Next week we have a general meeting on current monitoring (12/16). Depending on the outcome of that meeting I may finally be able to hand this ticket off to Mike. I'm sure ready to declare victory.