Open rburghol opened 3 months ago
Hey @COBrogan, @ilonah22, @nathanielf22 and @mwdunlap2004 - the batch run that Michael did last night cruised along and did the simple LM for many, many segments, but it has been stalled for almost 12 hours on the Dan River at Wentworth NC (usgs_ws_02071000). I am 99% sure that this is the same segment that stalled when Connor ran it a couple of weeks ago. It is a fairly large area, so it could be that it is just too big. However, since this same one has come up twice, and not some other large segment like the James River at Richmond, I am wondering if there might be some hinkiness in the geometry that is causing trouble?
Other than that thought, I have no real ideas on how to debug it -- we can check st_isvalid() on the geometry I suppose? I will note that I tried accessing that shape in the vahydro watershed browser (which runs a spatial containment query) and it has been running for several minutes without returning results. However, I can view the Dan River at South Boston (taking less than 15 seconds), which I think contains the Wentworth segment, further bolstering the bad geometry hypothesis.
Anyhow, I did not yet kill Michael's job on that segment, but I think we can check in on it later today and give it the adios if it hasn't completed -- queries that run perpetually can have a very bad impact on overall system performance.
Another item of note is that we have 156 nldas2 gages, 182 PRISM gages, and 184 daymet gages. I'm not sure why the numbers are different, but I will say for at least one of the datasets I somehow managed to download storm_vol results for at least one gage.
@mwdunlap2004 At least some of the missing NLDAS gages are a result of the issues we've been having with st_clip. I looked into a few, but others may be failing for other reasons. The PRISM and daymet numbers are akin to those run via the storm volume method (and it's possible the 2 additional daymet gages failed for PRISM due to st_clip, but that seems a bit more unlikely). We will want to take a close look at some of those that failed, i.e. run the workflow step by step and identify the process that is failing.
The storm_vol results you downloaded from PRISM were generated months ago and must have been an early test. I have deleted them from /media/model/met/PRISM/out/ to prevent that issue in the future.
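A quick way to see which gages account for the differing counts is a set difference per data source. The sketch below is only an illustration: the function name and the gage IDs other than usgs_ws_02071000 are hypothetical, and in practice the sets would come from wherever the per-source results are listed.

```python
def missing_by_source(gages_by_source: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each data source, return the gages present in some other source but absent here."""
    # Union of every gage seen in any source.
    all_gages = set().union(*gages_by_source.values())
    return {source: all_gages - gages for source, gages in gages_by_source.items()}


# Hypothetical example data (not the real gage lists).
example = {
    "nldas2": {"usgs_ws_02071000"},
    "PRISM": {"usgs_ws_02071000", "gage_A"},
    "daymet": {"usgs_ws_02071000", "gage_A", "gage_B"},
}
missing = missing_by_source(example)
for source, gages in missing.items():
    print(source, sorted(gages))
```

The gages listed for a source are the candidates to rerun step by step for that source.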
Here's some early images to help visualize the results @nathanielf22 @ilonah22. First, all data (this one is a little hard to read):
Now just simple lm:
Now just storm volume:
There are 901 instances in which the two methods predict different "best" data sources for a given gage and month. The rating differences between these "best" data sets range from -100% to 50% (with a negative value indicating a much better performance by the storm volume method). Code below, but you'll need to tweak the file paths in lines 3 and 6, because my code assumes you have a folder for each data source with the names in line 3, all of which are in the ratings/ directory in line 6:
Thanks for the review on these results @COBrogan !!
I would add that "best" in my mind does not refer to the analysis method, but rather to the data source. To that end, I think it is quite interesting that there are a handful of months where the simple lm shows large differences between NLDAS2 and PRISM/daymet (April, May, November, December). This is intriguing since the goal is not to get the best R^2 but, in a sense, to find where the data sources differentiate the most. Of course, it may very well be that the difference in those datasets shown by lm is a spurious artifact of the method itself (or some data error!), but only time will tell -- I think we should add some of the USGS gages with big differences there into this deep dive list and begin testing them by running models as well as plotting out what exactly is happening here. (Maybe we can start thinking about monthly flow error calcs between model and observed to bolster this?)
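For anyone who wants to reproduce the "best data source per gage and month" comparison on a small scale, here is a minimal sketch. It is not the script used for the plots above. It assumes a higher rating is better and that the difference is taken as simple lm minus storm volume, so a negative value favors storm volume. All names and numbers are hypothetical.

```python
# Ratings are keyed by (gage, month, data source); values are ratings where higher is
# assumed to be better (an assumption for this sketch).
Ratings = dict[tuple[str, int, str], float]


def best_sources(ratings: Ratings) -> dict[tuple[str, int], tuple[str, float]]:
    """Return the top-rated (source, rating) for each (gage, month)."""
    best: dict[tuple[str, int], tuple[str, float]] = {}
    for (gage, month, source), rating in ratings.items():
        key = (gage, month)
        if key not in best or rating > best[key][1]:
            best[key] = (source, rating)
    return best


def disagreements(lm: Ratings, storm: Ratings) -> list[tuple[str, int, str, str, float]]:
    """List (gage, month, lm source, storm source, lm - storm) where the best sources differ."""
    best_lm, best_sv = best_sources(lm), best_sources(storm)
    rows = []
    for key in sorted(best_lm.keys() & best_sv.keys()):
        (src_lm, r_lm), (src_sv, r_sv) = best_lm[key], best_sv[key]
        if src_lm != src_sv:
            rows.append((key[0], key[1], src_lm, src_sv, r_lm - r_sv))
    return rows


# Hypothetical ratings for one gage and two months.
lm_ratings = {
    ("gage_A", 4, "nldas2"): 0.25, ("gage_A", 4, "PRISM"): 0.5, ("gage_A", 4, "daymet"): 0.25,
    ("gage_A", 5, "PRISM"): 0.5, ("gage_A", 5, "daymet"): 0.25,
}
storm_ratings = {
    ("gage_A", 4, "nldas2"): 0.25, ("gage_A", 4, "PRISM"): 0.5, ("gage_A", 4, "daymet"): 0.75,
    ("gage_A", 5, "PRISM"): 0.75, ("gage_A", 5, "daymet"): 0.5,
}
rows = disagreements(lm_ratings, storm_ratings)
print(rows)
```

Only month 4 is reported here, because in month 5 both methods pick the same source even though the ratings differ.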
To better enable us to answer these questions, we need to get these into the REST database so that they are accessible with om_vahydro_metric_grid to facilitate always-up-to-date comparisons. I'll take the lead on getting everyone up to speed and showing some template scripts in the meta_model workflow that I have been using for the WDM creation path.
@rburghol @COBrogan I created plots for the 8 gages that ran for nldas2 last night, out of the 10 that Ilona had previously selected, and this matches my results: nldas2 seems to be consistently worse than daymet and PRISM. Daymet and PRISM also seem to be very similar to each other; I didn't notice many differences in their results, and in many cases their lines were eerily similar. I'm going to run some summary statistics on these gages after I talk to Dr. Scott today about how to handle nldas2 missing 2 gages (in regards to my presentation). In the long term, I think improving the clipping methods could help with this?
@mwdunlap2004 -- the clipping methods will certainly fix this. I am 99.9% certain these are just the result of the overlap algorithm that we are using, and the fix is either to resample or to use the polygon intersection method -- both of which work, but performance is an issue. So, it is totally cool to have cases where your data is not working out for reasons of resolution - a very good point to make to the audience imo (@COBrogan may have other thoughts).
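One way to picture the resolution problem: a coarse raster cell can overlap a small watershed without its center falling inside it. The sketch below compares a center-point test with an area-intersection test, using axis-aligned rectangles as stand-ins for the watershed and the grid cells. Whether the overlap algorithm in our workflow actually behaves like the center-point test is an assumption here, and every name and coordinate is hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def center_inside(cell: Rect, shape: Rect) -> bool:
    """True if the center of the cell falls inside the shape."""
    cx = (cell.xmin + cell.xmax) / 2
    cy = (cell.ymin + cell.ymax) / 2
    return shape.xmin <= cx <= shape.xmax and shape.ymin <= cy <= shape.ymax


def overlap_area(cell: Rect, shape: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0.0 if disjoint)."""
    w = min(cell.xmax, shape.xmax) - max(cell.xmin, shape.xmin)
    h = min(cell.ymax, shape.ymax) - max(cell.ymin, shape.ymin)
    return max(w, 0.0) * max(h, 0.0)


def grid(nx: int, ny: int, size: float) -> list[Rect]:
    """Build an nx-by-ny grid of square cells starting at the origin."""
    return [Rect(i * size, j * size, (i + 1) * size, (j + 1) * size)
            for j in range(ny) for i in range(nx)]


# A small "watershed" that straddles the shared corner of four coarse cells.
watershed = Rect(0.6, 0.6, 1.4, 1.4)

coarse = grid(2, 2, 1.0)
coarse_by_center = [c for c in coarse if center_inside(c, watershed)]
coarse_by_area = [c for c in coarse if overlap_area(c, watershed) > 0]

# Resampling to half the cell size puts some cell centers inside the shape.
fine = grid(4, 4, 0.5)
fine_by_center = [c for c in fine if center_inside(c, watershed)]

print(len(coarse_by_center), len(coarse_by_area), len(fine_by_center))
```

In this toy case the center-point test finds no coarse cells at all, while both the intersection test and the resampled grid find four, which is the same trade-off between the two fixes at a small scale.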
Segments that need in-depth QA due to large errors, missing data, or peculiar interest. @mwdunlap2004 @COBrogan @ilonah22
usgs_ws_02071000 Dan River at Wentworth NC
calc_raster_ts with Calling: /opt/model/model_meteorology/sh/calc_raster_ts usgs_ws_02071000 nldas2_obs_hourly /tmp/usgs_ws_02071000_1725581950_277/usgs_ws_02071000-nldas2-all.csv.sql /tmp/usgs_ws_02071000-nldas2-all.csv dbase2 drupal.dh03
select st_isvaliddetail(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
st_isvaliddetail ---> (t,,)
select st_area2d(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
0.2726207204175703
select st_area2d(dh_geofield_geom) from field_data_dh_geofield where entity_id = 290049;
0.7199793220968411
select st_numgeometries(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
st_numgeometries --> 1
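Since the same checks will probably be repeated for other suspect segments, a small helper can generate them for any entity_id. This is a sketch only: the table and column names are taken from the queries above, the `%s` placeholder assumes a psycopg-style driver, and the st_npoints (vertex count) check is an addition that was not run in this thread. The guess behind it is that a valid, single-part polygon could still be slow if it is very dense.

```python
def geometry_diagnostics(entity_id: int) -> list[tuple[str, tuple[int]]]:
    """Build (sql, params) pairs for basic PostGIS checks on one feature geometry."""
    table = "field_data_dh_geofield"  # table name as used in the queries above
    geom = "dh_geofield_geom"         # geometry column as used in the queries above
    checks = [
        "st_isvaliddetail",  # validity, with reason and location if invalid
        "st_area2d",         # area in the units of the geometry's coordinate system
        "st_numgeometries",  # number of parts in the geometry
        "st_npoints",        # vertex count (extra check, not run in this thread)
    ]
    return [
        (f"select {fn}({geom}) from {table} where entity_id = %s;", (entity_id,))
        for fn in checks
    ]


queries = geometry_diagnostics(437550)
for sql, params in queries:
    print(sql, params)
```

Each pair can be passed to a database cursor's execute call, or the SQL can be pasted into psql with the id substituted by hand.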
L51079
nldas2
./nldas_land_cells Land_segment
#_of_pairs Cell1_X Cell1_Y Cell2_x Cell2_Y ...
./nldas_land_cells L51079
6 372 106 373 106 374 106 372 107 373 107 374 107
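To use that output in a script, the line can be split into (x, y) cell pairs. The sketch below assumes the layout shown in the usage line above (a pair count followed by alternating X and Y values); the function name is hypothetical.

```python
def parse_land_cells(line: str) -> list[tuple[int, int]]:
    """Parse '#_of_pairs X1 Y1 X2 Y2 ...' into a list of (x, y) cell indices."""
    fields = [int(tok) for tok in line.split()]
    if not fields:
        raise ValueError("empty nldas_land_cells output")
    n_pairs, coords = fields[0], fields[1:]
    # The leading count should match the number of X/Y pairs that follow.
    if len(coords) != 2 * n_pairs:
        raise ValueError(f"expected {n_pairs} pairs, got {len(coords)} values")
    return list(zip(coords[0::2], coords[1::2]))


cells = parse_land_cells("6 372 106 373 106 374 106 372 107 373 107 374 107")
print(cells)
```

For L51079 this gives a 3 by 2 block of cells: X from 372 to 374 and Y from 106 to 107.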