HARPgroup / model_meteorology

0 stars 0 forks source link

Workflow 1: Simple `lm()` analysis: #57

Open rburghol opened 3 weeks ago

rburghol commented 3 weeks ago

Overview

Work flow steps

Prep Variables

MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

Quick Run

Run steps

Generate Weekly Ratings

Import Week Ratings

Create Temp Table

copy week_ratings from '/tmp/met_weekly_ratings_01665500.csv' with HEADER CSV;

- Add the varid column

alter table week_ratings add column best_varid integer; alter table week_ratings add column best_varkey varchar(64);

update week_ratings set best_varkey = CASE WHEN best_method = 'prism' THEN 'prism_mod_daily' WHEN best_method = 'daymet' THEN 'daymet_mod_daily' ELSE 'do_not_overwrite_nldas2_obs_hourly' END;


#### Create a base raster for the model domain
- NLDAS2 resampled to daymet resolution (make smallest possible)
- copy and type and add a band with 0.0 value at each cell

--use an existing raster as template for new raster -- adds a single band, with value 0.0 in each pixel using the band type as defined by the daymet raster dataset create table raster_templates(rid serial, rast raster); INSERT INTO raster_templates(rast) SELECT ST_AddBand(ST_MakeEmptyRaster(rast), 1, ST_BandPixelType(rast), 0.0) FROM dh_timeseries_weather WHERE featureid in ( select hydroid from dh_feature where hydrocode = 'cbp6_met_coverage' ) and varid in (select hydroid from dh_variabledefinition where varkey like 'daymet_mod_daily') and rast is not null limit 1;

- Verify Extent of base raster

select st_astext(st_envelope(rast)) from raster_templates; -- POLYGON((-83.66607521306385 44.095589132351506,-73.52855441054511 44.095589132351506,-73.52855441054511 33.95806832983277,-83.66607521306385 33.95806832983277,-83.66607521306385 44.095589132351506))


#### Create a raster clipped to the coverage.
- right now they are just daily
- later we may have a daily/hourly raster for each data source pre-generated and interpolated
- todo: see if we can use `st_union()` to combine each catchment best-fit raster to overlay the base raster, instead of `st_mapalgebra()`, since the default behavior of `st_union` is to simply replace the cells from the first raster with those in the 2nd raster -- this will work, but only if our subsequent rasters are clipped to their watershe boundaries, which *I think* is currently the case (see the image of the 01665500 raster below).
- this code uses `st_resample()` to insure matching raster cells, but this could be done in the import workflow, by simply running an SQL update query on the resulting data, and annotating this in the varkey info
   - Omitting this step will throw an error if the two rasters are not exactly same precision `ERROR:  rt_raster_iterator: The set of rasters provided (custom extent included, if appropriate) do not have the same alignment`

create temp table raster_best(tid serial, featureid integer, tstime bigint, tsendtime bigint, varid integer, rast raster); insert into raster_best(featureid, tstime, tsendtime, varid, rast) select f.hydroid, w.tstime, w.tsendtime, w.varid, st_mapalgebra(tpr.rast, 1, ST_Resample(st_clip(w.rast, fgeo.dh_geofield_geom), tpr.rast), 1, '[rast] + [rast2]') as rast from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join week_ratings as r on ( yr = 1984 ) left outer join dh_variabledefinition as v on ( v.varkey = r.best_varkey ) left outer join dh_timeseries_weather as w on ( extract( year from to_timestamp(w.tsendtime) ) = r.yr and extract( month from to_timestamp(w.tsendtime) ) = r.mo and extract( week from to_timestamp(w.tsendtime) ) = r.wk and v.hydroid = w.varid ) left outer join raster_templates as tpr on (1=1) where f.hydrocode = 'usgs_ws_01665500' and w.tid is not null;


#### Export Sample Records as geotiff or ping
##### Using `gdal_translate` from cmd line
- Must have the rasters in a non-temp table (not a worry, as we will add these to the dh_timeseries_weather table eventually)
   - `create temp table raster_best_out as select * from raster_best;`
- Now export

PGHOST=dbase2 PGPORT=5432 PGUSER=rob gdal_translate -of GTiff "PG:dbname=drupal.dh03 table=raster_best_out where='tid = 1 ' " outfile.tiff


##### Using postgis internal native stuff
- This is an interesting piece of code from [SO](https://gis.stackexchange.com/questions/78086/how-to-export-raster-from-postgis-enabled-db-as-gtiff-with-gdal-translate)

SET postgis.gdal_enabled_drivers = 'ENABLE_ALL'; select tid, lowrite(lo_open(tid, 131072), tiff) As num_bytes FROM ( VALUES (lo_create(0), ST_Astiff( (SELECT rast FROM raster_best where tid = 1)) ) ) As v(tid,tiff);

- Which returns:

tid | num_bytes ---------+----------- 9155256 | 4002 (1 row)

- which can then be used to save to file (note this file gets saved in the /tmp directory on deq2, even though the database is dbase2):

\lo_export 9155256 '/tmp/myraster.tiff' SELECT lo_unlink(9155256 );


- copy with `cp /tmp/myraster.tiff www/files/met/test_01665500.tiff`
- view at: http://deq1.bse.vt.edu:81/files/met/test_01665500.tiff

**Image 1:** Best fit raster for 01665500, January 1 1984.
![image](https://github.com/HARPgroup/model_meteorology/assets/4571170/76c5b04c-cb67-4147-add1-e8348a4fc600)
rburghol commented 2 weeks ago

TRy new method without raster calc which is unneccessary?

create temp table raster_best(tid serial, featureid integer, tstime bigint, tsendtime bigint, varid integer, rast raster);
insert into raster_best(featureid, tstime, tsendtime, varid, rast)
select f.hydroid, w.tstime, w.tsendtime, w.varid, 
  ST_Resample(st_clip(w.rast, fgeo.dh_geofield_geom), tpr.rast) as rast
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join week_ratings as r
on (
 yr = 1984
)
left outer join dh_variabledefinition as v
on (
  v.varkey = r.best_varkey
)
left outer join dh_timeseries_weather as w
on (
   extract( year from to_timestamp(w.tsendtime) ) = r.yr
   and extract( month from to_timestamp(w.tsendtime) ) = r.mo
   and extract( week from to_timestamp(w.tsendtime) ) = r.wk 
  and v.hydroid = w.varid
)
left outer join raster_templates as tpr
on (1=1)
where f.hydrocode = 'usgs_ws_01665500'
and w.tid is not null;
rburghol commented 2 weeks ago

@ilonah22 @mwdunlap2004

COBrogan commented 1 week ago

Trying to track down the issue with NLDAS not generating precip in the command:

/opt/model/meta_model/run_model raster_met PRISM usgs_ws_01660400 auto geo download

The error keeps saying that all pixels in the area have the NoData value. NOTICE: All pixels of band have the NODATA value. But, this does not appear to be the case. If we look at a specific point within the watershed on a day in which we know there is rain in Aquia Creek for usgs_ws_01660400, we can see that NLDAS should have data for this watershed based on the full NLDAS rasters for that day. However, when we clip the raster to just the gage, the clipped raster no longer has data:

SQL ``` select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, ST_Value(met.rast, ST_SetSRID(ST_Point(-77.5, 38.5),4326)) as fullRastPoint, ST_Value(st_clip(met.rast, 1, fgeo.dh_geofield_geom,9999), ST_SetSRID(ST_Point(-77.5, 38.5),4326)) as clipRastPoint from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) order by met.tstime; ```
featureid obs_date mo fullrastpoint cliprastpoint
617850 2023-01-16 23:00:00-05 1 0
617850 2023-01-17 00:00:00-05 1 0
617850 2023-01-17 01:00:00-05 1 0
617850 2023-01-17 02:00:00-05 1 0
617850 2023-01-17 03:00:00-05 1 0
617850 2023-01-17 04:00:00-05 1 0.0222
617850 2023-01-17 05:00:00-05 1 0.0428
617850 2023-01-17 06:00:00-05 1 0.1352

This is NOT the case for Strausberg. At hydrocode usgs_ws_01634000, we can see that a specific point in the watershed shows equal rainfall in both the full and clipped rasters. This makes me think something has gone wrong with our ST_clip command, but I haven't tracked it down yet

SQL ``` select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, ST_Value(met.rast, ST_SetSRID(ST_Point(-78.7, 38.8),4326)) as fullRastPoint, ST_Value(st_clip(met.rast, 1, fgeo.dh_geofield_geom,9999), ST_SetSRID(ST_Point(-78.7, 38.8),4326)) as clipRastPoint from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01634000' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) order by met.tstime; ```
featureid obs_date mo fullrastpoint cliprastpoint
617850 2023-01-16 23:00:00-05 1 0 0
617850 2023-01-17 00:00:00-05 1 0 0
617850 2023-01-17 01:00:00-05 1 0.0002 0.0002
617850 2023-01-17 02:00:00-05 1 0.2112 0.2112
617850 2023-01-17 03:00:00-05 1 0.3976 0.3976
617850 2023-01-17 04:00:00-05 1 0.4244 0.4244
617850 2023-01-17 05:00:00-05 1 0.212 0.212
617850 2023-01-17 06:00:00-05 1 0.2228 0.2228

It's clear that our clip at the very least generates a raster of the shape that we'd generally expect:

POLYGON((-77.6255 38.6255,-77.3755 38.6255,-77.3755 38.3755,-77.6255 38.3755,-77.6255 38.6255))
SQL ``` select ST_AsText(ST_Envelope(met.rast)), ST_AsText(ST_Envelope(st_clip(met.rast, 1, fgeo.dh_geofield_geom,9999,TRUE))) from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) order by met.tstime LIMIT 1; ```

With some more research, my guess is that ST_clip doesn't grab partial cells from the raster - that is, it only grabs cells that are contained in the file or at least that the centroid is in. See here. My gage is in between several cells. The below image shows only the extent of the clipped raster from above on January 17th and shows that our site doesn't contain any full pixels: image

If I use ST_intersection I start to see values at our site. I imagine I would get similar results with ST_DumpAsPolygons but I'm guessing that will be computationally slow. ST_intersection is already pretty slow based on my testing:

SQL ``` select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, (ST_Intersection(met.rast, fgeo.dh_geofield_geom)).val AS clipVal FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) LEFT JOIN dh_variabledefinition as v ON ( v.varkey = 'nldas2_obs_hourly' ) LEFT JOIN dh_feature as mcov ON ( mcov.hydrocode = 'cbp6_met_coverage' ) LEFT JOIN dh_timeseries_weather as met ON ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) WHERE f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) ORDER BY met.tstime; ```
featureid obs_date mo clipval
617850 2023-01-16 23:00:00-05 1 0
617850 2023-01-17 00:00:00-05 1 0
617850 2023-01-17 01:00:00-05 1 0
617850 2023-01-17 02:00:00-05 1 0
617850 2023-01-17 03:00:00-05 1 0
617850 2023-01-17 04:00:00-05 1 0.1597999930381775
617850 2023-01-17 04:00:00-05 1 0.04280000180006027
617850 2023-01-17 04:00:00-05 1 0.07980000227689743
617850 2023-01-17 04:00:00-05 1 0.022199999541044235
617850 2023-01-17 05:00:00-05 1 0.20419999957084656
617850 2023-01-17 05:00:00-05 1 0.06920000165700912
617850 2023-01-17 05:00:00-05 1 0.09019999951124191
617850 2023-01-17 05:00:00-05 1 0.04280000180006027
617850 2023-01-17 06:00:00-05 1 0.42640000581741333
rburghol commented 1 week ago

@Cobrogan the docs on st_clip, to my read (I could be wrong) suggest that intersection is implicit in st_clip: https://postgis.net/docs/RT_ST_Clip.html

image

Either way, we can control that behavior in how we call st_clip, though still, the st_clip vs st_intersects could be an important distinction, but I am troubled by the image you showed.

Since the Culpepper gage contains the Ruckersville gage, I would expect there to be many more cells overlapping than what you're showing -- I might just be getting confused by the image blocks tho.

That said, at the every least I would expect the Culpepper gage to be less likely than the Ruckersville to have no data problems. I suspect maybe I goofed something up in the query during some recent tidy sessions.

Which makes me wonder: what do the daymet and prism containments look like?

COBrogan commented 1 week ago

@rburghol I'll draft up some figures showing daymet and prism containments. I've tried messing around with ST_clip, but setting crop to TRUE or FALSE yield the same results. My understanding is it just affects the extent of the raster that is returned, with "NoData" value being used to populate cells outside of the intersection. ST_clip is the more efficient version of ST_intersection, but I still haven't found how to get the bordering cells to be included.

Interestingly, resampling the raster to the PRISM resolution also yielded no results. So, these figures showing the containments should help identify if something is going wrong in ST_clip or if we are facing a resolution problem:

ST_Resample SQL Details ``` WITH prism AS ( select met.rast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'prism_mod_daily') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) LIMIT 1 ) select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, ST_Value(met.rast, ST_SetSRID(ST_Point(-77.6, 38.55),4326)) as fullRastPoint, ST_Value(st_clip(ST_Resample(met.rast,(SELECT rast FROM prism)), 1, fgeo.dh_geofield_geom,9999,TRUE), ST_SetSRID(ST_Point(-77.6, 38.55),4326)) as clipRastPoint FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'nldas2_obs_hourly') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) WHERE f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) ORDER BY met.tstime; ```
rburghol commented 1 week ago

OK - resampling is a good idea, but I am still unclear why this is happening at Culpepper but not Ruckersville. Wonder if something is wrong with the geometry? Like self-intersection or something? The intersection algorithm is very inclusive, it literally just says there is some overlap.

Oh, one question: does "fullrasterpoint" actually just mean a single raster cell value intersecting the point you specified?

Plus @COBrogan I really think it is impossible that Culpepper contains no full pixels -- it's a drainage area of 468sqmi, it has to contain some 15x15km cells... maybe our projection is off, below is USGS map: image

COBrogan commented 1 week ago

Well that's one mystery solved. I've been looking at 01660400, which is Aquia Creek Near Garrisonville. And has a DA of only 35 sq mi. See here for its NWIS page. So it is NOT the Culpepper gage that is giving us trouble, but something much smaller. My bad on the Wild Goose chase there.

As for the potential conflicts in ST_clip for smaller watersheds, see below. Dashed blue is daymet, red is PRISM, and solid black is NLDAS. We can see that there are no NLDAS cells fully contained in the watershed so ST_clip() will not return values for NLDAS. test2

rburghol commented 1 week ago

OK cool, thanks for sorting that out!! @COBrogan

The thing is though, fully contained is not the way it is supposed to be happening. Intersect literally just means they're boundaries cross.

But, re-sampling to the smallest grid we have should do the trick.

Still troubling, however, is that @ilonah22 reported getting no data values for Fredericksburg…

COBrogan commented 1 week ago

I agree with your assertion, @rburghol . Maybe something else IS going on. First, while I've seen other users on SO with the same problem, I've yet to see official documentation of ST_clip() that implies this is the intended behavior. Second, the resampling failed to bring results. And, switching to daymet below, my query is returns logical results which indicates that my testing method works:

SQL Code for Daymet single point extraction before/after clip ``` select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, ST_Value(met.rast, ST_SetSRID(ST_Point(-77.5 38.5),4326)) as fullRastPoint, ST_Value(st_clip(met.rast, 1, fgeo.dh_geofield_geom,9999,FALSE), ST_SetSRID(ST_Point(-77.5 38.5),4326)) as clipRastPoint FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) LEFT JOIN dh_variabledefinition as v ON ( v.varkey = 'daymet_mod_daily' ) LEFT JOIN dh_feature as mcov ON ( mcov.hydrocode = 'cbp6_met_coverage' ) LEFT JOIN dh_timeseries_weather as met ON ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) WHERE f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) ORDER BY met.tstime; ```
featureid obs_date mo fullrastpoint cliprastpoint
617850 2023-01-16 07:00:00-05 1 1.0297999382019043 1.0297999382019043
COBrogan commented 1 week ago

@rburghol ......I was using an incorrect set of coordinates. Resampling the NLDAS raster to PRISM results in me getting the same results at a given point. WIthout resampling, I get no data.

Details ``` WITH refRast AS ( select met.rast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'prism_mod_daily') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) LIMIT 1 ) select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, ST_Value(met.rast, ST_SetSRID(ST_Point(-77.5, 38.5),4326)) as fullRastPoint, ST_Value(st_clip(ST_Resample(met.rast,(SELECT rast FROM refRast)), 1, fgeo.dh_geofield_geom,9999,TRUE), ST_SetSRID(ST_Point(-77.5, 38.5),4326)) as resampleClipRast, ST_Value(st_clip(met.rast, 1, fgeo.dh_geofield_geom,9999,TRUE), ST_SetSRID(ST_Point(-77.5, 38.5),4326)) as noResampleClipRast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'nldas2_obs_hourly') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) WHERE f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (17) ORDER BY met.tstime; ```
featureid obs_date mo fullrastpoint resamplecliprast noresamplecliprast
617850 2023-01-16 23:00:00-05 1 0 0
617850 2023-01-17 00:00:00-05 1 0 0
617850 2023-01-17 01:00:00-05 1 0 0
617850 2023-01-17 02:00:00-05 1 0 0
617850 2023-01-17 03:00:00-05 1 0 0
617850 2023-01-17 04:00:00-05 1 0.0222 0.0222
617850 2023-01-17 05:00:00-05 1 0.0428 0.0428
617850 2023-01-17 06:00:00-05 1 0.1352 0.1352

Taking this a step further, we can successfully get mean precipitation from the resampled raster:

Details ``` WITH refRast AS ( select met.rast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'prism_mod_daily') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) LIMIT 1 ) select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, (ST_summarystats(st_clip(ST_Resample(met.rast,(SELECT rast FROM refRast)), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_kgm3 from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01660400' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (16,17,19,20,21) order by met.tstime; ```
featureid obs_date mo precip_kgm3
617850 2023-01-17 00:00:00-05 1 0
617850 2023-01-17 01:00:00-05 1 0
617850 2023-01-17 02:00:00-05 1 0
617850 2023-01-17 03:00:00-05 1 0
617850 2023-01-17 04:00:00-05 1 0.07275999999999999
617850 2023-01-17 05:00:00-05 1 0.09404
617850 2023-01-17 06:00:00-05 1 0.22464
617850 2023-01-17 07:00:00-05 1 0.4828799999999999
617850 2023-01-17 08:00:00-05 1 0.6930399999999999

Looking at a second watershed that has previously given us trouble brings us to usgs_ws_01669000 PISCATAWAY CREEK NEAR TAPPAHANNOCK, VA: test2

Once again, our typical NLDAS summary method using ST_clip returns no data. But, it does as soon as we resample it to the PRISM resolution:

Gage 01669000 Coverage NLDAS Precip Summary ``` WITH refRast AS ( select met.rast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'prism_mod_daily') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) LIMIT 1 ) select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, (ST_summarystats(st_clip(ST_Resample(met.rast,(SELECT rast FROM refRast)), fgeo.dh_geofield_geom), 1, TRUE)).mean as resampleMeanPrecip, (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 1, TRUE)).mean as noResampleMeanPrecip from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01669000' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (16,17,19,20,21) order by met.tstime; ```
featureid obs_date mo resamplemeanprecip noresamplemeanprecip
617850 2023-01-17 05:00:00-05 1 0
617850 2023-01-17 06:00:00-05 1 0.012
617850 2023-01-17 07:00:00-05 1 0
617850 2023-01-17 08:00:00-05 1 0.1956
617850 2023-01-17 09:00:00-05 1 0.03
617850 2023-01-17 10:00:00-05 1 0.884
617850 2023-01-17 11:00:00-05 1 0.559
617850 2023-01-17 12:00:00-05 1 0.0688
617850 2023-01-17 13:00:00-05 1 0.1256
rburghol commented 1 week ago

Awesome @COBrogan -- that looks really encouraging!! Also we have to resample, anyway, so this is doubly great.

I am curious, though, could you run your NLDS to query again and include a SummaryStatsAgg(st_clip(...).count function? I want to see if it is returning 0 points, meaning the intersection doesn't work as I think it should, or if it is the combination of null and non-null values in an intersection that is resolving as null.

COBrogan commented 1 week ago

Sure, rerunning the above query for Piscataway Creek using count, I get the same warning that NOTICE: All pixels of band have the NODATA value. Looking at the results below, we see the count for NLDAS prior to resampling is 0. Looking at the above figure, the 4 count for PRISM also makes it seem like ST_Clip is only getting cells that have a center point that intersects with the boundary.

SQL for Count ``` WITH refRast AS ( select met.rast FROM dh_feature as f LEFT JOIN field_data_dh_geofield as fgeo ON (fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature') LEFT JOIN dh_variabledefinition as v ON (v.varkey = 'prism_mod_daily') LEFT JOIN dh_feature as mcov ON (mcov.hydrocode = 'cbp6_met_coverage') LEFT JOIN dh_timeseries_weather as met ON (mcov.hydroid = met.featureid and met.varid = v.hydroid) LIMIT 1 ) select met.featureid, to_timestamp(met.tstime) as obs_date, extract(month from to_timestamp(met.tstime)) as mo, (ST_summarystats(st_clip(ST_Resample(met.rast,(SELECT rast FROM refRast)), fgeo.dh_geofield_geom), 1, TRUE)).count as resampleMeanPrecip, (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 1, TRUE)).count as noResampleMeanPrecip from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = 'nldas2_obs_hourly' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = 'usgs_ws_01669000' AND extract(year from to_timestamp(met.tsendtime)) = 2023 AND extract(month from to_timestamp(met.tsendtime)) = 1 AND extract(day from to_timestamp(met.tsendtime)) IN (16,17,19,20,21) order by met.tstime; ```
featureid obs_date mo resamplemeanprecip noresamplemeanprecip
617850 2023-01-15 23:00:00-05 1 4 0
617850 2023-01-16 00:00:00-05 1 4 0
617850 2023-01-16 01:00:00-05 1 4 0
617850 2023-01-16 02:00:00-05 1 4 0

One potential fix could be convert the rasters into polygons that represent the cell size and resolution. We would only need one copy of this polygon file for each data source. We then run all intersections/clips on the polygons and then use the extent of the polygon intersection as a mask with which to clip our rasters. I'm going to see if this gives us more logical results.

ilonah22 commented 2 days ago

I ran into an error message this morning while trying to run access-file.R and was hoping somebody could tell me what's going wrong.

I get this error message when I try to run the line that pulls the data in from VA Hydro:

  cannot open the connection to 'http://deq1.bse.vt.edu:81/files/met/usgs_ws_03176500-prism-all.csv'
In addition: Warning message:
In file(file, "rt") :
  cannot open URL 'http://deq1.bse.vt.edu:81/files/met/usgs_ws_03176500-prism-all.csv': HTTP status was '404 Not Found'

I also tried daymet but got the same issue.

rburghol commented 2 days ago

Hey @ilonah22 sorry to delay response. Last week we shifted the output files to a different directory, so we need to change the URL references to the following:

See here for more details: https://github.com/HARPgroup/HARParchive/issues/1291