Open rburghol opened 2 months ago
@rburghol This behemoth of a query ran in 8 seconds to extract one year of data with clipping and resampling for the land segment. It takes about 30 seconds to run for usgs_ws_01668000
(1595 sq mi drainage area). For forty years with this land segment, the below query ran in 6 mins 17 seconds.
\set hydrocode 'H51113'
\set fname '/tmp/H51113-nldas2-all.csv'
\set band '1'
\set varkey 'nldas2_precip_hourly_tiled_16x16'
\set resample_varkey 'daymet_mod_daily'
-- sets all integer feature and varid with query
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset
WITH usgs_features AS (
SELECT *
FROM dh_feature
WHERE hydrocode = :'hydrocode'
),
metUnion as (
Select met.featureid, met.tsendtime,
st_union(met.rast) as rast
FROM usgs_features as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where extract(year from to_timestamp(met.tstime)) = 2023
and b.varkey=:'varkey'
and met.featureid = :covid
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),fgeo.dh_geofield_geom)
group by met.featureid, met.tsendtime
),
met as (
Select met.featureid, to_timestamp(met.tsendtime) as obs_date,
extract(year from to_timestamp(met.tsendtime)) as yr,
extract(month from to_timestamp(met.tsendtime)) as mo,
extract(day from to_timestamp(met.tsendtime)) as da,
extract(hour from to_timestamp(met.tsendtime)) as hr,
(ST_summarystats(st_clip(st_resample(met.rast,rt.rast), fgeo.dh_geofield_geom), :'band', TRUE)).mean as stats
FROM usgs_features as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
JOIN metUnion AS met
ON ST_Intersects(ST_ConvexHull(met.rast),fgeo.dh_geofield_geom)
left join (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
)
select featureid, obs_date, yr, mo, da, hr,
0.0393701 * stats precip_in
from met
order by met.obs_date;
All 40 years of NLDAS has been retiled into dh_timeseries_weather
with the varkey nldas2_precip_hourly_tiled_16x16
A new workflow is available to reference these, stored under the configs nldas2_tiled
and stormVol_nldas2_tiled
. The only change is the varkey. I also updated nldas2_resampled
and nldas2_resamptile
to reference the tiled dataset. I tested nldas2_tiled
and had no issues:
sbatch /opt/model/meta_model/run_model raster_met nldas2_tiled usgs_ws_01668000 auto geo
The above query uses ST_Intersects
and takes about 18 seconds to run for 2023 for landsegment H51113. To try and improve efficiency, I replaced the ST_Intersects
call with a simple &&
referencing dh_timeseries_weather.bbox
. The query ran in 13 seconds and was faster. Several SO posts imply &&
should either be equivalent to ST_Intersects
or faster as it does not create the bounding box intersection.
\set hydrocode 'H51113'
\set fname '/tmp/H51113-nldas2-all.csv'
\set band '1'
\set varkey 'nldas2_precip_hourly_tiled_16x16'
\set resample_varkey 'daymet_mod_daily'
-- sets all integer feature and varid with query
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset
WITH usgs_features AS (
SELECT *
FROM dh_feature
WHERE hydrocode = :'hydrocode'
),
metUnion as (
Select met.featureid, met.tsendtime,
st_union(met.rast) as rast
FROM usgs_features as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where extract(year from to_timestamp(met.tstime)) = 2023
and b.varkey=:'varkey'
and met.featureid = :covid
) AS met
ON fgeo.dh_geofield_geom && met.bbox
group by met.featureid, met.tsendtime
),
met as (
Select met.featureid, to_timestamp(met.tsendtime) as obs_date,
extract(year from to_timestamp(met.tsendtime)) as yr,
extract(month from to_timestamp(met.tsendtime)) as mo,
extract(day from to_timestamp(met.tsendtime)) as da,
extract(hour from to_timestamp(met.tsendtime)) as hr,
(ST_summarystats(st_clip(st_resample(met.rast,rt.rast), fgeo.dh_geofield_geom), :'band', TRUE)).mean as stats
FROM usgs_features as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
JOIN metUnion AS met
ON ST_Intersects(ST_ConvexHull(met.rast),fgeo.dh_geofield_geom)
left join (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
)
select featureid, obs_date, yr, mo, da, hr,
0.0393701 * stats precip_in
from met
order by met.obs_date;
Building off this, the full forty year period ran in 8 minutes and 22 seconds using ST_Intersection
but only 8 minutes and 4 seconds via &&
. Minor time savings, so I'll update calc_raster_ts
and our resample workflow
@COBrogan dang that 18 to 13 second drop for a single year had me pretty stoked... tho saving only 18 seconds on the full 40 years took the wind outta my sails!!
rast
column andbbox
columnST_SummaryStats()
st_union
before summaries:ERROR: invalid memory alloc request size 1216428488
st_clip
again in case we have new or updated data that is causing thisst_clip
at somest_envelope + [1 raster cell width]
in a pre-step (into a temp or perm table?) to eliminate errors, plus make things faster (even though previous efforts to use tiles failed to make anything faster...)Case Study H51113
bbox
column with&&
TookTime: 6359980.953 ms (01:45:59.981)
nldas2_precip_hourly_tiled_4by4
nldas2_precip_hourly_tiled
nldas2_obs_hourly
N51177
- has 6 overlapping nldas2 cellsSet Up
\timing ON
select met.featureid, to_timestamp(met.tsendtime) as obs_date, met.tid, met.tstime, met.tsendtime, extract(year from to_timestamp(met.tsendtime)) as yr, extract(month from to_timestamp(met.tsendtime)) as mo, extract(day from to_timestamp(met.tsendtime)) as da, extract(hour from to_timestamp(met.tsendtime)) as hr, (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).count, (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_mm, 0.0393701 * (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_in from dh_timeseries_weather as met, field_data_dh_geofield as fgeo, raster_templates as rt where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and rt.varkey = :'resample_varkey' and met.tsendtime <= 1704085199 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) order by met.tsendtime ;
select met.featureid, to_timestamp(met.tsendtime) as obs_date, met.tid, met.tstime, met.tsendtime, extract(year from to_timestamp(met.tsendtime)) as yr, extract(month from to_timestamp(met.tsendtime)) as mo, extract(day from to_timestamp(met.tsendtime)) as da, extract(hour from to_timestamp(met.tsendtime)) as hr, (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).count, (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_mm, 0.0393701 (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_in from dh_timeseries_weather as met, field_data_dh_geofield as fgeo, raster_templates as rt where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and rt.varkey = :'resample_varkey' and met.tsendtime <= 441777600 + 86400 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) order by met.tsendtime ;
select met.featureid, to_timestamp(met.tsendtime) as obs_date, met.tid, met.tstime, met.tsendtime, extract(year from to_timestamp(met.tsendtime)) as yr, extract(month from to_timestamp(met.tsendtime)) as mo, extract(day from to_timestamp(met.tsendtime)) as da, extract(hour from to_timestamp(met.tsendtime)) as hr, 0.0393701 (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_in from dh_timeseries_weather as met, field_data_dh_geofield as fgeo, raster_templates as rt where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and rt.varkey = :'resample_varkey' and met.tsendtime <= 441777600 + 86400 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) order by met.tsendtime ;
select featureid, obs_date, yr, mo, da, hr, (stats).mean as precip_mm, 0.0393701 (stats).mean as precip_in, (stats).count from ( select met.featureid, to_timestamp(met.tsendtime) as obs_date, met.tid, met.tstime, met.tsendtime, extract(year from to_timestamp(met.tsendtime)) as yr, extract(month from to_timestamp(met.tsendtime)) as mo, extract(day from to_timestamp(met.tsendtime)) as da, extract(hour from to_timestamp(met.tsendtime)) as hr, (ST_summarystats(st_clip(st_resample(met.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)) as stats from dh_timeseries_weather as met, field_data_dh_geofield as fgeo, raster_templates as rt where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and rt.varkey = :'resample_varkey' and met.tsendtime <= 441777600 + 86400 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) order by met.tsendtime ) as foo;
select featureid, obs_date, yr, mo, da, hr, (stats).mean as precip_mm, 0.0393701 (stats).mean as precip_in, (stats).count from ( select featureid, to_timestamp(bar.tsendtime) as obs_date, extract(year from to_timestamp(bar.tsendtime)) as yr, extract(month from to_timestamp(bar.tsendtime)) as mo, extract(day from to_timestamp(bar.tsendtime)) as da, extract(hour from to_timestamp(bar.tsendtime)) as hr, (ST_summarystats(st_clip(st_resample(bar.rast, rt.rast), fgeo.dh_geofield_geom), 1, TRUE)) as stats from ( select met.featureid, met.tstime, met.tsendtime, st_union(met.rast) as rast from dh_timeseries_weather as met, field_data_dh_geofield as fgeo where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and met.tsendtime <= 441777600 + 86400 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) group by met.featureid, met.tstime, met.tsendtime order by met.tsendtime ) as bar, field_data_dh_geofield as fgeo, raster_templates as rt where rt.varkey = :'resample_varkey' and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid ) as foo;
drop table rast_ts ; create temp table rast_ts as select tstime, tsendtime, rast, rnum from ( select met.featureid, met.tstime, met.tsendtime, st_union(met.rast) as rast, count(met.rast) as rnum from dh_timeseries_weather as met, field_data_dh_geofield as fgeo where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and met.tsendtime <= 441777600 + 86400 * 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) group by met.featureid, met.tstime, met.tsendtime order by met.tsendtime ) as bar;
Time: 203171.999 ms (03:23.172)
and met.varid = :met_varid and met.tstime >= 441777600 and met.tsendtime <= 441777600 + 86400 * 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and rt.varkey = :'resample_varkey' and (fgeo.dh_geofield_geom && met.bbox ) group by met.featureid, met.tstime, met.tsendtime, rt.rast order by met.tsendtime ) as bar;update rast_ts set rast = st_resample(rast, rt.rastem) from ( select rast as rastem from raster_templates where varkey = :'resample_varkey' ) as rt;
select met.tsendtime, st_summarystats( st_mapalgebra( feat.rast, 1, met.rast, 1, '[rast1] * [rast2]'), 1, TRUE ) as precip_mm from ( select ST_AsRaster(geo.dh_geofield_geom, rt.rast, st_bandpixeltype(rt.rast, 1), 1.0) as rast from field_data_dh_geofield as geo, raster_templates as rt where geo.entity_id = :fid and rt.varkey = :'resample_varkey' ) as feat, res_rast_ts as met;
select featureid, obs_date, yr, mo, da, hr, (stats).mean as precip_mm, 0.0393701 (stats).mean as precip_in, (stats).count from ( select featureid, to_timestamp(bar.tsendtime) as obs_date, extract(year from to_timestamp(bar.tsendtime)) as yr, extract(month from to_timestamp(bar.tsendtime)) as mo, extract(day from to_timestamp(bar.tsendtime)) as da, extract(hour from to_timestamp(bar.tsendtime)) as hr, (ST_summarystats(st_mapalgebra( feat.rast, 1, st_resample(bar.rast, rt.rast), 1, '[rast1] [rast2]'), 1, TRUE)) as stats from ( select met.featureid, met.tstime, met.tsendtime, st_union(met.rast) as rast from dh_timeseries_weather as met, field_data_dh_geofield as fgeo where met.featureid = :covid and met.varid = :met_varid and met.tstime >= 441777600 and met.tsendtime <= 441777600 + 86400 * 365 and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid and (fgeo.dh_geofield_geom && met.bbox ) group by met.featureid, met.tstime, met.tsendtime order by met.tsendtime ) as bar, field_data_dh_geofield as fgeo, raster_templates as rt, ( select ST_AsRaster(geo.dh_geofield_geom, rt.rast, st_bandpixeltype(rt.rast, 1), 1.0) as rast from field_data_dh_geofield as geo, raster_templates as rt where geo.entity_id = :fid and rt.varkey = :'resample_varkey' ) as feat where rt.varkey = :'resample_varkey' and fgeo.entity_type = 'dh_feature' and fgeo.entity_id = :fid ) as foo;