HARPgroup / model_meteorology

0 stars 0 forks source link

Postgresql settings and raster performance #80

Open rburghol opened 3 months ago

rburghol commented 3 months ago

Comparison Summary

Set Up Base Table and Timer function

\set met_covid 617850
\set met_varid 1451
\timing on
\set start_year 2001
\set end_year 2005

drop table tmp_raster_export;
create temp table tmp_raster_export(tid serial, featureid integer, tstime bigint, tsendtime bigint, varid integer, precip float8);

Change Working memory

SHOW work_mem;
 work_mem
----------
 4MB

ALTER SYSTEM SET work_mem = '256MB';
-- this forces non-startup configs to take effect for new connections
SELECT pg_reload_conf();

show shared_buffers;
 shared_buffers
----------------
 128MB

ALTER SYSTEM SET shared_buffers = '8GB';
SELECT pg_reload_conf();

Test with 5-year Sample

-- Gives precip in inches
insert into tmp_raster_export(featureid, varid, tstime, tsendtime, precip)
select f.hydroid, w.varid, w.tstime, w.tsendtime, 
  0.0393701 * (ST_SummaryStats(ST_Resample(st_clip(w.rast, fgeo.dh_geofield_geom), tpr.rast), 1, TRUE)).mean as precip
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_timeseries_weather as w
on (
  w.featureid = :met_covid
  and w.varid = :met_varid 
  and extract(year from to_timestamp(w.tstime)) >= :start_year
  and extract(year from to_timestamp(w.tsendtime)) <= :end_year
)
left outer join raster_templates as tpr
on ( tpr.varkey = 'daymet_mod_daily')
where f.hydrocode = 'N51101'
  and f.ftype = 'cbp6_landseg'
  and f.bundle = 'landunit'
  and w.tid is not null
  AND w.featureid = :met_covid 
;

5-years, 1 landseg, with clip after resample

-- Gives precip in inches
insert into tmp_raster_export(featureid, varid, tstime, tsendtime, precip)
select f.hydroid, w.varid, w.tstime, w.tsendtime, 
  0.0393701 * (ST_SummaryStats(st_clip(ST_Resample(st_clip(w.rast, st_envelope(fgeo.dh_geofield_geom)), tpr.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_timeseries_weather as w
on (
  w.featureid = :met_covid
  and w.varid = :met_varid 
  and extract(year from to_timestamp(w.tstime)) >= :start_year
  and extract(year from to_timestamp(w.tsendtime)) <= :end_year
)
left outer join raster_templates as tpr
on ( tpr.varkey = 'daymet_mod_daily')
where f.hydrocode = 'N51101'
  and f.ftype = 'cbp6_landseg'
  and f.bundle = 'landunit'
  and w.tid is not null
  AND w.featureid = :met_covid 
;

Just Clip for testing

Just resample for testing


create temp table tmp_raster_resample(featureid, varid, tstime, tsendtime, rast)
select f.hydroid, w.varid, w.tstime, w.tsendtime, 
  ST_Resample(st_clip(w.rast, st_envelope(fgeo.dh_geofield_geom)), tpr.rast), fgeo.dh_geofield_geom), 1, TRUE)).mean as precip
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_timeseries_weather as w
on (
  w.featureid = :met_covid
  and w.varid = :met_varid 
  and extract(year from to_timestamp(w.tstime)) >= :start_year
  and extract(year from to_timestamp(w.tsendtime)) <= :end_year
)
left outer join raster_templates as tpr
on ( tpr.varkey = 'daymet_mod_daily')
where f.hydrocode = 'N51101'
  and f.ftype = 'cbp6_landseg'
  and f.bundle = 'landunit'
  and w.tid is not null
  AND w.featureid = :met_covid 
;

Two landsegs at the same time


-- Gives precip in inches
insert into tmp_raster_export(featureid, varid, tstime, tsendtime, precip)
select f.hydroid, w.varid, w.tstime, w.tsendtime, 
  0.0393701 * (ST_SummaryStats(st_clip(w.rast, fgeo.dh_geofield_geom), 1, TRUE)).mean as precip
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join dh_timeseries_weather as w
on (
  w.featureid = :met_covid
  and w.varid = :met_varid 
)
where f.hydrocode in ('N51033', 'N51177')
  and f.ftype = 'cbp6_landseg'
  and f.bundle = 'landunit'
  and w.tid is not null
  AND w.featureid = :met_covid 
;
rburghol commented 3 months ago

Try the following:

From https://www.enterprisedb.com/postgres-tutorials/how-tune-postgresql-memory

rburghol commented 3 months ago

FYI @COBrogan @mwdunlap2004 - The system database settings had ZERO effect (see results above), but after clipping, time to process these basic land segment summaries was cut in half.

mwdunlap2004 commented 3 months ago

It is amazing to think so many of our slow run times can be connected to us just trying to use the entire globe in all of our functions, but it's really good to see that we seem to have a solution to that issue. I wonder how many of our assumptions about run time can be reevaluated based off this, like our statement about st_resample being slow.

rburghol commented 3 months ago

@mwdunlap2004 Hundred percent agree. I expect some important improvements across the board.

rburghol commented 3 months ago

Also @mwdunlap2004 Without the breakthrough on extents with yesterdays TIFF exports, I'd not have even messed with this, and found the efficiency increase.