HARPgroup / model_meteorology

0 stars 0 forks source link

Raster Summary: Test st_intersection vs st_clip vs map_algebra #84

Open rburghol opened 2 months ago

rburghol commented 2 months ago

Need to test methods for coverage raster summary:

Map Algebra optimization

@COBrogan I believe that I have come up with a potential strategy to exploit the speed of st_clip with the desired coverage of st_intersection.

Create a raster "mask"

Create a raster with 1.0 value in each cell that overlaps the coverage

Use st_mapalgebra to multiply the mask raster by the met data raster to extract points of interest

select f.hydroid, w.tstime, w.tsendtime, w.varid, 
  st_mapalgebra(
      w.rast, 1, 
      mask.rast), 1, 
      '[rast1] * [rast2]'
    ) as rast
from dh_feature as f
...

Using code from calc_raster_ts as baseline

Standard st_clip method

\set band '1' 
\set varkey 'nldas2_obs_hourly' 
\set hydrocode  'usgs_ws_01668000' 
\set fname '/tmp/rapp_fred_st_clip.csv' 
\timing on

copy ( select met.featureid, to_timestamp(met.tsendtime) as obs_date, 
    extract(year from to_timestamp(met.tsendtime)) as yr,
    extract(month from to_timestamp(met.tsendtime)) as mo,
    extract(day from to_timestamp(met.tsendtime)) as da,
    (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).mean as precip_mm,
    0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).mean as precip_in
  from dh_feature as f
  left outer join field_data_dh_geofield as fgeo
  on (
    fgeo.entity_id = f.hydroid
    and fgeo.entity_type = 'dh_feature'
  )
  left outer join dh_variabledefinition as v
  on (
    v.varkey = :'varkey'
  )
  left outer join dh_feature as mcov
  on (
    mcov.hydrocode = 'cbp6_met_coverage'
  )
  left outer join dh_timeseries_weather as met
  on (
    mcov.hydroid = met.featureid and met.varid = v.hydroid
  )
  where f.hydrocode = :'hydrocode'
  order by met.tsendtime
) to :'fname' WITH HEADER CSV;

Explore alternative calc_raster_ts with st_intersection instead of st_clip

\set band '1' 
\set varkey 'nldas2_obs_hourly' 
\set hydrocode  'usgs_ws_01668000' 
\set fname '/tmp/rapp_fred_st_intersection.csv' 
\timing on

copy ( 
  select featureid, obs_date, yr, mo, da, hr, 
    avg(precip_mm) as precip_mm, avg(precip_in) as precip_in
  from (
    select met.featureid, to_timestamp(met.tsendtime) as obs_date, 
    extract(year from to_timestamp(met.tsendtime)) as yr,
    extract(month from to_timestamp(met.tsendtime)) as mo,
    extract(day from to_timestamp(met.tsendtime)) as da,
    extract(hour from to_timestamp(met.tsendtime)) as hr,
    (st_intersection(met.rast, fgeo.dh_geofield_geom)).val as precip_mm,
    0.0393701 * (st_intersection(met.rast, fgeo.dh_geofield_geom)).val as precip_in
  from dh_feature as f
  left outer join field_data_dh_geofield as fgeo
  on (
    fgeo.entity_id = f.hydroid
    and fgeo.entity_type = 'dh_feature'
  )
  left outer join dh_variabledefinition as v
  on (
    v.varkey = :'varkey'
  )
  left outer join dh_feature as mcov
  on (
    mcov.hydrocode = 'cbp6_met_coverage'
  )
  left outer join dh_timeseries_weather as met
  on (
    mcov.hydroid = met.featureid and met.varid = v.hydroid
  )
  where f.hydrocode = :'hydrocode'
  order by met.tsendtime
  ) as foo
  group by featureid, obs_date, yr, mo, da, hr
) to :'fname' WITH HEADER CSV;
\set varkey 'nldas2_obs_hourly'
\set hydrocode  'usgs_ws_03208800'
\set band '1'

select met.varkey, (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).count as cells
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
left outer join raster_templates as met
on (
  met.varkey = :'varkey'
)
where f.hydrocode = :'hydrocode';

      varkey       | cells
-------------------+-------
 nldas2_obs_hourly |     4

Now some real values

\set yr 2000
\set mo 3
\set da 2
\set hr 10

select met.tid, met.featureid, to_timestamp(met.tsendtime) as obs_date,
    extract(month from to_timestamp(met.tsendtime)) as mo,
    (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).mean as precip_mm,
    0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).mean as precip_in,
    (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).count as cells
  from dh_feature as f
  left outer join field_data_dh_geofield as fgeo
  on (
    fgeo.entity_id = f.hydroid
    and fgeo.entity_type = 'dh_feature'
  )
  left outer join dh_variabledefinition as v
  on (
    v.varkey = :'varkey'
  )
  left outer join dh_feature as mcov
  on (
    mcov.hydrocode = 'cbp6_met_coverage'
  )
  left outer join dh_timeseries_weather as met
  on (
    mcov.hydroid = met.featureid and met.varid = v.hydroid
  )
  where f.hydrocode = :'hydrocode' 
  and extract(year from to_timestamp(met.tstime)) = :yr
  and extract(month from to_timestamp(met.tstime)) = :mo
  and extract(day from to_timestamp(met.tstime)) = :da
  and extract(hour from to_timestamp(met.tstime)) = :hr 
  and met.varid in (select hydroid from dh_variabledefinition where varkey = :'varkey')
  order by met.tsendtime;

- get all pixels in the clipped raster

select st_astext( (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom),1, FALSE)).geom), (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom),1, FALSE)).x, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom),1, FALSE)).y, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom),1, FALSE)).val from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = :'varkey' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = :'hydrocode' and tid = 110377526 ;

- get all pixels that overlap the clipped raster 
- that also have non-null value, 
- but preserve the original raster extent so that pixel values are not reset

select st_astext( (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom),1)).geom), (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, FALSE),1)).x, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, FALSE),1)).y, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, FALSE),1)).val from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = :'varkey' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = :'hydrocode' and tid = 110377526 ;

                                        st_astext                                            | x | y  |  val

-------------------------------------------------------------------------------------------------+---+----+-------- POLYGON((-82.7505 37.1255,-82.6255 37.1255,-82.6255 37.0005,-82.7505 37.0005,-82.7505 37.1255)) | 8 | 56 | 0.0032 (1 row)


- Try the above using ST_Intersection

select st_astext( st_centroid((ST_Intersection(met.rast, fgeo.dh_geofield_geom)).geom)), (ST_Intersection(met.rast, fgeo.dh_geofield_geom)).val from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join dh_variabledefinition as v on ( v.varkey = :'varkey' ) left outer join dh_feature as mcov on ( mcov.hydrocode = 'cbp6_met_coverage' ) left outer join dh_timeseries_weather as met on ( mcov.hydroid = met.featureid and met.varid = v.hydroid ) where f.hydrocode = :'hydrocode' and tid = 110377526 ;

              st_astext                  |          val

---------------------------------------------+----------------------- POINT(-82.65823887889381 37.13428596880967) | 0.00559999980032444 POINT(-82.67664570483737 37.08666204286433) | 0.0031999999191612005 POINT(-82.6173913631824 37.096629058928045) | 0 (3 rows)


- Get the pixels that overlap a given polygon

select f.hydrocode, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, 9999, FALSE),1)).x, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, 9999, FALSE),1)).y, (ST_PixelAsPolygons(st_clip(met.rast, fgeo.dh_geofield_geom, 9999, FALSE),1)).val from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join raster_templates as met on ( met.varkey = :'varkey' ) where f.hydrocode = :'hydrocode';

create temp table rastest as SELECT 'nldas2_obs_hourly' as varkey, tid, to_timestamp(tsendtime), tsendtime, ST_AddBand(ST_MakeEmptyRaster(rast), 1, ST_BandPixelType(rast), 0.0) as rast FROM dh_timeseries_weather WHERE featureid in ( select hydroid from dh_feature where hydrocode = 'cbp6_met_coverage' ) and varid in (select hydroid from dh_variabledefinition where varkey like 'nldas2_obs_hourly') and rast is not null limit 1;


- Now test rastest

select met.varkey, st_width(st_clip(met.rast, fgeo.dh_geofield_geom)), (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), :band, TRUE)).count as cells from dh_feature as f left outer join field_data_dh_geofield as fgeo on ( fgeo.entity_id = f.hydroid and fgeo.entity_type = 'dh_feature' ) left outer join rastest as met on ( met.varkey = :'varkey' ) where f.hydrocode = :'hydrocode';



_Originally posted by @rburghol in https://github.com/HARPgroup/model_meteorology/issues/50#issuecomment-2302272655_
COBrogan commented 2 months ago

@rburghol I think this workflow makes sense to me but doesn't st_intersection only return partial cells along the boundary? It vectorizes the raster and then performs a true intersection which makes me think it will, which in turn may then prevent our mask from working along the boundary still. Do we need to then include a buffer on that mask? The rest of the workflow seems solid to me.

Doing a little research into st_intersection, I stumbled upon this article. This was very informative and makes me think maybe we should have leveraged tiles and a gist index when importing our rasters. By the logic in this article, our PRISM imports could have been tiled to 16x16 to preserve 2048 bytes per tile and maintain the db page size. This may have sped up the performance of our tiling experiments. The exercise in the article is fairly similar to what we are trying to do on a spatial scale, but on a smaller temporal scale. I think this may be worth looking into and I will try to tackle it if I can free some time up in the next few days (unless the mask solution works, in which case it may not be worth it)!

rburghol commented 2 months ago

@COBrogan I see what you're saying and it makes sense. Perhaps we can use the cell height and width queries to determine the number of cells and then manually create a bounding box?

I've actually already explored tiles in issue #70 -- there was some important performance improvement, though my recollection is vague right now, but I believe that we have like a year of NLDAS2 imported that we can use for testing purposes. If you want to tinker around with it, I think #70 should have some reasonable starting code.