HARPgroup / model_meteorology


Do resample mode #87

Open rburghol opened 1 week ago

rburghol commented 1 week ago

Testing

- Try the tiled workflow for N51660 (NF Shen with NA at NLDAS2 resolution)
   - time ~ 2 hours
   - url: http://deq1.bse.vt.edu:81/met/nldas2_resamptile/precip/N51660-nldas2-all.csv
   - file: /media/model/met/nldas2_resamptile/precip/N51660-nldas2-all.csv

i=N51660; met_scenario="nldas2_resamptile"; sbatch /opt/model/meta_model/run_model raster_met $met_scenario $i auto wdm



**Image 1:** Compare daily precip from nldas2 from original CBP method (`met2date` scenario) with `nldas2_resamptile`.

![image](https://github.com/user-attachments/assets/e61b4a34-e2ef-4d9b-8451-04cf47194876)

**Image 2:** Compare daily precip from the original CBP method with the `ST_clip` method. Note: this land segment is N51101, which differs from the above, so this is not a direct comparison of the three methods. TBD: repeat for N51660.
![image](https://github.com/user-attachments/assets/d6dad9b2-3498-42f9-93ff-f76a881f5df8)
rburghol commented 1 week ago

@COBrogan this is definitely an area that needs debugging. The two alternatives I tested yesterday are below, but neither works. The tiled one gives no data/no CSV (2 GB of error messages in the log!) and the non-tiled one yields a CSV with null values in every timestep.

I could use some help with this. Maybe a good approach would be to set the date range to be very narrow in the .con file so we can debug more quickly. Who knows, maybe there's just some flaw in my query that I'm not seeing. The queries are generated in the script linked at the top of this issue, which was just based on your calc_raster_ts.
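The two failure modes above (no CSV at all vs. a CSV of all-null values) can be told apart quickly from the shell. This is just a sketch: the function name is made up, the path is the one from the top of this issue, and it assumes `precip_mm` is the 9th CSV column, matching the export query's column order.

```shell
# Sketch: classify a generated met CSV as missing, all-null, or ok.
# Assumes precip_mm is column 9, per the copy query's column order.
check_met_csv() {
  csv="$1"
  if [ ! -s "$csv" ]; then
    echo "missing"
    return 1
  fi
  # count data rows (skip the header)
  rows=$(tail -n +2 "$csv" | wc -l)
  # count rows where the precip_mm field is empty
  nulls=$(tail -n +2 "$csv" | awk -F, '$9 == "" { n++ } END { print n+0 }')
  if [ "$rows" -gt 0 ] && [ "$rows" -eq "$nulls" ]; then
    echo "all-null"
  else
    echo "ok rows=$rows nulls=$nulls"
  fi
}

# Example (path from this issue); harmless if the file is absent:
check_met_csv /media/model/met/nldas2_resamptile/precip/N51660-nldas2-all.csv || true
```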

rburghol commented 1 week ago

More debugging @COBrogan -- brought the time down to 2 hours with the tiled dataset (I had forgotten to filter out tiles that did NOT overlap). Added in the `&&` condition, though it did not really improve performance a ton.

cd /opt/model/p6/vadeq
. hspf_config
# set needed environment vars
MODEL_ROOT=/backup/meteorology/
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT SCRIPT_DIR
\set band '1'
\set ftype 'cbp6_landseg'
\set varkey 'nldas2_precip_hourly_tiled'
\set resample_varkey 'daymet_mod_daily'
\set hydrocode 'N51660'
\set fname '/tmp/N51660-nldas2-all.csv'
\timing ON
rburghol commented 1 week ago

@COBrogan I'm going to kick off a large batch of WDM creation with this resample technique today, unless that is going to consume too many resources and get in your way. If you check out the images in the body of this issue, you can see that resampling appeared to have more differences on certain individual days than the difference between clipping and the CBP overlap with NLDAS2 (note: those images are for different land segments, so it's not an apples-to-apples comparison, but it will be shortly!). Super curious to see what that does to performance.

I will be watching this issue, so if you feel like you need the CPU cycles, let me know and I will cancel the batch, or you can do so at your convenience (in case you've never used it, `scancel` is the command to cancel slurm jobs).

rburghol commented 1 week ago

Test with entire Rapidan River:

Image 1: N51177 coverage overlap with nldas2 (boxes) and prism (noaa)

Debugging

cp /tmp/N51177_1725800454_29331/N51177-nldas2-all.csv.sql ./
# change date range to short period manually with nano N51177-nldas2-all.csv.sql
cat N51177-nldas2-all.csv.sql | psql -h dbase2 drupal.dh03
\set band '1'
\set ftype 'cbp6_landseg'
\set varkey 'nldas2_precip_hourly_tiled'
\set resample_varkey 'daymet_mod_daily'
\set hydrocode 'N51177'
\set fname '/tmp/N51177-nldas2-all.csv'
\set start_epoch 441777600
\set end_epoch 1704085199
select hydroid as met_varid from dh_variabledefinition where varkey = :'varkey' \gset
select hydroid as fid from dh_feature where hydrocode = :'hydrocode' and ftype = :'ftype' \gset
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset
\timing ON
copy (
  select
    met.featureid,
    to_timestamp(met.tsendtime) as obs_date,
    met.tstime,
    met.tsendtime,
    extract(year from to_timestamp(met.tsendtime)) as yr,
    extract(month from to_timestamp(met.tsendtime)) as mo,
    extract(day from to_timestamp(met.tsendtime)) as da,
    extract(hour from to_timestamp(met.tsendtime)) as hr,
    (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_mm,
    0.0393701 * (ST_summarystats(st_clip(met.rast, fgeo.dh_geofield_geom), 1, TRUE)).mean as precip_in
  from dh_timeseries_weather as met,
    field_data_dh_geofield as fgeo
  where met.featureid = :covid
    and met.varid = :met_varid
    and ( (met.tstime >= :start_epoch) OR (-1 = :start_epoch) )
    and ( (met.tsendtime <= :end_epoch) OR (-1 = :end_epoch) )
    and fgeo.entity_type = 'dh_feature'
    and fgeo.entity_id = :fid
    and (fgeo.dh_geofield_geom && met.bbox)
  order by met.tsendtime
) to :'fname' WITH CSV HEADER;
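As a sanity check on the window set above, the epoch bounds can be converted to readable timestamps with GNU `date` (a debugging convenience, not part of the workflow itself):

```shell
# Confirm what the .sql epoch bounds mean in UTC (GNU date syntax)
date -u -d @441777600    # start_epoch -> 1984-01-01 04:00:00 UTC
date -u -d @1704085199   # end_epoch   -> 2024-01-01 04:59:59 UTC (2023-12-31 23:59:59 EST)
```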
rburghol commented 4 days ago

New tiled 16x16 dataset with a shorter name, to see if that fixes the wdm import.

Create a baseline scenario to compare it to: