HARPgroup / meta_model


Run with slurm #3

Open rburghol opened 2 years ago

rburghol commented 2 years ago

These are very rough draft prototype examples. Todo:

Cancelling jobs:

sjobs=`squeue --format="%.4i"`
for i in $sjobs; do scancel $i; done
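
If the goal is simply to clear out all of your own jobs, `scancel` also accepts a user filter, which avoids looping over job ids:

# cancel every job owned by the current user
scancel -u $USER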

View slurm Jobs with Full Names

. hspf_config
basin=JA5_7480_0001
segs=`cbp get_riversegs $basin`
scenario="subsheds"
model="hspf_cbp6" # alt hsp2_cbp6, etc.
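
One way to actually see those full job names (a sketch; the exact column widths are a matter of taste) is a wide `squeue` format string, since the default NAME column truncates names like the ones set below:

# widen the NAME column so job names like p6_vadeq_subsheds_* are fully visible
squeue --format="%.18i %.9P %.40j %.8u %.8T %.10M %.9l %.6D %R"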

River slurm Dependency

segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done
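
Note that `$deps` is passed through unquoted as extra `sbatch` options; presumably `cbp mm_river_deps` emits something like a `--dependency=afterok:<jobid>` clause (or an empty string for headwater segments), which is what holds downstream river jobs in the `(Dependency)` state shown further below.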

River Upstream & Land Simulation Dependencies

Trigger the job with:

# run land
segs=`cbp get_landsegs $basin`
location="p6_vadeq" # crucial to avoid collisions if running multiple installs
read -r scope land_scenario < <(cbp get_config $scenario river "LAND SCENARIO")
for i in $segs; do
  job_name="${location}_${scenario}_${i}"
  echo "sbatch --job-name=\"$job_name\" /opt/model/meta_model/run_model $model $land_scenario $i auto land "
  sbatch --job-name="$job_name" /opt/model/meta_model/run_model $model $land_scenario $i auto land 
done
# run river
# can now immediately be submitted after submitting land 
# since the dependency ordering works with land and river 
segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done
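
The `read -r scope land_scenario < <(cbp get_config $scenario river "LAND SCENARIO")` line splits the config entry returned by `cbp get_config` on whitespace, so (assuming the output is a keyword followed by a value) `$land_scenario` ends up holding whichever land scenario the river scenario is configured to use; the land jobs are submitted under that scenario, while the river jobs use `$scenario` itself.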

Show the Dependencies

             JOBID PARTITION                                 NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
              3390     debug      p6_vadeq_subsheds_YM1_6370_6620      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3391     debug      p6_vadeq_subsheds_YM2_6122_6120      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3392     debug      p6_vadeq_subsheds_YM2_6120_6430      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3393     debug      p6_vadeq_subsheds_YM3_6430_6620      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3394     debug      p6_vadeq_subsheds_YM4_6620_0001      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3385     debug             p6_vadeq_subsheds_N51033      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3386     debug             p6_vadeq_subsheds_N51097      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3387     debug             p6_vadeq_subsheds_N51137      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3388     debug             p6_vadeq_subsheds_N51177      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3389     debug             p6_vadeq_subsheds_N51101      rob  RUNNING       0:43 365-00:00:00      1 deq2
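
The pending river jobs show `(Dependency)` in the NODELIST(REASON) column: slurm holds each one until the land and upstream river jobs named in its `--dependency` options (built by `cbp mm_river_deps` above) have completed.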

Meteorology Creation and Land Run with slurm dependencies


### Behaviors
- to slurm or not to slurm:
  - NO: within `run_model`, if slurm args are included use slurm, otherwise just execute the script?
  - YES: or a standalone controller that calls `run_model`? (see the sketch after this list)
    - best bet is standalone, since `run_model` is really the method to run a single, self-contained segment
  - see here for coding bash args: https://www.redhat.com/sysadmin/arguments-options-bash-scripts
  - each object class, i.e. land and river, must have a `get_dependencies()` function specific to the model domain to enable optimal SLURM dependencies
  - Cache dates can be dependent on file last-modified time?
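
A minimal sketch of the standalone-controller idea, reusing the river loop already shown above (the script name `mm_slurm_controller` and its positional-argument handling are hypothetical; only `run_model` and the `cbp` helpers are real):

#!/bin/bash
# hypothetical controller: mm_slurm_controller MODEL SCENARIO BASIN
# submits one slurm job per river segment, wiring in the dependencies
# returned by cbp mm_river_deps so upstream segments finish first
model=$1
scenario=$2
basin=$3

segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  job_name=`cbp mm_job_name $scenario $i`
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done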

#### Land Use generation
- Test 1: treat each land segment as a batch
  - thus they can occupy different cores
  - and they don't depend on each other

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land prep"
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land prep
done

- This queues them up, with 7 processes running simultaneously (not counting the 1 process from another model)

squeue
squeue: error: NodeNames=deq2 Sockets=0 is invalid, reset to 1
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               361     debug run_mode      rob PD       0:00      1 (Resources)
               362     debug run_mode      rob PD       0:00      1 (Priority)
               363     debug run_mode      rob PD       0:00      1 (Priority)
               364     debug run_mode      rob PD       0:00      1 (Priority)
               365     debug run_mode      rob PD       0:00      1 (Priority)
               366     debug run_mode      rob PD       0:00      1 (Priority)
               367     debug run_mode      rob PD       0:00      1 (Priority)
               368     debug run_mode      rob PD       0:00      1 (Priority)
               369     debug run_mode      rob PD       0:00      1 (Priority)
               370     debug run_mode      rob PD       0:00      1 (Priority)
               371     debug run_mode      rob PD       0:00      1 (Priority)
               354     debug run_mode      rob  R       0:00      1 deq2
               355     debug run_mode      rob  R       0:00      1 deq2
               356     debug run_mode      rob  R       0:00      1 deq2
               357     debug run_mode      rob  R       0:00      1 deq2
               358     debug run_mode      rob  R       0:00      1 deq2
               359     debug run_mode      rob  R       0:00      1 deq2
               360     debug run_mode      rob  R       0:00      1 deq2
               235     debug vadeq_20      rob  R    3:45:05      1 deq2

- And they complete in under 10 seconds, maybe 30 land segments times 40 land uses

squeue
squeue: error: NodeNames=deq2 Sockets=0 is invalid, reset to 1
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               235     debug vadeq_20      rob  R    3:43:23      1 deq2


### Land Runoff Modeling
- this could be a real time saver
- Each set of 40+ land uses for a land segment is run in an `sbatch`, and 8 batches can run simultaneously.

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land "
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land 
done

- Jobs complete without incident, with all indications that they are on time
  - `tail -f slurm-377.out`
  - Note: the message below, "No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/link/", is expected; there are no `link` or `analyze` plugins at this time.

tail -f slurm-377.out

running stf for segment N51610 land scenario vadeq_2021
'/opt/model/p6/vadeq/config/blank_wdm/land.wdm' -> 'stfN51610.wdm'
count = 1
running sho for segment N51610 land scenario vadeq_2021
'/opt/model/p6/vadeq/config/blank_wdm/land.wdm' -> 'shoN51610.wdm'
count = 1
No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/link/
No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/analyze/


### Advanced `slurm`
#### Parent slurm job waits for `slurm`ed children to complete.
- `sbatch ~/tmp/ec.sh`
- `slurmq` shows parent job and 4 children
- after ~10 secs parent job reports "I am finished." and all child jobs are done.

**Code 5:** File `ec.sh` used for the `slurm` `wait` example from SO. Based on https://stackoverflow.com/questions/46427148/how-to-hold-up-a-script-until-a-slurm-job-start-with-srun-is-completely-finish

#!/bin/bash
set -e
date

for ((i=0; i<5; i++)); do
  sbatch -W --wrap='echo "hello from $SLURM_ARRAY_TASK_ID"; sleep 10' &
done
wait

date
echo "I am finished"
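
`sbatch -W` (`--wait`) makes each submission block until its job finishes, which is why the child calls are backgrounded with `&`; the shell's `wait` then holds the parent until every child job has exited.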

rburghol commented 2 years ago

Needed met for Non VA P segments

Check the land

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto river"
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto river 
done

Run the land

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land "
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land 
done
rburghol commented 2 years ago

Headwater in Occoquan, PL2_4970_5250:

rburghol commented 1 year ago

Test:

rburghol commented 3 months ago

Ran for RU2_5940_6200, both vadeq_2022 and subsheds (new test of met methods)

basin=RU2_5940_6200
scenario=subsheds
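# note: $model is assumed to be set already from an earlier session (e.g. model="hspf_cbp6" as in the examples above)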
# run land
segs=`cbp get_landsegs $basin`
location="p6_vadeq" # crucial to avoid collisions if running multiple installs
read -r scope land_scenario < <(cbp get_config $scenario river "LAND SCENARIO")
for i in $segs; do
  job_name="${location}_${scenario}_${i}"
  echo "sbatch --job-name=\"$job_name\" /opt/model/meta_model/run_model $model $land_scenario $i auto land "
  sbatch --job-name="$job_name" /opt/model/meta_model/run_model $model $land_scenario $i auto land 
done
# run river
# can now immediately be submitted after submitting land 
# since the dependency ordering works with land and river 
segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done
rburghol commented 1 month ago

Roanoke Tests

model=hspf_cbp5
scenario=p532sova_2021
basin=OR2_7900_7740
# run land
segs=`cbp get_landsegs $basin`
location="p532" # crucial to avoid collisions if running multiple installs
read -r scope land_scenario < <(cbp get_config $scenario river "LAND SCENARIO")
for i in $segs; do
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=\"$job_name\" /opt/model/meta_model/run_model $model $land_scenario $i auto land "
  sbatch --job-name="$job_name" /opt/model/meta_model/run_model $model $land_scenario $i auto land 
done
# run river
# can now immediately be submitted after submitting land 
# since the dependency ordering works with land and river 
segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done
rburghol commented 1 week ago

Land tests with hsp2 for baseflow simulation checks

Config

. hspf_config
land_scenario=hsp2_2022
model=hsp2_cbp6
i=N51165 
i=N51660 # try another

Run the met wdm creation

met_scenario=`cbp get_config $land_scenario river "PRECIP ATMOS DEPOSITION"`
job_name="${i}_wdm"
echo "sbatch --job-name="$job_name" /opt/model/meta_model/run_model raster_met $met_scenario $i auto wdm "
sbatch --job-name="$job_name" /opt/model/meta_model/run_model raster_met $met_scenario $i auto wdm 

Run the land

. hspf_config # note: we need to do this after running the wdm workflow for some reason
/opt/model/meta_model/run_model $model $land_scenario $i auto land
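
A possible alternative (hypothetical, not part of the notes above) is to run the land step through slurm as well and have it wait on the wdm job, using `sbatch --parsable` to capture the wdm job id:

# submit both steps through slurm and chain the land run after the wdm job
wdm_job=`sbatch --parsable --job-name="${i}_wdm" /opt/model/meta_model/run_model raster_met $met_scenario $i auto wdm`
sbatch --job-name="${i}_land" --dependency=afterok:$wdm_job /opt/model/meta_model/run_model $model $land_scenario $i auto land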