HARPgroup / meta_model


Run with slurm #3

Open rburghol opened 1 year ago

rburghol commented 1 year ago

These are very rough draft prototype examples. Todo:

Cancelling jobs:

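# --noheader keeps the JOBID header line out of the list; %i prints the full, untruncated job ID
sjobs=`squeue --noheader --format="%i"`
for i in $sjobs; do scancel $i; done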
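
Cancelling everything in the queue can be heavy-handed on a shared node (the queue listings further down include a job from another model). Below is a minimal sketch of a narrower cancel, assuming the jobs to remove share a job-name prefix such as `p6_vadeq_subsheds`. The helper names `filter_job_ids` and `cancel_jobs_with_prefix` are hypothetical and are not part of `cbp` or meta_model.

```shell
# Hypothetical helpers (not part of cbp or meta_model).

# Read "JOBID NAME" lines on stdin; print the IDs whose name starts with $1.
filter_job_ids() {
  local prefix="$1" id name
  while read -r id name; do
    case "$name" in
      "$prefix"*) echo "$id" ;;
    esac
  done
}

# Cancel only the current user's jobs whose name starts with the given prefix.
cancel_jobs_with_prefix() {
  local prefix="$1" id
  for id in $(squeue --noheader --user="$USER" --format="%i %j" | filter_job_ids "$prefix"); do
    scancel "$id"
  done
}
```

Usage would be something like `cancel_jobs_with_prefix p6_vadeq_subsheds`. The filtering is kept separate from the `squeue`/`scancel` calls so it can be checked without a running cluster.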

View slurm Jobs with Full Names

. hspf_config
basin=JA5_7480_0001
segs=`cbp get_riversegs $basin`
scenario="subsheds"
model="hspf_cbp6" # alt hsp2_cbp6, etc.

River slurm Dependency

segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done
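
The output of `cbp mm_river_deps` is not shown in this thread. As a rough illustration of the kind of option `sbatch` accepts in the `$deps` position, here is a sketch that looks up job IDs by name and joins them into an `afterok` dependency. The function names `job_ids_for_name` and `build_dependency_flag` are hypothetical, and this is not claimed to be how `mm_river_deps` works.

```shell
# Hypothetical helpers; not the actual implementation of `cbp mm_river_deps`.

# Print the IDs of queued jobs with exactly this job name (one per line).
job_ids_for_name() {
  squeue --noheader --name="$1" --format="%i"
}

# Join job IDs into a single sbatch dependency option.
# Prints nothing when no IDs are given, so a headwater segment gets no dependency.
build_dependency_flag() {
  local flag="" id
  for id in "$@"; do
    flag="${flag}:${id}"
  done
  if [ -n "$flag" ]; then
    echo "--dependency=afterok${flag}"
  fi
}
```

For example, `build_dependency_flag 3385 3386` yields one option that makes a job wait until both listed jobs have finished successfully.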

River Upstream & Land Simulation Dependencies

Trigger the job with:

# run land
segs=`cbp get_landsegs $basin`
location="p6_vadeq" # crucial to avoid collisions if running multiple installs
read -r scope land_scenario < <(cbp get_config $scenario river "LAND SCENARIO")
for i in $segs; do
  job_name="${location}_${scenario}_${i}"
  echo "sbatch --job-name=\"$job_name\" /opt/model/meta_model/run_model $model $land_scenario $i auto land "
  sbatch --job-name="$job_name" /opt/model/meta_model/run_model $model $land_scenario $i auto land 
done
# run river
# river jobs can be submitted immediately after the land jobs,
# since the slurm dependencies order land before river
segs=`cbp get_riversegs $basin`
for i in $segs; do
  deps=`cbp mm_river_deps $scenario $i`
  echo "Segment $i Found deps= $deps"
  job_name=`cbp mm_job_name $scenario $i`
  echo "sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i  auto river"
  sbatch --job-name=$job_name $deps /opt/model/meta_model/run_model $model $scenario $i auto river
done

Show the Dependencies

             JOBID PARTITION                                 NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
              3390     debug      p6_vadeq_subsheds_YM1_6370_6620      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3391     debug      p6_vadeq_subsheds_YM2_6122_6120      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3392     debug      p6_vadeq_subsheds_YM2_6120_6430      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3393     debug      p6_vadeq_subsheds_YM3_6430_6620      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3394     debug      p6_vadeq_subsheds_YM4_6620_0001      rob  PENDING       0:00 UNLIMITED      1 (Dependency)
              3385     debug             p6_vadeq_subsheds_N51033      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3386     debug             p6_vadeq_subsheds_N51097      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3387     debug             p6_vadeq_subsheds_N51137      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3388     debug             p6_vadeq_subsheds_N51177      rob  RUNNING       0:43 365-00:00:00      1 deq2
              3389     debug             p6_vadeq_subsheds_N51101      rob  RUNNING       0:43 365-00:00:00      1 deq2
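
The listing above has a NAME column wide enough for the full job names, whereas plain `squeue` truncates names to 8 characters (see `run_mode` in the listings further down). The command that produced it is not shown in this thread; the sketch below is one way to get a similar view, assuming a name width of 36 characters is enough. `squeue_full_names` is a hypothetical wrapper name.

```shell
# Columns modelled on `squeue --long`, with the NAME column widened (%.36j).
# The width of 36 is an assumption; adjust it to the longest job name in use.
SQUEUE_WIDE_FORMAT="%.18i %.9P %.36j %.8u %.8T %.10M %.9l %.6D %R"

# Hypothetical wrapper: any extra arguments (e.g. --user=rob) are passed through.
squeue_full_names() {
  squeue --format="$SQUEUE_WIDE_FORMAT" "$@"
}
```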

Meteorology Creation and Land Run with slurm dependencies


### Behaviors
- to slurm or not to slurm:
  - NO: handle it within `run_model`, i.e. use slurm if slurm args are included, otherwise just execute the script?
  - YES: or use a standalone controller that calls `run_model`?
    - best bet is standalone, since `run_model` is really the method to run a single, self-contained segment
  - see here for coding bash args: https://www.redhat.com/sysadmin/arguments-options-bash-scripts
  - each object class (i.e. land and river) must have a `get_dependencies()` function specific to the model domain to enable optimal slurm dependencies
  - cache dates could depend on the file's last-modified time?
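
The standalone-controller option could look roughly like the sketch below. It assumes a hypothetical wrapper `mm_submit` with `--slurm`, `--dry-run`, `--job-name` and `--deps` switches (none of which exist in meta_model today); only the `run_model` path and its positional arguments come from the examples in this thread.

```shell
# Hypothetical controller sketch; run_model itself stays unaware of slurm.
RUN_MODEL="${RUN_MODEL:-/opt/model/meta_model/run_model}"

mm_submit() {
  local use_slurm=0 dry_run=0 job_name="" deps=""
  # Parse leading options; everything after them is passed to run_model unchanged.
  while [ $# -gt 0 ]; do
    case "$1" in
      --slurm) use_slurm=1; shift ;;
      --dry-run) dry_run=1; shift ;;
      --job-name) job_name="$2"; shift 2 ;;
      --deps) deps="$2"; shift 2 ;;
      --) shift; break ;;
      *) break ;;
    esac
  done
  # Build the command line in the positional parameters.
  set -- "$RUN_MODEL" "$@"
  if [ "$use_slurm" -eq 1 ]; then
    if [ -n "$deps" ]; then set -- "$deps" "$@"; fi
    if [ -n "$job_name" ]; then set -- "--job-name=$job_name" "$@"; fi
    set -- sbatch "$@"
  fi
  if [ "$dry_run" -eq 1 ]; then
    echo "$*"   # show what would be run
  else
    "$@"        # run directly, or submit through sbatch
  fi
}
```

Called as `mm_submit --slurm --job-name "$job_name" "$model" "$scenario" "$i" auto river`, it submits through `sbatch`; without `--slurm` the same arguments run `run_model` directly, which keeps the single-segment script usable on machines without slurm.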

#### Land Use generation
- Test 1: treat each land segment as a batch
  - thus they can occupy different cores
  - and they don't depend on each other

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land prep"
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land prep
done

- This queues them up, with 7 processes running simultaneously (not counting the 1 process from another model)

squeue
squeue: error: NodeNames=deq2 Sockets=0 is invalid, reset to 1
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    361     debug run_mode      rob PD       0:00      1 (Resources)
    362     debug run_mode      rob PD       0:00      1 (Priority)
    363     debug run_mode      rob PD       0:00      1 (Priority)
    364     debug run_mode      rob PD       0:00      1 (Priority)
    365     debug run_mode      rob PD       0:00      1 (Priority)
    366     debug run_mode      rob PD       0:00      1 (Priority)
    367     debug run_mode      rob PD       0:00      1 (Priority)
    368     debug run_mode      rob PD       0:00      1 (Priority)
    369     debug run_mode      rob PD       0:00      1 (Priority)
    370     debug run_mode      rob PD       0:00      1 (Priority)
    371     debug run_mode      rob PD       0:00      1 (Priority)
    354     debug run_mode      rob  R       0:00      1 deq2
    355     debug run_mode      rob  R       0:00      1 deq2
    356     debug run_mode      rob  R       0:00      1 deq2
    357     debug run_mode      rob  R       0:00      1 deq2
    358     debug run_mode      rob  R       0:00      1 deq2
    359     debug run_mode      rob  R       0:00      1 deq2
    360     debug run_mode      rob  R       0:00      1 deq2
    235     debug vadeq_20      rob  R    3:45:05      1 deq2

- And they all complete in under 10 seconds, for maybe 30 land segments times 40 land uses

squeue
squeue: error: NodeNames=deq2 Sockets=0 is invalid, reset to 1
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    235     debug vadeq_20      rob  R    3:43:23      1 deq2
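
To see how many jobs are running versus pending without reading the whole listing, the job states can be tallied. A small sketch follows; `count_job_states` and `queue_summary` are hypothetical helper names, and `%T` is the `squeue` format field for the full state name.

```shell
# Read one job state per line on stdin; print "STATE COUNT" pairs sorted by state.
count_job_states() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Hypothetical wrapper: tally the states of everything currently in the queue.
queue_summary() {
  squeue --noheader --format="%T" | count_job_states
}
```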


### Land Runoff Modeling
- this could be a real time saver
- Each set of 40+ land uses for a land segment is run in a single `sbatch`, and 8 batches can run simultaneously.

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land "
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land
done

- Jobs complete without incident, with all indications that they are on time
  - `tail -f slurm-377.out`
  - Note: the message below, "No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/link/", is legit; there are no `link` or `analyze` plugins at this time.

tail -f slurm-377.out

running stf for segment N51610 land scenario vadeq_2021
'/opt/model/p6/vadeq/config/blank_wdm/land.wdm' -> 'stfN51610.wdm'
count = 1
running sho for segment N51610 land scenario vadeq_2021
'/opt/model/p6/vadeq/config/blank_wdm/land.wdm' -> 'shoN51610.wdm'
count = 1
No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/link/
No model plugins found in /opt/model/meta_model/models/hspf_cbp6/land/analyze/
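
With one `slurm-<jobid>.out` file per job, tailing each log by hand gets tedious. The sketch below lists the logs that may deserve a closer look. It assumes the logs sit in one directory under the default `sbatch` output name, and that a line containing "error" (case-insensitive) signals a problem; that pattern is an assumption, since this thread does not show what a failed run prints. The expected "No model plugins found" lines are ignored. `scan_slurm_logs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the slurm logs in a directory that contain "error",
# ignoring the expected "No model plugins found" messages.
scan_slurm_logs() {
  local dir="${1:-.}" f
  for f in "$dir"/slurm-*.out; do
    [ -e "$f" ] || continue   # no matching files: the glob stays literal, skip it
    if grep -v "No model plugins found" "$f" | grep -qi "error"; then
      echo "$f"
    fi
  done
}
```

Running `scan_slurm_logs` in the submission directory prints nothing when every log is clean under that assumption.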

rburghol commented 1 year ago

Needed met for Non VA P segments

Check the land

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto river"
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto river 
done

Run the land

segs=`cbp get_landsegs P`
for i in $segs; do
  echo "sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land "
  sbatch /opt/model/meta_model/run_model hspf_cbp6 vadeq_2021 $i auto land 
done

rburghol commented 1 year ago

Headwater in Occoquan, PL2_4970_5250:

rburghol commented 1 year ago

Test: