Open dhardestylewis opened 2 years ago
memory should be less of an issue this round because we are running roughly half as many subgrids per node in parallel as in the previous run. The previous run lost 7 of 272 subgrids to memory errors
estimated to cost 14,000 SUs
these are just the data tiles, not viz tiles
align tiles with quarter quads
tldr; new gridding scheme
ref; old gridding scheme
scratch sheet for reference to set up new griddings on different systems
general approach to estimate maximum number of compute threads (ie CPUs) available for mass parallelization
50 jobs available max in Stampede2's normal queue (wait-time 40 hours) minus 2 jobs left open for long queue (wait-time 7 seconds) equals 48 jobs on Stampede2 normal queue
64 maximum recommended cores per node on normal queue
69 maximum available nodes per job (topping off at requesting 69*48=3312 nodes out of 3360 total normal nodes available -- we want fewer because there will likely be a small number of nodes down at any given time, ~10)
48 jobs * 64 cores per node * 69 nodes per job = 211968 cores available, one for each separate computational grid
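The core budget above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the queue limits are the figures quoted in this comment, not values queried from the scheduler.

```python
# Queue limits as quoted in this comment (assumed values, not read from the scheduler).
MAX_JOBS_NORMAL = 50        # max jobs allowed in the normal queue
JOBS_LEFT_FOR_LONG = 2      # jobs kept free for the long queue
CORES_PER_NODE = 64         # max recommended cores per node on normal
NODES_PER_JOB = 69          # nodes requested per job
TOTAL_NORMAL_NODES = 3360   # total nodes in the normal queue

jobs = MAX_JOBS_NORMAL - JOBS_LEFT_FOR_LONG
nodes_requested = jobs * NODES_PER_JOB
spare_nodes = TOTAL_NORMAL_NODES - nodes_requested  # margin for nodes that are down
cores_available = jobs * CORES_PER_NODE * NODES_PER_JOB

print(f"jobs: {jobs}")
print(f"nodes requested: {nodes_requested} of {TOTAL_NORMAL_NODES} ({spare_nodes} spare)")
print(f"cores available: {cores_available}")
```

Changing the constants at the top should be enough to redo the estimate for a different system or queue.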
Deriving computational grid from parallel threads available
sqrt(211968) ~= 460, so gridding will be 460x460 because this is simple, close enough, less than the total number of cores available, and won't add any significant amount of time to the overall compute
Deriving tiles per compute grid
note the rest of these estimates are derived from previous computational results and assume minimal overhead & linear scaling between computational runs
If the new total computational envelope of Texas is similar to the previous envelope, and adjusting for the smaller width x height of each new tile, there will be an estimated 175x175 tiles within each subgrid. (The new envelope will be slightly larger because it will rely on HUC8s instead of HUC12s)
Estimating compute time under new computational grid
Previous gridding took 120 hours to retile 71 billion pixels within each subgrid; new gridding is estimated to take 4 hours to retile 2 billion pixels within each new subgrid (120 hours * 2/71 ~= 3.4 hours under linear scaling, rounded up)
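As a rough check, the grid side and the retiling-time estimate can be recomputed as below. This is a sketch under the same assumptions stated above (minimal overhead, linear scaling in pixel count). The variable names are illustrative, and rounding the estimate up to whole hours is an assumption about how the 4-hour figure was reached.

```python
import math

cores_available = 211_968  # 48 jobs * 64 cores per node * 69 nodes per job

# Largest square grid that fits within the core budget.
grid_side = math.isqrt(cores_available)
unused_cores = cores_available - grid_side ** 2

# Linear scaling from the previous run (figures quoted in this comment).
prev_hours = 120
prev_pixels_per_subgrid = 71e9
new_pixels_per_subgrid = 2e9
est_hours = prev_hours * new_pixels_per_subgrid / prev_pixels_per_subgrid

print(f"grid: {grid_side}x{grid_side}, unused cores: {unused_cores}")
print(f"estimated retile time per subgrid: {est_hours:.2f} h "
      f"(rounded up: {math.ceil(est_hours)} h)")
```

Any overhead that does not scale with pixel count (job startup, file I/O) would push the real time above this figure.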