NOAA-OWP / ngen

Next Generation Water Modeling Engine and Framework Prototype

Framework CONUS calculation load balancing #865

Open stcui007 opened 3 months ago

stcui007 commented 3 months ago

During the baseline calculations, I noticed a wide spread in CPU hours across the processing cores, from just over an hour to twice that. This could be related to the uneven distribution of remote nexuses among partitions, as seen in the output of partitionGenerator run on CONUS with 32 partitions:

Reading 817573 features from layer divides using ID column `divide_id`
Partitioning 817573 catchments into 32 partitions.
Reading 398822 features from layer nexus using ID column `id`
Validating catchments...
Number of catchments is: 817573
Catchment validation completed
Found 9 remotes in partition 0
Found 29 remotes in partition 1
Found 36 remotes in partition 2
Found 120 remotes in partition 3
Found 274 remotes in partition 4
Found 15 remotes in partition 5
Found 40 remotes in partition 6
Found 71 remotes in partition 7
Found 97 remotes in partition 8
Found 152 remotes in partition 9
Found 242 remotes in partition 10
Found 158 remotes in partition 11
Found 278 remotes in partition 12
Found 4 remotes in partition 13
Found 347 remotes in partition 14
Found 644 remotes in partition 15
Found 1319 remotes in partition 16
Found 277 remotes in partition 17
Found 386 remotes in partition 18
Found 879 remotes in partition 19
Found 865 remotes in partition 20
Found 47 remotes in partition 21
Found 136 remotes in partition 22
Found 216 remotes in partition 23
Found 210 remotes in partition 24
Found 501 remotes in partition 25
Found 33 remotes in partition 26
Found 6 remotes in partition 27
Found 5 remotes in partition 28
Found 39 remotes in partition 29
Found 46 remotes in partition 30
Found 57 remotes in partition 31
Found 7538 total remotes (average of approximately 235 remotes per partition)

The number of remotes per partition ranges from 4 (partition 13) to 1319 (partition 16).
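To put a number on that spread, the per-partition counts can be pulled out of the partitionGenerator log and summarized. This is a minimal sketch that relies only on the `Found N remotes in partition P` line format shown above. `remotes_per_partition` and `imbalance_summary` are hypothetical helper names, not part of ngen:

```python
import re
from statistics import mean

# Matches per-partition lines such as "Found 1319 remotes in partition 16".
# The "Found 7538 total remotes ..." summary line does not match.
REMOTE_LINE = re.compile(r"Found (\d+) remotes in partition (\d+)")


def remotes_per_partition(log_text: str) -> dict[int, int]:
    """Map partition index -> number of remotes reported in the log."""
    return {
        int(match.group(2)): int(match.group(1))
        for match in REMOTE_LINE.finditer(log_text)
    }


def imbalance_summary(counts: dict[int, int]) -> dict[str, float]:
    """Summarize the spread of remote counts across partitions."""
    if not counts:
        raise ValueError("no 'Found N remotes in partition P' lines found")
    values = list(counts.values())
    avg = mean(values)
    return {
        "partitions": len(values),
        "total": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": avg,
        # Busiest partition relative to the average (1.0 means perfectly even).
        "max_over_mean": max(values) / avg if avg else 1.0,
    }
```

To use it, capture the partitionGenerator output to a file and pass its text to `remotes_per_partition`. For the 32-partition log above, the summary is a minimum of 4, a maximum of 1319, a mean of about 235.6 and a max-to-mean ratio of about 5.6. In other words, the busiest partition handles more than five times the average number of remotes.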

Current behavior

Wide distribution of CPU hours across cores, from less than an hour to several hours.

Expected behavior

More evenly distributed CPU time across all cores.

Steps to replicate behavior

  1. To generate the partition output, run the command: ./cmake_build/partitionGenerator hydrofabric/conus.gpkg hydrofabric/conus.gpkg test_partition_32.json 32 '' ''
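To compare partitions on more than the remote count, you can inspect the generated `test_partition_32.json` directly. This is a minimal sketch that assumes the file has a top-level `partitions` list whose entries carry `id`, `cat-ids`, `nex-ids` and `remote-connections` keys. The schema is not shown in this issue, so check these keys against the real output first. The `remote-connections` count may also not be the same quantity as the `Found N remotes` figure. Both helper names are hypothetical:

```python
import json
from pathlib import Path


def partition_sizes(config: dict) -> list[tuple[int, int, int, int]]:
    """Return (id, catchments, nexuses, remote connections) per partition, sorted by id.

    ASSUMED layout (verify against the real file):
    {"partitions": [{"id": 0, "cat-ids": [...], "nex-ids": [...],
                     "remote-connections": [...]}, ...]}
    """
    rows = [
        (
            int(part["id"]),
            len(part.get("cat-ids", [])),
            len(part.get("nex-ids", [])),
            len(part.get("remote-connections", [])),
        )
        for part in config["partitions"]
    ]
    return sorted(rows)


def load_partition_sizes(path: str) -> list[tuple[int, int, int, int]]:
    """Read a partition JSON file from disk and summarize it."""
    return partition_sizes(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, looping over `load_partition_sizes("test_partition_32.json")` and printing each tuple gives one row per partition. This shows whether the catchment counts are balanced while the remote counts are not.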
