NCAR / wrfcloud

WRF Cloud Framework
Apache License 2.0

Feature #141 estimate optimal core count #190

Closed georgemccabe closed 1 year ago

georgemccabe commented 1 year ago

Also fixed a bug where the number of cores was not saved as 0 when "Set automatically" is checked.
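For reviewers, the intended save behavior can be sketched roughly as below. This is only an illustration under assumptions: `AUTO_CORES` and `cores_to_save` are hypothetical names and are not taken from the wrfcloud code base. The only detail taken from this PR is that 0 is the stored value when "Set automatically" is checked.

```python
from typing import Optional

# Assumption: 0 is the stored sentinel meaning "estimate the core count at run time".
AUTO_CORES = 0


def cores_to_save(set_automatically: bool, requested: Optional[int]) -> int:
    """Return the core count that should be persisted with a configuration.

    Hypothetical helper for illustration only; not the wrfcloud API.
    """
    if set_automatically:
        # Ignore any number left in the form field and store the sentinel.
        return AUTO_CORES
    if requested is None or requested < 1:
        raise ValueError("an explicit core count must be a positive integer")
    return requested
```

With this sketch, `cores_to_save(True, 48)` stores 0 even though a stale value of 48 was still present in the form, which is the case the bug fix addresses.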

Expected Differences

Pull Request Testing

Added unit tests to ensure core count estimates are as expected. Also started a cluster and ran jobs whose configs compute the number of cores automatically, to confirm that the logic is executed properly.

[ec2-user@ip-172-31-28-94 ~]$ wrfcloud-run --job-id W146C654FDD

{"time": "2023-04-10 18:59:35.000 +0000", "message": "Initializing cli environment", "appName": "no_name", "className": "NoClass", "level": "INFO "}
2023-04-10 18:59:35.000 +0000 - INFO - wrfcloud-cli - NoClass - Starting new run "W146C654FDD"
2023-04-10 18:59:35.000 +0000 - INFO - wrfcloud-cli - NoClass - Setting up working directory /data/W146C654FDD
2023-04-10 18:59:36.000 +0000 - INFO - wrfcloud-cli - WrfConfig - Estimate core count: 102
2023-04-10 18:59:36.000 +0000 - INFO - wrfcloud-cli - NoClass - Using 96 cores
2023-04-10 18:59:36.000 +0000 - INFO - wrfcloud-cli - NoClass - Updating job status W146C654FDD
Done

[ec2-user@ip-172-31-28-94 ~]$ wrfcloud-run --job-id W83C165E19E

{"time": "2023-04-10 18:59:42.000 +0000", "message": "Initializing cli environment", "appName": "no_name", "className": "NoClass", "level": "INFO "}
2023-04-10 18:59:42.000 +0000 - INFO - wrfcloud-cli - NoClass - Starting new run "W83C165E19E"
2023-04-10 18:59:42.000 +0000 - INFO - wrfcloud-cli - NoClass - Setting up working directory /data/W83C165E19E
2023-04-10 18:59:43.000 +0000 - INFO - wrfcloud-cli - WrfConfig - Estimate core count: 297
2023-04-10 18:59:43.000 +0000 - INFO - wrfcloud-cli - NoClass - Using 96 cores
2023-04-10 18:59:43.000 +0000 - INFO - wrfcloud-cli - NoClass - Updating job status W83C165E19E
Done
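In both runs the estimate (102 and 297) is larger than the core count actually used (96). A minimal sketch of that capping step is shown below, assuming the limit is a fixed maximum of 96 as seen in these logs. The function name `cores_to_use` and the `max_cores` parameter are hypothetical, and where the real limit comes from is not stated in this PR.

```python
def cores_to_use(estimate: int, max_cores: int = 96) -> int:
    """Clamp an estimated core count to the range [1, max_cores].

    Hypothetical helper; the default of 96 is taken from the log output,
    not from the wrfcloud source.
    """
    return max(1, min(estimate, max_cores))


# The two estimates reported in the logs above.
for estimate in (102, 297):
    print(f"Estimate core count: {estimate} -> using {cores_to_use(estimate)} cores")
```

An estimate below the limit passes through unchanged, so a config that needs fewer cores than the maximum would not be padded up to it.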

The configs Upper-midwest_3km_test_auto_compute_cores and caribbean_6km_test_auto_compute_cores can be used for testing.

Once other tests are fixed, we could consider adding another test that reads from a WPS namelist and ensures the correct core estimate is computed. Also, when we support multiple domains, more tests could be added to ensure the correct result is produced. The unit tests currently assume only a single domain.

Pull Request Checklist

georgemccabe commented 1 year ago

@fossell, I just deployed the web changes so you should be able to save the "set automatically" setting.