NCAR / wrfcloud

WRF Cloud Framework
Apache License 2.0

Document cpus estimates/requirements in UG #119

Closed fossell closed 1 year ago

fossell commented 1 year ago

Describe the Enhancement

New model configurations may fail in WRF and UPP if the domain is over- or under-decomposed. We can't always use 96 cores by default, especially with small coarse domains. Users will be given the option to set a custom number of cores in the UI when creating a new model configuration. Do we let the forecast fail and then change the cores? Do we try an auto-compute function? How do we determine or guide this selection?

Time Estimate

2 days



fossell commented 1 year ago

The WRF support team provides general guidelines and has an FAQ and a Python script to help determine the number of cores.

A good STARTING PLACE is to use the following equations:

For your smallest-sized domain: ((e_we)/25) * ((e_sn)/25) = most amount of processors you should use

For your largest-sized domain: ((e_we)/100) * ((e_sn)/100) = least amount of processors you should use

and then play around with it from there to see if you can find a good balance for the domain set-up you're using, checking the decomposition and number of tiles. Keep in mind this is just a rule of thumb, so you may be able to pick something at the far end of one of those 2 values, or somewhere right in the middle. You may also be able to go outside those boundaries, depending on the decomposition. Each run is a little bit different.
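The two equations above can be sketched in Python as follows. This is only an illustration of the rule of thumb, not code from wrfcloud or WRF support: the function name `core_bounds` is hypothetical, "smallest" and "largest" domain are assumed to mean fewest and most grid points (e_we * e_sn), and results are truncated to whole processors.

```python
def core_bounds(e_we, e_sn):
    """Return (least, most) processors suggested by the rule of thumb.

    e_we and e_sn are per-domain lists in namelist order.
    Assumption: domain size is measured as e_we * e_sn grid points.
    """
    areas = [we * sn for we, sn in zip(e_we, e_sn)]
    smallest = areas.index(min(areas))
    largest = areas.index(max(areas))

    # Smallest domain sets the upper bound: (e_we/25) * (e_sn/25)
    most = max(1, int((e_we[smallest] / 25) * (e_sn[smallest] / 25)))
    # Largest domain sets the lower bound: (e_we/100) * (e_sn/100)
    least = max(1, int((e_we[largest] / 100) * (e_sn[largest] / 100)))
    return least, most


# Single 200 x 250 domain, as in the WRF support script example
least, most = core_bounds([200], [250])
print("least:", least, "most:", most)
```

For nested configurations the lower bound can end up above the upper bound (a small coarse domain with a much larger nest), in which case the rule of thumb gives no overlapping range and the core count needs to be tuned by hand.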

fossell commented 1 year ago

Python script from WRF support to help determine number of cores.

# This script finds the largest number of processors and nodes
# you can use, based on the number of grid points in the i/j directions
# on your domain.

# Note: The largest number may not decompose the best way. If you want
# additional values, add some print statements in the code below.

# Enter the namelist values of e_we and e_sn
e_we = 200
e_sn = 250

# Number of cores you want to use per node (Cheyenne has a max of 36/node)
cores = 36

# The value of 'cores' gets incremented later, so keep a static copy of the original value
cores_orig = cores

# Upper limit of nodes: the max you want to loop through
node_max = 200

# This is the least number of grid points allowed for each processor.
# Don't change this value.
smallest_size = 10


# Finds the factor pairs for a total number of cores
def f(cores):
    factors = []
    for i in range(1, int(cores**0.5) + 1):
        if cores % i == 0:
            factors.append((i, cores // i))
    return factors


x = 1
while x <= node_max:
    factors = f(cores)

    # Of the factor pairs, the last one holds the closest values
    closest_factors = factors[-1]

    # Of the set of closest values, assign the i and j values
    i_array_value = closest_factors[0]
    j_array_value = closest_factors[-1]

    # Calculate how the domain will be decomposed
    e_we_decomp = int(e_we / i_array_value)
    e_sn_decomp = int(e_sn / j_array_value)

    # Once the decomposition becomes smaller than the least number of grid points
    # allowed for each processor, the loop quits and displays the max
    # number of processors and nodes you can use for your domain.
    if (e_sn_decomp < smallest_size) or (e_we_decomp < smallest_size):

        # Test whether the max number of processors allowed is within a single node
        initial_factor_pair = factors[0]
        initial_factor = initial_factor_pair[-1]
        if initial_factor == cores_orig:

            # Start with the value of cores_orig and decrease by 1 for each
            # iteration until the value is allowed
            y = cores_orig
            while y >= 1:
                # Factor pairs and closest pair for this single-node processor count
                factors = f(y)
                closest_factors = factors[-1]
                i_array_value = closest_factors[0]
                j_array_value = closest_factors[-1]

                # Calculate how the domain will be decomposed
                e_we_decomp = int(e_we / i_array_value)
                e_sn_decomp = int(e_sn / j_array_value)

                # Once the decomposition is at least the smallest allowed size,
                # display the max number of processors and stop searching.
                if (e_sn_decomp >= smallest_size) and (e_we_decomp >= smallest_size):
                    max_procs = i_array_value * j_array_value
                    print("max # of processors that can be used is: ", max_procs)
                    print("max # of nodes that can be used is 1 ")
                    break

                # If you haven't reached your limit, the loop continues
                y -= 1

        # If the size of the domain allows multiple nodes, the previous
        # node count was the last one that decomposed acceptably
        else:
            max_procs = (i_array_value * j_array_value) - cores_orig
            max_nodes = max_procs // cores_orig
            print("max # of processors that can be used is: ", max_procs)
            print("max # of nodes that can be used is: ", max_nodes)
        break

    # If you haven't reached your limit, the loop continues
    x += 1
    cores = cores + cores_orig

fossell commented 1 year ago

It seems like we could implement a computation for a starting point or first guess at core needs.

fossell commented 1 year ago

The UI will have "auto compute" checked by default. Currently that is set to 96 cores, but a placeholder exists for code that will auto-compute the value based on namelist.wps settings. In the meantime, users can uncheck auto compute and select a number of cores, using the UG documentation as guidance. Issue #141 covers adding the auto-compute code.
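As a rough illustration of what a first guess from namelist.wps settings could look like, here is a minimal sketch. It is not the wrfcloud implementation tracked in #141: the function names `read_dims` and `first_guess_cores` are hypothetical, the parsing only handles simple one-line `e_we = ...` and `e_sn = ...` entries, and the guess simply caps the current 96-core default at the rule-of-thumb upper bound for the smallest domain.

```python
import re


def read_dims(namelist_text, key):
    """Return the integer list assigned to `key` (e.g. e_we) in namelist text.

    Assumption: the values sit on a single line, comma or space separated.
    """
    match = re.search(rf"^\s*{key}\s*=\s*([^\n!/]*)", namelist_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{key} not found in namelist")
    return [int(v) for v in match.group(1).replace(",", " ").split()]


def first_guess_cores(namelist_text, default=96):
    """Cap the default core count at (e_we/25) * (e_sn/25) for the smallest domain."""
    e_we = read_dims(namelist_text, "e_we")
    e_sn = read_dims(namelist_text, "e_sn")
    # Assumption: the smallest domain is the one with the fewest grid points
    we, sn = min(zip(e_we, e_sn), key=lambda d: d[0] * d[1])
    most = max(1, int((we / 25) * (sn / 25)))
    return min(default, most)


# Hypothetical namelist.wps fragment with a small coarse domain and one nest
sample = """&geogrid
 e_we = 74, 112,
 e_sn = 61, 97,
/
"""
print("first-guess cores:", first_guess_cores(sample))
```

A small coarse domain like the one in the sample ends up well below 96 cores, which is the failure case described at the top of this issue. Any real implementation would still need to check the resulting decomposition against WRF and UPP.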

fossell commented 1 year ago

Closed with #174.