Memory optimization for large batch runs on PCs

jmaguire1 commented 3 months ago

Not everyone has a supercomputer like ours for running large batches of simulations. Andrew Poerschke at IBACOS has been trying some rather large runs and had this comment:

"Do you guys have any memory management strategies when running on the supercomputer? My workstation has 192 GB of memory, but I keep filling it when running across all 32 cores. I am chunking and discarding dwelling objects with each batch. Is it just a consequence of running 2-min timesteps?"

@apoerschke: If you have any tips, this would be a good place to put them, and we'll add them to the documentation going forward.

apoerschke commented 3 months ago

Here are some notes from a recent batch of simulations. I simulated ~350 homes with a few different parameters, for a total of 1700 simulations: 2-minute timesteps, full annual simulations, saving the 2-min data to parquet. I used the os-hpxml1.7 branch: https://github.com/NREL/OCHRE/tree/e3f13d5164e923a65f6728dae107610c20f7df4d

Initially I tried to initialize all dwelling objects and then run them using a multiprocessing.Pool() object. However, memory use became an immediate issue, because results are retained after each simulation. I then switched to initializing and running batches of 30-60 homes at a time. This kept memory usage at an acceptable level, though still higher than ideal. The workstation I am using has a Core i9-14900K and 192 GB of RAM. Anyone looking to fully utilize a multi-core processor will need as much RAM as possible. By comparison, EnergyPlus (E+) uses much less RAM per simulation core.
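The batch-at-a-time approach described above can be sketched with the standard library alone. This is a minimal sketch, not part of OCHRE: chunked and run_in_batches are hypothetical helper names, and run_batch stands in for a function such as start_batch in the next comment, which builds and runs one batch and returns only small results.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_in_batches(items: Iterable[T], size: int,
                   run_batch: Callable[[List[T]], List[R]]) -> List[R]:
    """Call `run_batch` on one chunk at a time and keep only what it returns.

    As long as `run_batch` keeps no references to the objects it creates,
    a finished batch's dwelling objects can be garbage-collected before
    the next batch is built.
    """
    results: List[R] = []
    for batch in chunked(items, size):
        results.extend(run_batch(batch))
    return results
```

For example, all_metrics = run_in_batches(dwelling_pre_list, 40, start_batch), where 40 is one choice within the 30-60 range above. On Windows and macOS, multiprocessing starts workers with the "spawn" method by default, so the driver call should sit under if __name__ == "__main__":.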

Memory: 140 GB when keeping 62 simulations in memory, and running 31 at a time.

Processor: Efficiency cores seem to carry a big performance penalty: 5 simulations in parallel take 2.3 min, while 30 in parallel take 5.3 min (the CPU has 8 P-cores and 16 E-cores, 32 threads total).

Runtime: running all 1700 simulations took approximately 5 hours.
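The figures above support some rough arithmetic. This is a minimal sketch under stated assumptions: memory is assumed to scale roughly linearly with the number of simulations held at once, the quoted times are assumed to be wall-clock times for each group of concurrent runs, and the 16 GB reserve for the OS and parent process is a hypothetical value.

```python
import math


def gb_per_sim(total_gb: float, sims_in_memory: int) -> float:
    """Average memory per simulation held in memory (assumes linear scaling)."""
    return total_gb / sims_in_memory


def max_sims_in_memory(ram_gb: float, reserve_gb: float, per_sim_gb: float) -> int:
    """Number of simulations that fit after setting aside `reserve_gb`."""
    usable = ram_gb - reserve_gb
    return max(0, math.floor(usable / per_sim_gb))


def sims_per_minute(n_sims: int, minutes: float) -> float:
    """Aggregate throughput of n_sims that run concurrently and finish in `minutes`."""
    return n_sims / minutes


# Memory: 140 GB with 62 simulations held in memory
per_sim = gb_per_sim(140, 62)                   # about 2.26 GB each
fit = max_sims_in_memory(192, 16, per_sim)      # 16 GB reserve is hypothetical -> 77

# Throughput: each run is slower with 30 in parallel, but more work gets done
five_parallel = sims_per_minute(5, 2.3)         # about 2.17 simulations/min
thirty_parallel = sims_per_minute(30, 5.3)      # about 5.66 simulations/min
whole_run = sims_per_minute(1700, 5 * 60)       # about 5.67 simulations/min over ~5 hours
```

Read this way, each simulation took about 2.3 times longer with 30 in parallel (5.3 / 2.3), yet aggregate throughput was about 2.6 times higher, and the overall rate for the 1700-run batch matches running about 30 at a time.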

apoerschke commented 3 months ago

Here are the functions I was using to handle the multiprocessing. Could be useful to anyone else doing large batches of simulations.

import datetime as dt
import multiprocessing as mp
import os
import sys

from ochre import Dwelling

def mute():
    # Mute output from child processes to clean up the console
    sys.stdout = open(os.devnull, 'w')

def start_batch(dwelling_pre_list):
    # Generate and run a batch of dwellings from a list of dwelling arguments.
    # The list should be pre-chunked to limit memory usage (30-60 dwellings).
    dwellings = gen_dwellings(dwelling_pre_list)
    dwellings = [d for d in dwellings if d is not None]  # Drop dwellings that failed to initialize
    batch_metrics = run_dwellings(dwellings)
    return batch_metrics

def run_dwellings(dwellings, processors=None, return_df=False):
    # Primary call to create the mp pool and run a batch of dwellings.
    # dwellings is a list of tuples: (dwelling arguments, dwelling).
    # The arguments are passed along so the metadata is retained when the job is done.
    if processors is None:
        processors = max(1, mp.cpu_count() - 1)  # Default to one less than the number of CPUs

    # starmap unpacks each (dwelling_args, dwelling, return_df) tuple into run()
    jobs = [(dwelling_args, dwelling, return_df) for dwelling_args, dwelling in dwellings]
    with mp.Pool(processors, initializer=mute) as pool:
        result = pool.starmap(run, jobs)

    return result

def run(dwelling_args, dwelling, return_df=False):
    # Run the dwelling object and return its results.
    # The try statement keeps one failed simulation from stopping a large batch.
    key = dict(dwelling_args['parameters'])  # Copy so dwelling_args is not modified
    key['name'] = dwelling.name
    try:
        df, metrics, hourly = dwelling.simulate()
    except Exception:
        return {'key': key, 'metrics': None}

    if return_df:
        return df
    return {'key': key, 'metrics': metrics}

def gen_dwellings(dwelling_pre_list, processors=None):
    # Primary call to create the pool and generate dwelling objects from a list of dwelling arguments
    if processors is None:
        processors = max(1, mp.cpu_count() - 1)  # Default to one less than the number of CPUs

    with mp.Pool(processors, initializer=mute) as pool:
        result = pool.map(multigen, dwelling_pre_list)

    return result

def multigen(dwelling_pre):
    # Generate one dwelling; the try statement handles failed initializations.
    # NOTE: This function also modifies some parameters specific to the simulation objective.
    try:
        kwargs, modifiers = dwelling_pre
        dwelling_args = gen_dwellingargs(**kwargs)
        dwelling_args['parameters'] = {'hplo': modifiers['hplo'], 'ero': modifiers['ero']}
        dwelling_args['name'] = '{}_HPLO-{}_ERO-{}'.format(kwargs['modelname'], modifiers['hplo'], modifiers['ero'])

        dwelling = Dwelling(**dwelling_args)

        # Modify heat pump parameters (inputs are in degF, OCHRE uses degC)
        heater = dwelling.equipment['ASHP Heater']
        heater.outdoor_temp_limit = (modifiers['hplo'] - 32) * 5 / 9  # temperature: degF -> degC
        heater.er_setpoint_offset = modifiers['ero'] * 5 / 9  # temperature difference: degF -> degC
        heater.upper_deadband_override = False  # Need to fix

        # Next: modify heat pump capacity (ER capacity: 'er_capacity_rated')
        # heater.er_capacity_rated = 10000

        return (dwelling_args, dwelling)

    except Exception:
        return None

def gen_dwellingargs(modelname, input_dir, run_dir, xml_file, schedule_file, output_dir, epoch, weather_file, days=30):

    dwelling_args = {
        'name': modelname,  # simulation name

        # Timing parameters
        'start_time': dt.datetime(2018, 1, 1, 0, 0),  # year, month, day, hour, minute
        'time_res': dt.timedelta(minutes=2),          # time resolution of the simulation
        'duration': dt.timedelta(days=days),          # duration of the simulation (days=365 for a full year)
        'initialization_time': dt.timedelta(days=5),  # used to create a realistic starting temperature
        'time_zone': None,                            # option to specify daylight savings, in development

        # Input parameters - Sample building (uses HPXML file and time series schedule file)
        'hpxml_file': os.path.join(run_dir,xml_file),
        'schedule_input_file': os.path.join(run_dir,schedule_file),

        # Input parameters - weather (note weather_path can be used when Weather Station is specified in HPXML file)
        # 'weather_path': weather_path,
        'weather_file': os.path.join(input_dir,'WEATHER',weather_file),

        # Output parameters
        'verbosity': 3,                         # verbosity of time series files (0-9)
        #'metrics_verbosity': 9,               # verbosity of metrics file (0-9), default=6
        # 'save_results': False,                # saves results to files. Defaults to True if verbosity > 0
        'output_path': os.path.join(output_dir, str(epoch)),  # defaults to hpxml_file path
        # 'save_args_to_json': True,            # includes data from this dictionary in the json file
        'output_to_parquet': True,              # saves time series files as parquet files (False saves as csv files)
        # 'save_schedule_columns': [],          # list of time series inputs to save to schedule file
        # 'export_res': dt.timedelta(days=61),  # time resolution for saving files, to reduce memory requirements

        # Equipment parameters
        'Equipment': {

        },

        # 'modify_hpxml_dict': {},  # Directly modifies values from HPXML input file
        # 'schedule': {},  # Directly modifies columns from OCHRE schedule file (dict or pandas.DataFrame)
    }

    return dwelling_args
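A possible further reduction, sketched under assumptions: in the functions above, each Dwelling is built in one pool (gen_dwellings), returned to the parent, held there, and then sent to a second pool (run_dwellings). Building and simulating each dwelling inside the same worker avoids holding any dwelling objects in the parent. In this sketch, simulate_one and worker are hypothetical names, and simulate_one is a placeholder for the OCHRE calls. maxtasksperchild, imap_unordered and chunksize are standard multiprocessing.Pool features; together they make each worker process exit after one simulation, which returns its memory to the OS.

```python
import multiprocessing as mp
import os


def simulate_one(dwelling_args):
    """Hypothetical placeholder for one OCHRE run.

    With OCHRE this would be roughly:
        dwelling = Dwelling(**dwelling_args)
        df, metrics, hourly = dwelling.simulate()
        return metrics  # return only the small metrics, never df
    """
    return {"status": "ok"}


def worker(dwelling_args):
    # Build the result key without mutating the caller's dict
    key = dict(dwelling_args.get("parameters", {}))
    key["name"] = dwelling_args.get("name")
    try:
        metrics = simulate_one(dwelling_args)
    except Exception as exc:  # keep one failure from stopping the batch
        return {"key": key, "metrics": None, "error": repr(exc)}
    return {"key": key, "metrics": metrics}


def run_all(args_list, processes=None):
    if processes is None:
        processes = max(1, (os.cpu_count() or 2) - 1)
    # maxtasksperchild=1 replaces each worker process after one task, and
    # chunksize=1 makes each task a single dwelling, so memory is freed per run.
    with mp.Pool(processes, maxtasksperchild=1) as pool:
        return list(pool.imap_unordered(worker, args_list, chunksize=1))


if __name__ == "__main__":  # required with the "spawn" start method (Windows, macOS)
    jobs = [{"name": f"home_{i}", "parameters": {"hplo": 5, "ero": 3}} for i in range(4)]
    for result in run_all(jobs, processes=2):
        print(result["key"]["name"], result["metrics"])
```

Restarting a process per simulation adds some startup overhead, which is likely small next to a multi-minute annual run at 2-min resolution. imap_unordered returns results as they finish, so the order may differ from the input list; the key dict identifies each run.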

jmaguire1 commented 3 months ago

Awesome, thanks @apoerschke! I'm planning to add a new section to our documentation on setting up batch runs for external users; AFAIK you're the first one to really try this out.

We also haven't tried optimizing for this situation, since we can just run these batches on a supercomputer. But it's something I would definitely like to address as funding allows. For now, we'll provide the best guidance we can and keep this issue open until we get that opportunity.

mnblonsky commented 3 months ago

Agreed with Jeff: we know memory use is a big issue, but it hasn't been a priority. I believe we know the solution too - most of the memory is in the equipment schedules, which can be partitioned and saved to files for high-resolution or long-duration runs. We can bump that up our priority list.
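For a sense of scale (plain arithmetic, not an OCHRE measurement): a 2-minute time step gives 30 steps per hour, so a full-year run has 262,800 steps, 30 times as many as an hourly run. Stored as float64, each time series column takes about 2 MB. The sketch below compares that with holding one partition at a time. The 61-day partition length mirrors the commented-out export_res example in the dwelling arguments above and is otherwise an arbitrary choice; n_steps and partition_bounds are hypothetical helpers.

```python
import datetime as dt


def n_steps(duration: dt.timedelta, time_res: dt.timedelta) -> int:
    """Number of time steps in a run of `duration` at resolution `time_res`."""
    return duration // time_res


def partition_bounds(total_steps: int, steps_per_part: int):
    """(start, stop) index pairs that split a series into partitions."""
    return [(s, min(s + steps_per_part, total_steps))
            for s in range(0, total_steps, steps_per_part)]


year = dt.timedelta(days=365)
two_min = dt.timedelta(minutes=2)

full = n_steps(year, two_min)                    # 262,800 steps
hourly = n_steps(year, dt.timedelta(hours=1))    # 8,760 steps
part = n_steps(dt.timedelta(days=61), two_min)   # 43,920 steps per partition
bytes_per_column = full * 8                      # float64: 2,102,400 bytes (~2 MB)
parts = partition_bounds(full, part)             # 6 partitions; the last is shorter
```

Holding one 61-day partition instead of the whole year cuts the rows in memory per column by roughly a factor of 6.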

One thing you can do now to improve this is to stop returning the output data. That will save a lot of memory and allow larger batches to run at once. Then, once all the runs have finished, you can call Analysis.load_ochre, which loads the output files and returns the data in the same way that dwelling.simulate does.
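Following that suggestion, workers can return only the small {'key': ..., 'metrics': ...} dicts that run() above already builds, while the time series stay in the parquet files OCHRE saves. The parent can then write those dicts to one summary table. This is a minimal standard-library sketch: write_summary is a hypothetical helper, and it assumes metrics is a flat dict of scalar values (OCHRE's metrics may need flattening first).

```python
import csv
from typing import Dict, Iterable, List


def write_summary(results: Iterable[Dict], path: str) -> int:
    """Write one CSV row per simulation: key fields followed by metrics.

    Failed runs (metrics is None) are kept with empty metric columns so
    they are easy to find and re-run. Returns the number of rows written.
    """
    rows: List[Dict] = []
    for r in results:
        row = dict(r["key"])
        row.update(r["metrics"] or {})
        rows.append(row)

    # Union of all column names, in first-seen order
    fieldnames: List[str] = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)

    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing values -> ""
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Only these small rows stay in the parent process; individual time series can be loaded later, one simulation at a time, with Analysis.load_ochre.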

NREL / OCHRE issue #126