TralhaDoBruno opened this issue 4 years ago
So, the short answer is that yes, Python is slower than Matlab (in general).
The longer answer is: I'm not sure why the EpanetSimulator is that much slower. The WNTRSimulator will always be slower, but the EpanetSimulator should not be several times slower (maybe 125% of the Matlab time, but not 300%).
That said, we are in the process of releasing an update to WNTR that includes the EPANET 2.2 libraries which support PDD in EPANET. You can check the epanet22 branch on my fork (dbhart/WNTR) for the immediate term, or I believe the PR is up on the dev branch of the USEPA/WNTR.
I hope this is helpful, and perhaps the PDD mode of EPANET 2.2 will help solve the timing issue.
Thank you for your reply! I will check the updated branches. Nonetheless, since I'm aiming at reducing the simulation time, do you recommend running the simulations directly through the EPANET toolkit in C?
Thank you once again for your help!
Running in C/C++ is obviously going to be the fastest way, but using the toolkit from C/C++ carries much higher overhead, since you have to write and compile C code. I would actually suggest using parallel processing if possible – most machines have enough cores that you can use the multiprocessing tools in Python to launch multiple simulations at once. I believe there is an example in WNTR on how to do that.
A related option, if you can do what you want with EPANET 2.2 PDD, is to use the multiprocessing tools in Python to call shell commands that run the EPANET command-line executable directly. This is the absolute fastest way to run EPANET: set up the network using WNTR, write an INP file, launch a shell command to run EPANET, read the results back using the wntr.epanet.io.BinFile reader, and then do the rest of the processing you need with the matrix you are setting up.
Hope this is helpful!
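A sketch of launching those shell commands in parallel might look like the following. The actual EPANET command line (executable name, argument order) depends on your build, so a tiny Python one-liner stands in for it here so the sketch runs anywhere:

```python
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor

def run_epanet(inp_name):
    """Run one external simulation as a shell command.

    The real call would be something like
    ["runepanet", inp_name, "net.rpt", "net.bin"] (hypothetical name --
    check your EPANET 2.2 install); a Python print stands in here.
    """
    cmd = [sys.executable, "-c", f"print('simulated {inp_name}')"]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout.strip()

if __name__ == "__main__":
    inp_files = [f"scenario_{i}.inp" for i in range(4)]
    # Each external run is independent, so they parallelize cleanly.
    with ProcessPoolExecutor() as pool:
        for line in pool.map(run_epanet, inp_files):
            print(line)
```

After each run finishes you would read the corresponding .bin file with wntr.epanet.io.BinFile and fill in your matrix.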
Thank you for the prompt reply. It has been a great help!
I will look for the example and try both suggestions.
I've looked in the repository but couldn't find the example you mentioned regarding parallel processing. I'm trying to parallelize each timestep (1 h steps across a 24 h period) onto different processors, but I'm having some difficulties: sometimes random timesteps are lost and I can't understand the cause. The example would surely be a great help.
I attach the code that I am currently using.
Thank you once again!
import multiprocessing as mp
import wntr

wn = wntr.network.model.WaterNetworkModel('Net1.inp')

def simulation(i, t, wn):
    # Run a single-period simulation starting at timestep t[i]
    wn.options.time.report_start = t[i]
    wn.options.time.duration = t[i]
    wn.options.time.start_clocktime = t[i]
    simul = wntr.sim.EpanetSimulator(wn)
    res = simul.run_sim()
    pressure = res.node["pressure"].values.tolist()
    pressure = pressure[0]
    return {i: pressure}

def get_result(result):
    global results
    # print(result)
    results.append(result)

if __name__ == '__main__':
    t = [3600 * row for row in range(24)]
    print(t)
    results = []
    pool = mp.Pool(mp.cpu_count())
    for i in range(len(t)):
        pool.apply_async(simulation, args=(i, t, wn), callback=get_result)
    pool.close()
    pool.join()
The CriticalityMaps package, https://github.com/pshassett/CriticalityMaps, uses WNTR and includes multiprocessing. I suggest looking at the example mentioned in #132.
Thank you so much kaklise, I will check the CriticalityMaps package.
Yeah, I looked again and apparently that example was removed some time back (we were cleaning things up). I’ll see if I can find a copy of it, but in the meantime, I suggest looking at examples for the Python multiprocessing package.
As for parallelizing each timestep, I'm not sure that will work the way you would like: even though each timestep is a steady-state calculation, the boundary conditions (e.g., tank levels) for that solution depend on the previous timestep.
I made some interesting observations while running multiple one-year simulations with 5-minute timesteps in parallel (with Pool(cpu_count() - 1)):
On an 8 CPU / 32 GB RAM machine the CPUs are mostly idling (only spiking sporadically), so what could it be bound by? RAM or cache?
Changing to a higher-tier machine (16 CPU / 64 GB RAM) did not improve things, but actually slowed down the processing (so there seems to be some kind of lock?)
Any ideas how one could go beyond the limits?
I would recommend running in parallel with mpi4py.
Hello, I don't know if I should ask this question here since it is possibly not an issue, but here it goes:
I must develop two pressure sensitivity matrices, which requires a considerable number of hydraulic simulations (213,674 in total). The hydraulic model has 4429 nodes and 4474 pipes. There are about 2200 individual node demand patterns over a 24 h period (1 h per step).
I've developed the code in Matlab using the Open Water Analytics Matlab Toolkit, and it takes me about 7 hours per matrix. I've tried using the WNTRSimulator for the same purpose, but it takes about five times longer (on a simpler network). With the EpanetSimulator it "only" takes 3 times longer, but I can't simulate leaks with a pressure-driven analysis (which I will need in the future, leading me to the WNTRSimulator).
As such, my question is: Is this difference in calculation times normal or expected when using Python vs Matlab or am I doing something wrong?
Thank you so much and continue with the great work!