PMEAL / OpenPNM

A Python package for performing pore network modeling of porous media
http://openpnm.org
MIT License

Code scalability for super large networks #640

Closed ma-sadeghi closed 8 years ago

ma-sadeghi commented 8 years ago

I am currently working with very large networks (up to 1 million pores), and I have noticed that the code does not scale well (or at least it seems that way). For instance, performing a certain series of actions* on a 50 by 50 by 50 network takes roughly 1-2 hours, but performing the same actions on a 70 by 70 by 70 network has taken 14 hours so far (and counting, since it's still running). A 70 by 70 by 70 network has only about 2.7x as many pores as a 50 by 50 by 50 one, so a more-than-7x jump in runtime suggests something is scaling superlinearly. Any thoughts?

* Starting from a 100 by 100 by 100 network, consecutively merging a selection of pores, and then solving reaction-diffusion on the modified network.

maghighi commented 8 years ago

My first thought is to use Python's 'profile' (or 'cProfile') module to see how many times certain methods are being called, or, as the simplest approach, just measure the time required for each section (merge, geometry, physics, check health, algorithm) and compare the per-section times between a 30x30x30 and a 40x40x40 network, as in the sketch below. Once we find the root of the problem, you may need to use a different logic in your run script to avoid it.
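A minimal sketch of both approaches, assuming nothing about your actual run script; the `stage` function and its labels are placeholders for your real merge/geometry/physics/algorithm calls:

```python
import cProfile
import pstats
import time

def stage(label, seconds):
    # Placeholder for one step of the run script (merge, geometry,
    # physics, health check, algorithm); swap in the real calls.
    time.sleep(seconds)

def run_script():
    for label, secs in [("merge", 0.2), ("geometry", 0.1), ("algorithm", 0.3)]:
        t0 = time.time()
        stage(label, secs)
        print("{}: {:.2f} s".format(label, time.time() - t0))

# Per-section wall-clock times; repeat on a 30x30x30 and a 40x40x40
# network and compare which section grows fastest.
run_script()

# Full profile: how many times each method was called, sorted by
# cumulative time.
cProfile.run("run_script()", "run.prof")
pstats.Stats("run.prof").sort_stats("cumulative").print_stats(10)
```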

MichaelHoeh commented 8 years ago

I agree with Mahmoud that you should profile the code; there is a built-in profiler in Spyder. Or do a naive profiling by measuring the time between individual steps to break the run down.

When I was running large simulations in the past, with multiple simulations on the same network, I first created the network separately and saved it. Loading a network is of course much faster than creating a new one. In my case, network creation took a significant amount of time, and these were just cubic networks.

In case the algorithm is the problem, go for iterative solvers. In the past (www.dx.doi.org/10.1021/acs.jpcc.5b04157, cf. fig. 7) I was only able to run simulations on 70x70x70 cubic networks using the direct solver (memory problems with 16 GB RAM), but was fine running 200x200x200 on the same machine with an iterative solver, and obtained exactly the same result. I used gmres from scipy.sparse.linalg. If you use cg, keep in mind that it needs a Hermitian, positive-definite matrix. gmres was also the fastest in my case, so I went with that. You can make the iterative solvers even faster with proper preconditioner matrices; see the sketch below.
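For reference, a minimal sketch of gmres with an incomplete-LU preconditioner; the matrix here is just a random stand-in for the coefficient matrix that the algorithm would otherwise hand to a direct solver:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in sparse system; in a real run, A and b would come from the
# transport algorithm instead of a random generator.
n = 1000
A = (sp.random(n, n, density=5.0 / n, random_state=0) + 10 * sp.eye(n)).tocsc()
b = np.ones(n)

# Incomplete-LU preconditioner; a good preconditioner is what makes
# gmres fast. (cg would additionally require A to be Hermitian
# positive definite.)
ilu = spla.spilu(A)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else "gmres info = {}".format(info))
```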