Dear All,
RMG has always been a bit of a RAM hog, which can be a lesser or greater problem.
I was wondering if any plans have been drawn up to let RMG use proper temporary files (maybe with an explicit RAM allocation at the start, such as is used in quantum chemistry codes). This should be even more important given the option of running RMG in parallel.
So our primary method for dealing with RMG's RAM usage is pruning. Flux pruning looks at the maximum flux to each edge species when simulations finish, removes low-flux edge species, and tries to maintain a maximum number of edge species. In our tests and our runs this has largely allowed us to avoid RAM issues. See http://reactionmechanismgenerator.github.io/RMG-Py/theory/rmg/prune.html
This has so far been sufficient for constructing large models in our tests, so we don't currently have any active plans for additional methods of reducing RAM usage, although we do have an experimental feature for pruning based on thermodynamics that doesn't require simulations to finish.
The pruning definitely helps; however, it is not always an answer, depending on the initial input and the available computing resources.
In the past, I had an input that turned out not to be feasible because a 64 GB node in a computing cluster was insufficient. At my current position I do not have access to a computing cluster (though one large-memory computing PC will be set up). On top of that, running RMG in parallel places additional requirements on the available RAM (if I remember my tests correctly). The use of temporary files is also well established in codes that produce a lot of data, for example quantum chemistry codes.
There are two reasons that an input file would fail if you were using pruning:
- The simulations aren't running to completion often enough
- You set the maximum number of edge species too high
The first is solved by increasing the interrupt tolerance (although our experimental thermodynamic pruning has the benefit of not needing to worry about this to maintain the maximum number of edge species). The second is simply solved by reducing the maximum number of edge species; I believe the default of 100,000 is the standard setting for about 16 GB of RAM, someone correct me if I'm wrong.
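For reference, both of these knobs live in the model block of the input file. A minimal sketch (the tolerance and species values here are illustrative, not recommendations):

```python
# Sketch of a model block in an RMG-Py input file; tune the values to your machine.
model(
    toleranceKeepInEdge=0.01,          # edge species below this flux ratio get pruned
    toleranceMoveToCore=0.1,           # flux ratio above which species enter the core
    toleranceInterruptSimulation=1e8,  # set very high so simulations run to completion
    maximumEdgeSpecies=100000,         # the default mentioned above; lower it with less RAM
)
```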
I think temporary files would be very reasonable if we had cases that need to do so much pruning that we are missing important species; however, we haven't had any of those cases occur yet, because models that need that much pruning are usually so large that the runtime is prohibitive. We do understand the issue of not knowing what settings to use to avoid memory issues, and thermodynamic pruning is our primary answer to that, although it's currently undocumented pending an open pull request. If you're interested in using that feature I can provide some documentation related to it. However, if you have input files where this does occur, we would be very interested to see them.
I no longer have the input file that didn't work on the 64 GB node, as it remained with my former employer (unless it has been deleted). In that case, the input file contained a very large number of species, and thus pruning never occurred in the first place.
As to usable RAM: right now I only have access to a computer with 16 GB of RAM (I have a private laptop with 24). This will improve once the dedicated "computing box" is set up (which has 128 GB, as I am told). In general, I would call 16 GB prohibitive if one wants to run more than one job on a "regular" computer, and aggressive pruning is also not always an ideal solution. (Even on a cluster this is not particularly efficient, as one is forced to request entire compute nodes to satisfy the RAM requirements of RMG.)
Lastly, what do you call time-prohibitive? The job that failed on the 64 GB node ran for less than 10 days (that was the limit on the cluster). It is not particularly difficult to run RMG for a long time, and the parallelisation should also help with performance. Running RMG for a week or two is not inherently prohibitive, especially if the machines remain otherwise useful. If RMG were to run with, say, 1 GB or 4 GB of RAM, it could run in the background for a very long time.
(Which rather makes me think of the need for proper checkpointing... In a way this exists in RMG; however, unless things have changed, the initial step can take very long when runs are restarted.)
Edit: And I guess the obvious question that springs to mind...
Did you ever test RMG with swap on a PCIe SSD? That seems to me to be the cheapest way of faking a very large amount of RAM...
Ah yes, the problem of too many species in the input file; that's not easy to avoid the way you're running it. That was a motivating problem for thermodynamic pruning, which is currently the only way to get around it.
The maximum edge species parameter is adjustable; you can increase it if you have more memory or decrease it if you have less. Can you give me some idea of how large the molecules you are studying are? We have some people who run large-molecule jobs for as long as 30 days, but I suspect our pruning benchmarks (comparisons with and without pruning) probably only go out to about C7 and less than 10 days. Sorry, by time-prohibitive I mean that after a certain amount of time the rate at which species are added gets so slow that it isn't really worth continuing the run. You can run RMG for a month, but it may not have many more species than if you had run it for two weeks.
For the time being I wouldn't recommend running RMG in parallel; we've been slowly moving away from it because the speed-up it provides is very limited, and I suspect it is contributing to your RAM problem.
@KEHANG please jump in if you have anything to add.
We have a new, better way of checkpointing using automatically generated "seed mechanisms" that don't have the problems we had with restart files. Basically, you set generateSeedEachIteration=True in the options block and it will generate a core and edge seed each iteration of the model. When the job fails, you take the last core seed and set it as a seed mechanism, and the last edge seed and set it as a reaction library. However, I would recommend cherry-picking commits from #1200 if you want to use this, since there are a couple of small bugs with loading the new seeds. Unlike the restart files, the effect on runtime is pretty negligible.
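Roughly, that workflow looks like this in the input files. A sketch only; the 'seed' and 'seed_edge' names below are placeholders for the seed directories your run actually writes:

```python
# In the original run: write a core seed and an edge seed every iteration.
options(
    generateSeedEachIteration=True,
)

# In the restart input file: load the last core seed as a seed mechanism and
# the last edge seed as a reaction library (paths here are placeholders).
database(
    thermoLibraries=['primaryThermoLibrary'],
    reactionLibraries=[('seed_edge', False)],
    seedMechanisms=['seed'],
    kineticsDepositories=['training'],
    kineticsFamilies='default',
    kineticsEstimator='rate rules',
)
```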
That's a very interesting idea; I don't think that's been tried before.
In terms of RMG as it stands: pruning is the only way we've tested to reduce RAM usage, and if you are adding lots of species in the input file in a way that causes a crash before any simulation occurs, thermodynamic pruning is currently the only algorithm-based option. If simulation does occur, I think you just need to set the interrupt tolerance for flux pruning to something very large, like 1e8.
I'll take a look at what it would take to implement temporary files and bring it up at our next RMG meeting. It would certainly be a more robust way to generate large models and would allow us to set better pruning benchmarks for larger models. I'm a little concerned about how it would affect speed, but we have a number of people who work on large-molecule pyrolysis models and might be interested in taking it on.
Back at my previous employment, the idea was to create a combined mechanism by merging Karl's decane mechanism with "my" toluene mechanism, and that is where we ran out of RAM at the start... (DOIs: 10.1021/acs.energyfuels.5b02470, 10.1021/acs.energyfuels.7b00416). Yes, the input files could be reduced, which worked, but then we had the issue that the mechanism did not behave as expected, and it was not investigated further in the end.
Also, thanks for clarifying the time spans involved. 30 days is long, but obviously not unfeasible in general. However, so far my experience with clusters has involved rather limited time scales. Back when I was in the UK, the cluster allowed only 2 days per job (this was in the days of RMG-Java, and it was faster to run it on my desktop, which also had more RAM than the old Arc1 cluster before it was replaced). Then at my first employment in France we had a limit of 10 days, which was generally enough. Right now, the dedicated compute machine will have no limit once it is set up. But there is also the matter of running multiple jobs side by side and not being stuck waiting for a single job to finish. (Especially if RMG is single-threaded, it is inefficient if the entire process is so RAM-bound that other cores sit idle, as not everybody can afford a large number of high-memory "compute nodes" that sit mainly idle.)
The subject I am dealing with now is also biomass pyrolysis. (UCP ENSTA, France.)
It's also unfortunate to hear that the parallelisation attempts didn't work out; it would have been nice if they had... I'll keep the suggestion of the core seed mechanism and reaction library at the back of my mind, as it might become necessary. (For example, if the scheduler decides to kill jobs due to a lack of RAM... that happened to me one morning...)
As to temporary files: sometimes the question is less what the direct performance hit is and more whether methods are usable at all. Coupled cluster calculations, for example, would be impossible without temporary files, and even DFT benefits from them.
I think the best way to implement temporary files would be to let the user set a RAM limit (similar to quantum chemistry codes) that is internally translated into some limit on edge species/reactions. Once the limit is reached, data is effectively "paged out" or "swapped out" to the temporary files. If this is implemented in the code itself, it can be a lot more efficient than letting the OS deal with it via Linux swap or Windows page files. However, I'm not the right person to discuss the details of implementing such files, as I have no experience in that specific field of programming. It is generally said that binary files are faster, and I believe the majority of temporary files are binary (in quantum chemistry codes as well as kinetics codes), even if that means humans cannot read the files themselves. Depending on the data that needs to be saved, it may make sense to use a form of relational database, but I have no idea whether the performance would be adequate.
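To make that idea concrete, here is a rough, hypothetical sketch of such a "spill to disk once a limit is hit" store. It is not tied to RMG's actual internals; all names are made up, and the on-disk side is just a binary shelve acting as the temporary file:

```python
import os
import shelve
import tempfile
from collections import OrderedDict

class SpillingStore:
    """Dict-like store that keeps at most max_in_memory entries in RAM and
    pages the coldest entries out to a binary temporary file on disk."""

    def __init__(self, max_in_memory=100000):
        self.max_in_memory = max_in_memory
        self._hot = OrderedDict()  # in-RAM entries, least recently used first
        self._disk = shelve.open(os.path.join(tempfile.mkdtemp(), 'spill'))

    def __setitem__(self, key, value):  # shelve requires string keys
        self._hot[key] = value
        self._hot.move_to_end(key)
        while len(self._hot) > self.max_in_memory:
            cold_key, cold_value = self._hot.popitem(last=False)
            self._disk[cold_key] = cold_value  # page the coldest entry out

    def __getitem__(self, key):
        if key in self._hot:
            self._hot.move_to_end(key)
            return self._hot[key]
        value = self._disk[key]  # fault the entry back in from disk
        del self._disk[key]
        self[key] = value
        return value

# Hypothetical usage: keep two entries in RAM, let the rest spill to disk.
store = SpillingStore(max_in_memory=2)
for i in range(5):
    store['species_%d' % i] = {'label': 'S%d' % i}
print(store['species_0'])  # transparently reloaded from the temporary file
```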
Use of temporary files is a very nice-to-have; putting this on the long-term board.