Open rwest opened 13 years ago
I ran into this issue when trying to dump and restart after the model had grown fairly large. Here are some observations.
It seemed to me that "jing/rxnSys/ReactionModelGenerator.reactionModel" is the main object holding RMG's reaction data. I measured the memory usage of reactionModel, including everything it references, at the point where the restart files are written.
long noBytes = MemoryUtil.deepMemoryUsageOf(reactionModel, VisibilityFilter.ALL);
Logger.info(String.format("RSA reactionmodel memory usage %10.3f MB", noBytes / 1e6));
It was around 15% of the total memory usage reported. When the total memory footprint was 800 MB, reactionModel was ~120 MB and the restart files totalled 40 MB. Binary I/O on 120 MB should be much faster than formatted I/O on 40 MB. Maybe there is an object or container smaller than 120 MB that holds the necessary and sufficient data for a restart.
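To make the binary-dump idea concrete, here is a minimal sketch using standard Java serialization. The class `RestartSnapshotSketch` and its two fields are hypothetical stand-ins for whatever minimal state is actually needed to restart; they are not RMG classes, and in practice every object reachable from the snapshot would also have to implement `Serializable`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the minimal restart state (an assumption, not an RMG class).
class RestartSnapshotSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Species are represented by plain names here purely for illustration.
    final List<String> coreSpecies = new ArrayList<>();
    final List<String> edgeSpecies = new ArrayList<>();

    // Writes the whole object graph in one binary dump.
    static void save(RestartSnapshotSketch snapshot, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(snapshot);
        }
    }

    // Reads the object graph back; no text parsing is involved.
    static RestartSnapshotSketch load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (RestartSnapshotSketch) in.readObject();
        }
    }
}
```

Whether this would actually beat the formatted files depends on how deep the object graph is, so it would need to be timed on a real model before drawing conclusions.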
Thanks for this, it is very interesting.
I agree that the Restart functionality could have been serialized in the first place. I believe it is actually done in RMG-Py through pickle now, for the reasons you mention.
On top of that, the "restart" code had to be updated each time new features were introduced in other parts of the code. I am thinking of the P-dep kinetics formats, for example.
One pro of "restart" is actually its human readability: you can easily inspect the edge without hacking into the code. I believe this is still the only way to do so.
Also, the fact that the format of the "restart" files closely resembles the format of seed mechanisms is a nice-to-have if the "restart" functionality suddenly does not behave as it should! :)
Binary restart sounds logical to me if it can be straightforwardly implemented (a bold assumption).
Personally, I seldom use the restart feature, because usually if a job dies there's a reason for it and I don't want to restart it the same way.
Besides Nick's points, another feature we may risk losing (if indeed it exists) is the ability to change the conditions or settings from one run to the next. I'm not sure if this is widely used (I think @mrharper may have used it, but according to [1] he used the Seed Mechanism feature instead - although as Nick says, these share a lot of code). So maybe this is not a problem.
[1] Harper, M. R.; Van Geem, K. M.; Pyl, S. P.; Marin, G. B.; Green, W. H. “Comprehensive reaction mechanism for n-butanol pyrolysis and combustion.” Combust. Flame 2011, 158, 16–41 doi:10.1016/j.combustflame.2010.06.002.
This was mentioned in the meeting today. Specifically, there was talk of assuming that all restart files are well-formed and have unique reactions, so that you don't need to check them all, which apparently is slow. (Or perhaps this process could be sped up some other way - I have not looked into it).
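As one possible "other way", here is a sketch of a hash-based uniqueness check. It assumes the slowness comes from comparing each incoming reaction against all those already loaded, which has not been confirmed here. The class `ReactionDedupSketch` and the use of plain species names as identifiers are hypothetical; RMG would need a canonical identifier derived from its own species representation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of RMG: detects duplicate reactions via a hash set.
class ReactionDedupSketch {
    private final Set<String> seen = new HashSet<>();

    // Builds an order-independent key, e.g. "CH3+H<=>CH4".
    static String canonicalKey(List<String> reactants, List<String> products) {
        String left = joinSorted(reactants);
        String right = joinSorted(products);
        // Order the two sides so that a reaction and its reverse share one key.
        return left.compareTo(right) <= 0
                ? left + "<=>" + right
                : right + "<=>" + left;
    }

    private static String joinSorted(List<String> species) {
        List<String> copy = new ArrayList<>(species);
        Collections.sort(copy);
        return String.join("+", copy);
    }

    // Returns true if the reaction was new, false if an equivalent one was already added.
    boolean addIfNew(List<String> reactants, List<String> products) {
        return seen.add(canonicalKey(reactants, products));
    }
}
```

With this approach each reaction costs one hash lookup instead of a scan over everything loaded so far, so the check could stay in place rather than relying on the files being well-formed.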