ReactionMechanismGenerator / RMG-Py

Python version of the amazing Reaction Mechanism Generator (RMG).
http://reactionmechanismgenerator.github.io/RMG-Py/
Other
382 stars 226 forks source link

RMG Improvement Proposal [RIP]: Make `ReactionMechanismSimulator.jl` Optional [COMMENTS WELCOME!] #2684

Open JacksonBurns opened 3 months ago

JacksonBurns commented 3 months ago

[!IMPORTANT]
Please read and comment on this issue if you are a developer or user of RMG - we need lots of input!

The purpose of this issue is to centralize discussion and hopefully reach a compromise surrounding a topic which has been discussed throughout many issues, pull requests, and offline meetings - whether ReactionMechanismSimulator.jl and its associated Julia dependencies should be required to run RMG.

This issue is styled after the Python Enhancement Proposal (see PEP 733 for an example), thus the name 'RMG Improvement Proposal', or RIP for short. This RIP will attempt to:

Summary

Introduction

As of writing, RMG-Py currently requires RMS and its associated dependencies to be installed in order to run anything in RMG. This is because (a) RMG can optionally interface with RMS to use its reactors and (b) Arkane optionally uses Julia's differential equations solver.

Reasons to Require RMS

I'm attempting to summarize comments from #2631 (especially here) and elsewhere. Please leave comments replying to any of these points, or add your own, if you think I missed or misrepresented anything.

The 'Batteries Included' Ideology

In general Python adheres to a "Batteries Included" thinking - everything that you could possibly need should come included without additional installation. By including RMS with RMG by requirement, we are avoiding the questions of "what features are available if I add RMS?" and the like.

Counterpoint

We can continue to ship the Docker container with 'batteries included' (all dependencies) but allow for RMS-less source or binary installs. The CI will also continue to test RMS and future PRs will require RMS tests to pass, so there is no risk of incompatible code being merged.

Promised Features in Papers are only in RMS

Some recent papers coming out of our groups describe features which are only available via RMS. It makes sense that the default installation of RMG should therefore provide these features. It's also an additional layer of 'protection' for these features to make them required by default - by making them required and continuing to interact with them, we stand a better chance of them continuing to work into the future.

Simplifying Debugging by Forcing Everyone to Install the Same Way

Offering multiple installation setups can lead to more complicated bug reports. If users install with our without Julia they may end up running a different set of unit tests, for example. This is a similar line of thinking to why we moved to Docker in the first place - unify the user experience.

Reasons to Make RMS Optional

Resolves Outstanding Issues

There are issues scattered across the @ReactionMechanismGenerator organization which are related to these dependencies that would be resolved by making them optional:

Counterpoint

These issues would also be resolved by merging part of #2631 - only the "avoid unnecessary Julia operations".

Enables Conda Binary Building

Currently, our conda binaries are significantly (~3 years) out of date because we have not been able to successfully build the rmg package since the integration of RMS. Users must either do a source install or use the Docker image.

This would be resolved by #2641 already - by making a pure-python RMG package and then having users install RMS on top of it, the conda build for RMG is massively simplified and shown to work. The alternative w/o making RMS optional in https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2636 was not successful.

Health of the RMG Software

As far back as 2021 in this discussion (https://github.com/ReactionMechanismGenerator/RMG-Py/discussions/2247) we have had requests to make installing the Julia dependencies optional, citing that the features it provides, when unused, amount to bloat. Along the same lines of thinking, extending RMG is prohibitively difficult because of the huge number of dependencies besides RMS - taking chances to make dependencies optional will help alleviate this.

This is opinionated, but the separation between RMG and RMS is clean, amounting to ~300 lines of code in https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2631/files. To quote PEP 20: "Simple is better than complex", and separating these two halves is simple.

Highly Opinionated Point - Julia can be a bit of a headache

Julia is very new and consequently suffers a lot of ecosystem problems - a selection of some reported here:

Developing RMG with Julia suffers from problems as well:

Acknowledgements

Thanks to @rwest for the suggestion to formalize this discussion into this format.

cc

I'd specifically like to make sure that the following people see this issue and have an opportunity to respond: @mjohnson541 @hwpang @alongd @calvinp0 @rwest @sevyharris @oscarwumit @jonwzheng @xiaoruiDong @ChrisBNEU - please see the Summary above, consider reading the full issue, and let us know your thoughts!

rwest commented 3 months ago

Thanks for starting, Jackson. I was hoping for a more balanced summary, that might have been written by either "side" of the debate. Would you like to try role playing someone with the other opinion? I was hoping the pros and cons wouldn't be so tied to people

JacksonBurns commented 3 months ago

Happy to - I need some help brainstorming though. What are some other pros of keeping RMS required?

calvinp0 commented 3 months ago

I am not very well versed in the technical details of Python like everyone else is that are mentioned. I use RMG from time to time but uncertain whether my RMG runs are engaging RMS. However, I can say that the installation of RMS has become a little more tedious for our team. Since we run RMG on our university servers and they are CentOS (7.9.2009) based, they do not come with Zlib 1.9.2 installed. I came across the issue back in 30 October 2023 and eventually resolved it through forum searching. I know that Docker is preferred when using RMG as it will come with all the required packages and codecs but our servers don't support Docker.

Sorry, I realise that my response does not add anything to the conversion. I guess my position is that optional is preferred as users who require RMG may have to jump through hoops in order to get RMS to work. On the other hand, if there was a simpler way to install RMS and connect it to RMG that would also be good (although, I have no idea what that would look like or if there is a simpler way).

jonwzheng commented 3 months ago

In addition to the points mentioned above, here are some thoughts (full disclosure, I don't have experience with using RMS):

Additional points for making RMS Optional

Additional points against making RMS Optional

Overall thoughts

I lean toward making the RMS install optional to separate the Julia components as much as possible, but I would like to ensure that (1) our messaging of this separation to our users is very clear, (2) we have error catching early on so users will quickly realize they need to upgrade their install for their desired use cases (rather than having to wait to the end of a mechanism generation cycle), (3) the optional install process is easy and its inclusion does not overly complicate the testing and development process.

sevyharris commented 2 months ago

Thanks, Jackson, for getting the discussion started/organized.

This is mostly a repeat of what's already been said, but here are my thoughts on the issue:

For RMS Optional

Against RMS Optional

JacksonBurns commented 2 months ago

@calvinp0 thank you for the input! I had not thought about installation on platforms other than plain debian linux, which I use daily. I think the challenges of getting RMG to work on its own on different systems might be another compelling reason to separate these installs - at least that way, users would only need to do one slightly tedious installation and an additional slightly tedious installation if needed, rather than a mandatory very difficult one.

@jonwzheng thank you for the input also! You have some great points here.

@sevyharris and thank you for your input as well! Towards your last point, I really like the "batteries included"-ness we have right now as well.

RMG has a python liquid reactor: https://github.com/ReactionMechanismGenerator/RMG-Py/blob/36bceb35bc6b0d58beb2f2fa53faa12cf35c7034/rmgpy/solver/liquid.pyx and a Julia liquid reactor: https://github.com/ReactionMechanismGenerator/RMG-Py/blob/36bceb35bc6b0d58beb2f2fa53faa12cf35c7034/rmgpy/rmg/reactors.py#L515 and users can access both in their input file according to these names: https://github.com/ReactionMechanismGenerator/RMG-Py/blob/36bceb35bc6b0d58beb2f2fa53faa12cf35c7034/rmgpy/rmg/input.py#L1536-L1537

ChrisBNEU commented 2 months ago

I don't think I have much more to add to the discussion, but I am wondering if there is an option where we keep the RMS attached to rmg, we use something like PackageCompiler to turn rms into a c library. Most of the issues I have had with RMS have come from the Julia to Python tie in, and not the package itself. Just a thought. I didn't have time to go back and read all the related issues so it's possible this has already been discussed.

Overall, I am in favor of not having to use both python and julia in the same environment for my own overall quality of life/sanity. For the reasons specified above, it sounds like that involves decoupling rmg and rms. but if there is an alternative to that I am all ears.

JacksonBurns commented 2 months ago

Thanks for that suggestion! I imagine that using that repackager will be easier if we first separate RMG and RMS, too.

mjohnson541 commented 2 months ago

How we can eat our pie and have two pies leftover

I feel inclined to note that there are number of things in the big initial description that are at least misrepresented, but I think going over them individually would be counterproductive.

I think a lot of this really comes down to some understandable misconceptions about: what the point of having the RMG-RMS integration is, why we’ve been stuck in what I consider to be a worst of both worlds state and due to a new generation of tools how close we are to resolving these issues.

I will articulate:

  1. How we got here
  2. What the dream was
  3. Why we couldn’t achieve the dream at the time and why some things have gotten worse since then
  4. How to achieve the dream now and make our Julia problems vanish
  5. Logistics of how we get there

Why did we make RMS?

We present a number of good reasons to have RMS in the manuscript, but most of them were not why we built it in the first place. The first reason that’s less relevant here was to be able to use more modern numerical methods for sensitivity analysis such as adjoint and threaded sensitivities. The other reason that’s much more relevant here was to have a general simulation software that was speed competitive with existing software, but easy enough for students to add features relevant to their project without adding the feature needing to be a whole project in itself.

What was the original dream for the RMG-RMS interface?

What makes RMG unique from most other automatic mechanism generation algorithms is the use of species selection techniques, most notably the flux algorithm where we simulate the system and calculate fluxes to the edge allowing us to follow the flux to trace out a mechanism that accurately represents the chemistry in that system.

However, this has an inherent assumption that the system we’re running RMG on is the same system we’re ultimately simulating to get our results. …for taste in https://doi.org/10.1002/kin.21489 I simulated four different pyrolysis/combustion experiments each with distinct reaction systems only one of those four was reasonably represented by RMG’s simpleReactor. Even in catalysis where constant T is a bit more common ideally we would be simulating multiple facets instead of just one.

Enabling users to use reactors that better match their experiments of interest should significantly improve species selection, reduce model truncation errors and make writing RMG inputs files less art and more science. Which overall, should be a very significant improvement for RMG.

Achieving this and getting users on board isn’t reasonably feasible without the RMS reactors eventually being the default reactors.

Having RMS be RMG’s primary simulator also comes with additional benefits: -Faster simulations/species selection => faster RMG can generate bigger models -Can get rid of old reactors and dependencies we maintain -Unification of the reactor interface, all options are available for all reactors

How’d we end up in the state we are now?

  1. The Julia tools at the time were not up to the task

    • pyjulia isn’t very compatible with conda and the python-jl hack I used that gets around this was never intended to be a long-term fix…
    • Precompile time was still a major issue in the Julia community that didn’t have a great solution yet, but it was big enough that they were working on solving it
    • Reliable automatic installation of Julia was bad enough I was building my own Julia conda binaries
  2. I was graduating: To build this dream a number of things needed done: (1) Implement the core-edge algorithms in RMS (2) Manage install and calling RMS from RMG (3) Enable efficient launching of RMS core-edge from RMG (4) Implement set of new reactors (5) Teach people to use the new reactors and how customize them

However, it turns out (1) and (3) were a whole lot harder than I expected and I was graduating and leaving for my postdoc. I realized this was going to be my pdep/uncertainties module that was going to get passed down.

So a couple tricky decisions got made:

The dev team decided to merge the system for a few reasons:

  1. It was functional
  2. It was needed for future work in a couple different projects
  3. We had some good reasons to optimistic about the near future of pyjulia even if that optimism was ultimately misplaced
  4. Letting the feature die on the vine seemed like a huge waste

I decided to focus what time I had on cleaning up the aspects that I thought would be most challenging for anyone after me to deal with, which was (1) and (3). So (4) and (5) mostly didn’t happen and (2) that you guys are most familiar with was not great.

Since then I think the only other relevant change to note is the change from the Julia binaries I compiled myself to conda-forge Julia. This sounded like a good idea at the time, but this created two new problems: 1) conda-forge Julia is a large constraining dependency in environment/binary construction 2) binaries from places like conda-forge have become notorious enough that Julia actively warns against using them on its installation website. I strongly suspect the Zlib issues are associated with use of conda-forge Julia.

How do we achieve the dream now?

While at the original time I was one of the only people trying to interface Julia and python, it’s now quite popular and there are now a number of tools that weren’t available when I did this originally:

Juliaup: Julia official version manager/installer…this allows us to install juliaup (a minimal dependency) instead of conda-forge Julia (a complex dependency that is known to cause problems)

PackageCompiler.jl: Enables us to compile RMS into the julia system image which eliminates the RMS precompile time. Note that this system-image should be particular only to Julia version (not Julia install) and architecture so recompiling should be rare. Theoretically, we can distribute the system-image to particular architectures (removing nearly all of the installation procedure) as long as we are careful to keep our julia source and version constant.

juliacall/PythonCall.jl: python-Julia interface designed to work with a conda environment (unlike pyjulia)…removes the need to install diffeqpy and DifferentialEquations.jl…removes need for python-jl…actively developed…easier to use and better than pyjulia in pretty much every aspect except some syntax stuff…

Integration of RMG with Juliaup and juliacall/PythonCall.jl is already available in https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2640

I have successfully used PackageCompiler.jl to eliminate precompile time, although I haven’t tested system image transferability yet.

I compiled a list of Julia issues from most of you that I think can be summarized as: • Julia is a heavy dependency in the conda environment • Some issues such as missing libraries that seem to be associated with use of conda-forge julia • General concerns about RMS installation maintainability • Due to modifications of dependencies like in conda environments and occasionally the RMS environment won’t build (it’s a similar failure time window to conda environments, but it is an additional point of failure) • When RMS is loaded it precompiles taking a very appreciable amount of time that makes debugging much slower and is frustrating when running very short RMG runs

All of these can be resolved or at least mostly mitigated relatively using the tools listed above: • Julia heaviness in conda env => install Juliaup instead and installing Julia through it virtually eliminates this weight • Conda-forge Julia issues => Switch to Juliaup • General install maintainability concerns => switch to juliacall that is designed for our configuration unlike pyjulia • RMS environment challenges

Solutions are great, but how do we get there?

None of this precludes us from merging the part of https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2631 that avoids importing pyjulia when not necessary in the meantime.

JacksonBurns commented 2 months ago

An incomplete follow up on some offline discussion at the MIT RMG subgroup meeting - we will push on the Python upgrade and Julia dependency overhaul, which all parties seem interested in at some stage of development.