Native YAML writer - Githubissues

rwest commented 6 months ago

Motivation or Problem

Cantera has moved away from reading .cti files and reads kinetic models in YAML files. The language is extensible and allows new features, such as blowers-masel rate expressions, coverage-dependent surface kinetics, advanced mixing rules, etc. RMS also reads YAML files, in a (I think and hope) fully compatible format. It also has extra features not in Chemkin or Cantera. We should output models in YAML format that are compatible with Cantera.
The YAML writer for RMS is apparently slow. See https://github.com/ReactionMechanismGenerator/RMG-Py/issues/2499 and https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2508 (there's some helpful comments in the discussion on both, about why the yaml library is inherently slow to write)
In https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2321 we (mostly Nora) made a cantera yaml writer, that mostly worked by creating in-memory Cantera objects, then requesting Cantera to turn them into dictionaries of input data, then getting the yaml library to write the dictionaries as yaml. This works - and should be merged because it has side-effect of fixing some to_cantera methods, etc. But it is probably much slower than needed.
When trying to compare https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2321 with the current main branch quickly yesterday, I got confused - as if a lot of things (rms writer) have disappeared entirely since I last looked. Maybe I looked too quickly, but maybe this is now more complicated (or simpler?) than I thought.

Desired Solution

Since we only have a few types of object, a totally general approach of using the yaml library, isn't needed. We could just template how each type of object should be rendered, as an f-string, and format to text quite quickly. Further optimizations might be possible by caching results (though you then have to figure out when to clear or update the cache, in case people do things outside of a typical RMG run like modifying objects on the fly). Maybe a good approach is to rebase and merge https://github.com/ReactionMechanismGenerator/RMG-Py/pull/2321 then look at speeding it up?

Potential Alternatives

Stick with the current RMS approach, but tweak it slightly to be cantera-friendly?

Additional Context

New student @ntietje1 is willing to put some time into this now. Any thoughts to get him started (or before he gets started doing the wrong thing) would be appreciated. We could have a zoom call if that's more efficient than posting - but there's something to be said for having discussion here on github, because it persists. I expect @JacksonBurns and @mjohnson541 may have helpful input, but likely other people too. Thanks

JacksonBurns commented 6 months ago

Agreed! We (the Green group) have also been planning to support a YAML input file format as well - I think having YAML on both ends just makes sense.

I also agree with the through-comments about writing a simpler YAML file dumper - something like (pseudocode):

for reactor in my_simulation:
   write_to_yaml_file(f"""
 - reactor_name: {reactor.name}
   - type: {reactor.type}
   - origin: RMG
"""

would be (relatively) straightforward to add and dramatically faster than the YAML library.

I agree with your proposed path forward: finalizing and merging #2321 and then writing a new module in RMG that dumps YAML files. We would then replace lines like this with a call to "rmg_yaml" or whatever we call it, which contains a better version of this pseudocode.

We could then re-use that yaml dumper elsewhere in RMG for writing to RMS (as mentioned) as well as dumping as our own mechanisms.

rwest commented 6 months ago

We've rebased #2321 and are trying to build on it.

Maybe we should just get it totally debugged and working and merged first? ( and do the "build on it" in a separate branch).

Do you think the "writing yaml" code on an object (a species, for example) should live in the class definition (eg. in Species), or in a yaml_writer file (like chemkin.pyx currently for writing chemkin).

JacksonBurns commented 6 months ago

Agreed with merging the mostly-done writer now and then building on it later. In my head this is like avoiding premature optimization - make sure we have something that does what we want, and then abstract away the internals to make something we can re-use elsewhere.

I actually quite like the idea of having each class define it's own YAML string. It would align well with how we define pickle-related method.

Each class would implement a function like "Species.get_chemkin_yaml" which would return a string like the one I left in the comment above. The yaml writer module would then just be responsible for retrieving all of these strings in the correct order and indenting/formatting them where appropriate. This seems like an excellent idea.

github-actions[bot] commented 3 months ago

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.

ReactionMechanismGenerator / RMG-Py

Native YAML writer #2633

Motivation or Problem

Desired Solution

Potential Alternatives

Additional Context