Cantera / cantera

Chemical kinetics, thermodynamics, and transport tool suite
https://cantera.org
Other
580 stars 341 forks source link

Multi line notes in Chemkin with mixed white space after comment symbols cause issues with write_yaml #1695

Open corykinney opened 1 month ago

corykinney commented 1 month ago

Problem description

Sorry for the title gore, but recently I've been working with the NUIGMech 1.2 and have been having issues performing mechanism reductions and writing the reduced mechanisms to YAML files. The issue seems to stem from the Chemkin source files having comments that have a mixture of whitespace after the exclamation point within the same comment. I'd be happy to help contribute to this, but would need advice on the recommended way to fix it that wouldn't break other behaviors.

Steps to reproduce

Run ck2yaml with a Chemkin file with the following reaction, note that the original Chemkin file has a space after the ! in the first line of the comment and no space in the second line of the comment:

! A.W. Jasper, S.J. Klippenstein, L.B. Harding, J. Phys. Chem. A 111 (2007) 8699-8707.

!DONG 0406
CH2+CH2=>C2H2+H+H                     7.0E13   0.0022      8 !  

Produces the following output, which Cantera loads correctly:

- equation: CH2 + CH2 => C2H2 + H + H  # Reaction 90
  rate-constant: {A: 7.0e+13, b: 2.2e-03, Ea: 8.0}
  note: |2-
     A.W. Jasper, S.J. Klippenstein, L.B. Harding, J. Phys. Chem. A 111 (2007) 8699-8707.
    DONG 0406

The |2- keeps the indentation consistent with the first line keeping its extra space when loaded. After loading the mechanism, creating a new ct.Solution object that has that species in it and writing it using ck2yaml gives the following output:

  - equation: 2 CH2 => C2H2 + 2 H
    rate-constant: {A: 7.0e+10, b: 2.2e-03, Ea: 3.3472e+04}
    note: |
       A.W. Jasper, S.J. Klippenstein, L.B. Harding, J. Phys. Chem. A 111 (2007) 8699-8707.
      DONG 0406

When it gets written, the 2- is now gone but the initial space gets written out, so this mechanism can't be loaded after the round trip through ct.Solution and then write_yaml as either a illegal map value or end of map not found error is thrown. If everything else is behaving correctly, maybe write_yaml just needs some handling for leading whitespace on multi line notes that the ck2yaml already has? Otherwise, I don't know what the expected behavior is, so I'd appreciate advice on how to go about fixing it. Manually, I've been fixing it by adding the 2- back to reactions that throw errors, but that is a tedious process to do reaction by reaction for such large mechanisms.

As a side note, I wish all mechanisms were developed in Cantera in the first place 😜

System information

bryanwweber commented 1 month ago

Thanks @corykinney! If I had to guess, this line in ck2yaml ensures the output has the correct indentation marker. The YAML writer from Solution looks like its done in C++ though, and I haven't dug into that at all to know where a fix might come in.

bryanwweber commented 1 month ago

It looks like this is maybe something that yaml-cpp doesn't support yet? https://github.com/jbeder/yaml-cpp/issues/1191

corykinney commented 1 month ago

It looks like this is maybe something that yaml-cpp doesn't support yet? jbeder/yaml-cpp#1191

If that notation can't be added automatically for the write_yaml function with the current yaml writer, is there any harm in just removing leading whitespace in multi-line notes? Or if it always writes yaml files with indentation of 2 spaces, is there a naive way to force the |2- on every multi-line note, even when not necessary? I don't know the file formats super well, just throwing out ideas.

bryanwweber commented 1 month ago

is there any harm in just removing leading whitespace in multi-line notes?

The only harm I can think of is that you won't be able to round-trip a chemkin file (meaning ck->yaml->ck) and get identical output. I'm not sure how useful that is, and this seems like the easiest way to fix this particular problem, and having a working YAML file seems more important.

ischoegl commented 1 month ago

I'm :+1: for removing the leading whitespace. Preserving this leading whitespace appears to create issues while doing nothing in terms of problem solving. In other words, it's a 'sacrifice' that is inconsequential unless any of us can think of a scenario where 'formatting' chemkin comments is warranted?

speth commented 1 month ago

I'm unclear on whether this is a bug in yaml-cpp, or just how we're using it. Regardless of whether it supports certain options for how to format strings, I wouldn't think that yaml-cpp should output YAML that it can't then parse.

ischoegl commented 2 days ago

Here's a Cantera MWE with the actual error:

In [1]: import cantera as ct
   ...: gas = ct.Solution("gri30.yaml")

In [2]: yaml = """
   ...: equation: 2 CH2 => C2H2 + 2 H
   ...: rate-constant: {A: 7.0e+10, b: 2.2e-03, Ea: 3.3472e+04}
   ...: note: |
   ...:    A.W. Jasper, S.J. Klippenstein, L.B. Harding, J. Phys. Chem. A 111 (2007) 8699-8707.
   ...:   DONG 0406
   ...: foo: bar  # adding a dummy line
   ...: """

In [3]: ct.Reaction.from_yaml(yaml, gas)
[...]
CanteraError:
*******************************************************************************
InputFileError thrown by AnyMap::fromYamlString:
Error on line 6 of input string:
end of map not found
|  Line |
|     1 |
|     2 | equation: 2 CH2 => C2H2 + 2 H
|     3 | rate-constant: {A: 7.0e+10, b: 2.2e-03, Ea: 3.3472e+04}
|     4 | note: |
|     5 |    A.W. Jasper, S.J. Klippenstein, L.B. Harding, J. Phys. Chem. A 111 (2007) 8699-8707.
>     6 >   DONG 0406
            ^
|     7 | foo: bar  # adding a dummy line
*******************************************************************************

with the error being triggered here: https://github.com/Cantera/cantera/blob/bc24169dad7dde4db1fdb6c53f4a540129852e1e/src/base/AnyMap.cpp#L1758

It looks like yaml-cpp is indeed writing YAML that can't be read ... see WriteLiteralString method (which does not take additional arguments)

bryanwweber commented 2 days ago

I just came across https://github.com/biojppm/rapidyaml, throwing it out there as a potential alternative if we want to go that route. I have no idea if it fixes this bug.

ischoegl commented 2 days ago

Performance of that library looks really impressive. At the same time, it’s a rather drastic approach to address this issue 🤣