Cantera / cantera

Chemical kinetics, thermodynamics, and transport tool suite
https://cantera.org
Other
602 stars 346 forks source link

YamlWriter / yaml2ck issues with 'note' field parsed as integer #1610

Closed speth closed 2 months ago

speth commented 1 year ago

Problem description

There are a pair of coupled problems with species' note fields in cases where that field could be (but should not be) interpreted as an integer, for example 120186, which if you're psychic you might guess is the date December 1, 1986. Or maybe January 12, 1986. Or maybe something else entirely. In any case, Cantera should just preserve this as a string.

Steps to reproduce

In Python:

>>> import cantera as ct
>>> gas = ct.Solution('nDodecane_Reitz.yaml')
>>> gas.species('h2o2').input_data['note']
'120186'  # preserved as a string
>>> gas.write_yaml('test-note.yaml')

Looking at the resulting YAML file shows that this value has been converted to an integer:

  - name: h2o2
    composition: {H: 2.0, O: 2.0}
    ...
    note: 120186

which is inconsistent, but not necessarily a significant problem. However, trying to convert this to a Chemkin file:

gas2 = ct.Solution('test-note.yaml')
gas2.write_chemkin('test-note.ck', sort_species=None, sort_elements=None)

fails with the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [3], line 1
----> 1 gas2.write_chemkin('test-note.ck', sort_species=None, sort_elements=None)

File ~/src/cantera/build/python/cantera/solutionbase.pyx:345, in cantera.solutionbase._SolutionBase.write_chemkin()
    343 
    344         from cantera import yaml2ck
--> 345         output_paths = yaml2ck.convert(
    346             self,
    347             mechanism_path=mechanism_path,

File ~/src/cantera/build/python/cantera/yaml2ck.py:681, in convert(solution, phase_name, mechanism_path, thermo_path, transport_path, sort_elements, sort_species, overwrite)
    674 else:
    675     header_text = header_wrapper.fill("") + "\n"
    676 mechanism_text = [
    677     header_text,
    678     build_elements_text(
    679         all_elements,
    680     ),
--> 681     build_species_text(
    682         all_species,
    683     ),
    684 ]
    685 if thermo_path is None:
    686     mechanism_text.append(
    687         build_thermodynamics_text(
    688             all_species,
    689             separate_file=False,
    690         ),
    691     )

File ~/src/cantera/build/python/cantera/yaml2ck.py:177, in build_species_text(species, max_width)
    173 species_names = {s.name: s.input_data.get("note", "") for s in species}
    175 # Notes shorter than 7 characters are probably related to thermo entries
    176 # and will be included with the thermo entry.
--> 177 if any(len(s) > 6 for s in species_names.values()):
    178     max_species_len = max(len(s) for s in species_names.keys())
    179     max_species_len = max(5, max_species_len)

File ~/src/cantera/build/python/cantera/yaml2ck.py:177, in <genexpr>(.0)
    173 species_names = {s.name: s.input_data.get("note", "") for s in species}
    175 # Notes shorter than 7 characters are probably related to thermo entries
    176 # and will be included with the thermo entry.
--> 177 if any(len(s) > 6 for s in species_names.values()):
    178     max_species_len = max(len(s) for s in species_names.keys())
    179     max_species_len = max(5, max_species_len)

TypeError: object of type 'int' has no len()

System information

Additional context

Originally reported on the Cantera Users' Group: https://groups.google.com/g/cantera-users/c/gn_TJt1gspc

ischoegl commented 1 year ago

In any case, Cantera should just preserve this as a string.

I am not convinced of this conclusion. Per discussion when YAML was first introduced, there should not be anything canonical in terms of reserved field names (the field note was specifically discussed). From that perspective, it's hard to 'divine' a string out of an integer without knowing the context? I don't think that YamlWriter has enough information.

The alternative would be to convert integers to a string in yaml2ck when notes are written (or when an entry is first created via ck2yaml, although with ck2yaml being around for a while there may be many "pre-existing" YAML conditions). In this case, the context of conversion from/to Chemkin is clear. This would be an easy fix and would be my preference.

Just 2 cents of course ...

bryanwweber commented 1 year ago

From that perspective, it's hard to 'divine' a string out of an integer without knowing the context?

I believe the original NASA spec for this format calls for the field in these positions to be a string. In that sense, the int conversion is an artifact of the YAML parser we use/the YAML specification when the file is loaded in the first place. I think the fix is to write that field out as a string always (that is, with explicit quotes in the YAML file) and explicitly cast to a string in the yaml2ck side.

See also, putting quotes around major.minor semantic version numbers in YAML files, lest they be read as floats by a parser.

Edit: based on your edit, I believe we agree @ischoegl 😀

ischoegl commented 1 year ago

@bryanwweber ... happy to agree!

Fwiw, I had a really curious 'bug' in a YAML file once, where a git hash ended up being a valid integer (I guess I was pretty 'lucky'). So string round-trip conversion in YAML is not reliable - you can write out a string and get back an integer any time you don't use quotes. PS: this is exactly what happens in the example at the top.

ischoegl commented 1 year ago

I added a (partial/band-aid) fix to #1616, which takes care of the exception, but not the inadvertent type conversion. One thing that is now clear to me is is that YamlWriter does indeed know the correct type, but the serialization doesn't handle this edge case. I guess it would be possible to handle these using regex expressions, but I don't think that this is consequential enough.

ischoegl commented 1 year ago

Removing ‘bug’ label as the band-aid fix prevents failures.