Update of ChemCheck (Cantera debugging tool)

12Chao commented 4 years ago

Abstract

ChemCheck is a web application that helps users visualize syntax errors raised while converting Chemkin files to YAML format (the input format for Cantera 2.5.0), as well as chemical errors in the model. An introduction has been posted at https://cantera.org/blog/GSoC_2019_Project_Introduction.html

Description

In addition to the work done during the two months of GSoC 2019 (https://cantera.org/blog/GSoC_2019_Project_First_Evaluation.html, https://cantera.org/blog/GSoC_2019_Third_Blog.html, https://cantera.org/blog/GSoC_2019_Fourth_Blog.html), more work has been done in the past two months.

Test Suite
- Unit tests and an integration test for ChemCheck have been written. The integration test runs on Travis CI. More tests will be added as the project develops.
Syntax Error diagnosis and visualization:
- After checking several models, the most common error we found is an index number that is missing, or not aligned correctly, at the end of a line in the thermo file or the thermo block of a Chemkin file. When the index number is missing, Cantera logs INFO:root:Error while reading thermo entry starting on line (line_number): and stops the conversion. In this case, ChemCheck checks the index number at the end of each line of the thermo data for the offending species and suggests a fix. Here is an example from the model made by Sarathy:

Another example is from the model made by Wang: idx_out_of_position
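
The index check described above can be sketched in a few lines of Python. This is a minimal sketch, not ChemCheck's actual code: it assumes the fixed-width Chemkin NASA-7 layout, in which each of the four lines of a species entry carries its line index (1 to 4) in column 80. The function name `check_thermo_indices` and the "misaligned" heuristic are hypothetical.

```python
def check_thermo_indices(entry_lines):
    """Check the line index of a four-line Chemkin NASA-7 thermo entry.

    Returns a list of (entry_line_number, message) tuples, one per problem.
    """
    problems = []
    for number, raw in enumerate(entry_lines[:4], start=1):
        expected = str(number)
        text = raw.rstrip("\n")
        # Column 80 is index 79 in Python; pad short lines with blanks.
        if text.ljust(80)[79] == expected:
            continue
        stripped = text.rstrip()
        # Heuristic (assumption): an index that is separated from the data
        # by whitespace but sits in the wrong column counts as misaligned.
        if stripped[-1:] == expected and stripped[-2:-1].isspace():
            problems.append(
                (number, f"index {expected} found in column {len(stripped)}, "
                         "expected column 80")
            )
        else:
            problems.append((number, f"index {expected} is missing from column 80"))
    return problems
```

Returning the problems as data, rather than printing them, leaves it to the caller (a command line script or the web page) to decide how to display the suggestion.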

For lines that start with special or redundant characters, which Cantera has difficulty recognizing, ChemCheck shows the position of the character and suggests deleting it. An example from model 032-cheng:
If a model is missing the transport data for a species, ChemCheck suggests that users delete the transport file, delete the species from the mechanism file, or manually add the missing transport data. Here is an example diagnosing the model 111-Atef:
If a reaction has two types of parameters, for example PLOG parameters and non-pressure-dependent Arrhenius parameters, ChemCheck suggests deleting one set of them. For instance, 038-Labbe-Zhao:

duplicate_parameters

Errors such as an indentation error in the first line of a species' thermo data, a missing E in the NASA polynomial parameters, or an unexpected character in the middle of the thermo data all cause Cantera to raise a value error and are hard to diagnose, so ChemCheck lists all possible causes. This diagnosis is not very precise and could be improved in future work. Here is the example 0325-Nawdiyal:

Chemical Error visualization:
- NASA Polynomial discontinuity: Some species in a model have a NASA polynomial discontinuity, meaning that the values calculated from the high-temperature and low-temperature NASA polynomial parameters are not equal at the midpoint temperature. The cause could be wrong NASA polynomial parameters in the model or an inappropriately chosen midpoint temperature. To visualize this, ChemCheck plots the thermodynamic properties of each species that has a discontinuity. ChemCheck can currently check NASA 7 polynomials; the check for NASA 9 polynomials will be added shortly. Here is an example from the model in cantera ncm-2017-materials:
- Negative sum of kinetic coefficients and A factor for pressure dependent reactions and duplicate reactions: This problem has been discussed in the Cantera Users' Group and in Cantera issues. The pressure-dependent Arrhenius rate expressions in Cantera are calculated by logarithmically interpolating between Arrhenius rate expressions at various pressures. To calculate the rate at a pressure P between P1 and P2, which are given in the reaction rate data for this pressure-dependent reaction, Cantera needs log k1 at P1 and log k2 at P2 to plug into the interpolation equation. Details are given here: https://cantera.org/science/reactions.html#pressure-dependent-arrhenius-rate-expressions-p-log. If there is more than one set of Arrhenius parameters at P1 or P2, Cantera sums the rate constants calculated from all sets at that pressure and takes the logarithm of the sum. However, if the sum of k is negative, its logarithm does not exist, so Cantera throws a validation error. Similarly, the sum of k needs to be positive for duplicate reactions. To diagnose this, ChemCheck goes through all pressure-dependent and duplicate reactions and calculates the sum of the rate constants at each pressure at temperatures of 200 K, 500 K, 1000 K, 2000 K, and 10000 K. If the result is negative, the reaction equation and the offending Arrhenius parameters, temperature, and pressure are shown on the website. ChemCheck also checks whether the A factor is negative for any pressure that has only one set of parameters. Here is the diagnosis example for Cantera issue 77:

negative_duplicate_sum_k

Here is another example from the discussion in the Cantera Google group: pdep-negativeA, negative_duplicate_sum_k
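
For reference, the logarithmic interpolation behind the PLOG check can be written out as follows. This is the form given in the Cantera documentation linked in the post, for a pressure $P_1 < P < P_2$; the index notation for multiple parameter sets at one pressure is added here for illustration.

```latex
\log k(T, P) = \log k_1(T)
  + \bigl(\log k_2(T) - \log k_1(T)\bigr)\,
    \frac{\log P - \log P_1}{\log P_2 - \log P_1},
\qquad
k_i(T) = \sum_j A_{ij}\, T^{b_{ij}} \exp\!\left(-\frac{E_{ij}}{R T}\right)
```

Because $\log k_i(T)$ appears directly, each summed $k_i(T)$ must be strictly positive at every temperature of interest, which is exactly what the negative sum of k check tests.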

Collision Violation Check: We are working on adding a collision violation check for kinetic models in YAML format. The collision limit calculation for bimolecular reactions is described in 'Violation of collision limit in recently published reaction models'. However, this methodology may not be appropriate for falloff and three-body reactions, so we are exploring a way to calculate the collision limit for those reaction types. As we discussed before, future checks will include the dead-end pathways mentioned in 'Mechanism reduction for multicomponent surrogates: A case study using toluene reference fuels', and an explanation of CVODE errors.
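
To give a feel for what a collision limit check on a bimolecular reaction involves, here is a rough sketch. It uses the simple hard-sphere collision-theory estimate, which is an assumption made here for illustration and not necessarily the formula used in the cited paper or in ChemCheck. The function names, the collision diameters, and the `factor` tolerance are all hypothetical.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def hard_sphere_limit(d_a, d_b, w_a, w_b, T):
    """Hard-sphere collision rate constant in m^3/(mol s).

    d_a, d_b: collision diameters in m; w_a, w_b: molar masses in kg/mol.
    """
    d_ab = 0.5 * (d_a + d_b)                    # mean collision diameter
    mu = (w_a * w_b / (w_a + w_b)) / N_A        # reduced mass per pair, kg
    mean_speed = math.sqrt(8.0 * K_B * T / (math.pi * mu))
    return math.pi * d_ab**2 * mean_speed * N_A


def violates_limit(k, k_limit, factor=1.0):
    """True if rate constant k exceeds the limit by more than `factor`."""
    return k > factor * k_limit


# Two species of 30 g/mol with 3.5 Angstrom diameters at 1000 K
k_limit = hard_sphere_limit(3.5e-10, 3.5e-10, 0.030, 0.030, 1000.0)
print(f"{k_limit:.3e} m^3/(mol s)")
```

A real check would compare this limit against the forward and reverse rate constants of each bimolecular reaction over a range of temperatures.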

References

ChemCheck: https://github.com/comocheng/ChemCheck/tree/cx

ischoegl commented 4 years ago

FWIW, I am posting a link to another recently developed mechanism checker. Note that objectives are somewhat different, and I mainly wanted to add this as a reference.

bryanwweber commented 4 years ago

Hi @12Chao Thanks for the update! I have a couple of questions:

If a reaction has two type of parameter for one species, for example, PLOG parameters and non pressure dependent arrhenius parameters, ChemCheck will suggest to delete one set of them, for instance,

First, did you mean "reaction" instead of species? I also didn't understand what was wrong with the example reaction you showed.

Negative sum of kinetic constants and A factor for pressure dependent reactions and duplicate reactions:

This is only for PLOG reactions, I think, not for any other type of pressure dependent reaction or for regular duplicate reactions (please correct me if I'm wrong). Can you clarify?

P1, P2 which are given in the thermo data for this pressure dependent reaction

It is not really thermo data, it is reaction rate data 😄

12Chao commented 4 years ago

Hi Bryan, Thanks for the corrections and questions.

If a reaction has two type of parameter for one species, for example, PLOG parameters and non pressure dependent arrhenius parameters, ChemCheck will suggest to delete one set of them, for instance,

First, did you mean "reaction" instead of species? I also didn't understand what was wrong with the example reaction you showed.

Yes, it should be reaction. I think the bug here is that the "three-body reaction" has a set of PLOG parameters, which confuses ck2yaml.py when it tries to classify the reaction and rewrite it in YAML format. Thank you for pointing out that the suggestion is not clear enough; I will figure out a better way to suggest a fix.

Besides, I have another question about this problem: can a "three-body reaction" be pressure-dependent (other than a falloff reaction)? I haven't seen that kind of reaction; if it exists, does Cantera have a format to represent it?

Negative sum of kinetic constants and A factor for pressure dependent reactions and duplicate reactions:

This is only for PLOG reactions, I think, not for any other type of pressure dependent reaction or for regular duplicate reactions (please correct me if I'm wrong). Can you clarify?

The negative A factor check applies only when a PLOG reaction has a single set of parameters at a given pressure; it does not cover elementary reactions. The negative sum of kinetic coefficients check applies to all regular duplicate reactions, duplicate PLOG reactions, and PLOG reactions that have more than one set of Arrhenius parameters at a given pressure. For a PLOG reaction if the specified pressure has only one set of arrhenius parameters, negative A check will be applied; however, if the specified pressure has more than one set of arrhenius parameters, negative sum of k check will be applied. For a set of duplicate reactions that are not PLOG reactions, ChemCheck checks for a negative sum of k over all the Arrhenius parameters of that set. If a set of duplicate reactions are PLOG reactions, all their Arrhenius parameters and pressures are rearranged into the form of a single PLOG reaction, and the same check is applied as for a normal PLOG reaction.
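
The two checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ChemCheck's code: rate parameters are passed as a plain dictionary mapping each pressure to a list of `(A, b, Ea)` tuples with `Ea` in J/mol, regular duplicate reactions can be passed under a single dummy key, and the name `check_rate_sets` is hypothetical. The temperature list is the one given earlier in the thread.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)
CHECK_TEMPERATURES = [200.0, 500.0, 1000.0, 2000.0, 10000.0]  # K


def arrhenius(A, b, Ea, T):
    """Modified Arrhenius rate constant, with Ea in J/mol."""
    return A * T**b * math.exp(-Ea / (R * T))


def check_rate_sets(rates_by_pressure):
    """Return a list of (pressure, temperature, message) problems.

    One parameter set at a pressure: negative A check.
    Several sets at a pressure: sum of k check at each temperature.
    """
    problems = []
    for pressure, params in rates_by_pressure.items():
        if len(params) == 1:
            if params[0][0] < 0:
                problems.append((pressure, None, "negative A factor"))
            continue
        for T in CHECK_TEMPERATURES:
            k_sum = sum(arrhenius(A, b, Ea, T) for A, b, Ea in params)
            # The logarithm needs a strictly positive sum.
            if k_sum <= 0:
                problems.append((pressure, T, f"sum of k = {k_sum:.3e}"))
    return problems
```

Because every problem is appended to a list, one call reports all offending pressures and temperatures instead of stopping at the first.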

P1, P2 which are given in the thermo data for this pressure dependent reaction

It is not really thermo data, it is reaction rate data 😄

12Chao commented 4 years ago

FWIW, I am posting a link to another recently developed mechanism checker. Note that objectives are somewhat different, and I mainly wanted to add this as a reference.

Hi Ingmar, Thanks for posting the reference here!

speth commented 4 years ago

I think having a graphical tool for being able to visualize issues like discontinuous thermo data and suspiciously high rate constants would be very useful. Since these issues could also arise for a mechanism which is created directly in the Cantera YAML format rather than from Chemkin input files, can ChemCheck also provide this analysis for YAML input files directly?

For the various syntax errors that can occur during conversion from Chemkin format, would it make more sense to work on improving the error messages provided by ck2yaml than developing this as a separate tool? If this were integrated with ck2yaml, you'd have access to more of the parser's internal state which might allow you to provide more helpful messages. I think it would also be more convenient for users to have this information available immediately when encountering an error rather than having to use a separate tool.

For a PLOG reaction if the specified pressure has only one set of arrhenius parameters, negative A check will be applied; however, if the specified pressure has more than one set of arrhenius parameters, negative sum of k check will be applied.

Cantera already does a check for this at a set of temperatures, since otherwise you will get NaNs for the total rate constant.

12Chao commented 4 years ago

Thanks for the suggestions!

I think having a graphical tool for being able to visualize issues like discontinuous thermo data and suspiciously high rate constants would be very useful. Since these issues could also arise for a mechanism which is created directly in the Cantera YAML format rather than from Chemkin input files, can ChemCheck also provide this analysis for YAML input files directly?

Yes, ChemCheck already runs the thermo discontinuity analysis on YAML files directly.
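
As an illustration of what such a check on YAML input can look like, the sketch below evaluates both NASA-7 polynomials at the midpoint temperature and reports the jumps. It assumes the species' `thermo` entry has already been loaded into a dictionary with the Cantera YAML keys `temperature-ranges` and `data` (low-temperature coefficients first) and has exactly two ranges. The function names are hypothetical, and this is not ChemCheck's implementation.

```python
import math


def nasa7_properties(a, T):
    """Return (cp/R, h/RT, s/R) from seven NASA-7 coefficients."""
    cp_R = a[0] + a[1] * T + a[2] * T**2 + a[3] * T**3 + a[4] * T**4
    h_RT = (a[0] + a[1] * T / 2 + a[2] * T**2 / 3 + a[3] * T**3 / 4
            + a[4] * T**4 / 5 + a[5] / T)
    s_R = (a[0] * math.log(T) + a[1] * T + a[2] * T**2 / 2
           + a[3] * T**3 / 3 + a[4] * T**4 / 4 + a[6])
    return cp_R, h_RT, s_R


def midpoint_jumps(thermo):
    """Difference (high minus low) of each property at the midpoint."""
    t_mid = thermo["temperature-ranges"][1]
    low, high = thermo["data"]
    low_vals = nasa7_properties(low, t_mid)
    high_vals = nasa7_properties(high, t_mid)
    names = ("cp/R", "h/RT", "s/R")
    return {n: hi - lo for n, lo, hi in zip(names, low_vals, high_vals)}


# Made-up coefficients: the two ranges differ only in the first one.
thermo = {
    "model": "NASA7",
    "temperature-ranges": [200.0, 1000.0, 6000.0],
    "data": [[3.5, 0, 0, 0, 0, -1000.0, 4.0],
             [4.0, 0, 0, 0, 0, -1000.0, 4.0]],
}
print(midpoint_jumps(thermo))
```

Any jump larger than a chosen tolerance can then be flagged and plotted; the tolerance itself is a judgment call.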

For the various syntax errors that can occur during conversion from Chemkin format, would it make more sense to work on improving the error messages provided by ck2yaml than developing this as a separate tool? If this were integrated with ck2yaml, you'd have access to more of the parser's internal state which might allow you to provide more helpful messages. I think it would also be more convenient for users to have this information available immediately when encountering an error rather than having to use a separate tool.

I agree. Adding more error messages and fix suggestions to ck2yaml is a good idea, and they can be more precise there. I will add this to the to-do list.

For a PLOG reaction if the specified pressure has only one set of arrhenius parameters, negative A check will be applied; however, if the specified pressure has more than one set of arrhenius parameters, negative sum of k check will be applied.

Cantera already does a check for this at a set of temperatures, since otherwise you will get NaNs for the total rate constant.

I am not sure whether Cantera has an option to report all the pressure-dependent and duplicate reactions with a negative sum of k at each temperature checkpoint (200 K, 500 K, etc.). I think this can be useful if Cantera stops loading input file after encountering the first pdep or duplicate reaction with negative sum of k because ChemCheck provides all the problematic pdep and duplicate reactions for every temperature checkpoint. Especially for a model with many problematic pdep or duplicate reactions, this check saves users from hitting errors again and again while loading the input file in Cantera.

speth commented 4 years ago

I think this can be useful if Cantera stops loading input file after encountering the first pdep or duplicate reaction with negative sum of k because ChemCheck provides all the problematic pdep and duplicate reactions for every temperature checkpoint.

That's a good point. If an input file contains multiple problems, it would probably be easier if you could see them all at once, rather than fixing them one at a time. The way I'm thinking about it, though, is that the way Cantera handles this isn't set in stone -- we can change how errors are processed while loading input files to implement this behavior. It doesn't need to be a separate tool.
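
One way to get the "see them all at once" behavior is to collect problems while checking and raise a single error at the end. The sketch below is generic Python, not Cantera's loader; `ValidationProblems` and `validate_all` are hypothetical names, and using `ValueError` as the per-item failure type is an assumption.

```python
class ValidationProblems(Exception):
    """Raised once, after all items were checked, listing every problem."""

    def __init__(self, problems):
        self.problems = list(problems)
        summary = f"{len(self.problems)} problem(s) found:\n"
        super().__init__(summary + "\n".join(self.problems))


def validate_all(items, check):
    """Run check(item) on every item instead of stopping at the first error."""
    problems = []
    for item in items:
        try:
            check(item)
        except ValueError as err:
            problems.append(f"{item}: {err}")
    if problems:
        raise ValidationProblems(problems)
```

A caller that wants the old fail-fast behavior can still call `check` directly; the aggregate error only changes how results are reported.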

bryanwweber commented 4 years ago

I hope that this software can be more of a wrapper around existing and/or modified Cantera functionality. I think many people feel more comfortable with web-based interfaces, so I think an online converter is well worth the effort, even if all it does is display messages produced from ck2yaml or other parts of Cantera. Plus the possibility to integrate different visualization options is useful, as has been noted.

ischoegl commented 4 years ago

It doesn't need to be a separate tool.

I hope it won't be a separate tool! One thing that the LLNL web-based checker allows for is fixing of discontinuous thermo data - it would be neat if this could be done automatically within ck2yaml (at least if a flag is provided, while pertinent warnings are displayed). Same thing for graphical output (i.e. generate graphs and/or report if an optional flag is provided).

A web portal would certainly be useful for new users; others may prefer the command line where a detour to a web interface would be disruptive to the work flow.

bryanwweber commented 4 years ago

One thing that the LLNL web-based checker allows for is fixing of discontinuous thermo data - it would be neat if this could be done automatically within ck2yaml (at least if a flag is provided, while pertinent warnings are displayed).

In the past, we have avoided this automated change I think because it isn't easy to identify the appropriate change. Hopefully, if we provide an interface where people can explore and output the modified data, that will be good. I'd be curious to know how LLNL handles the automated changes though.

Same thing for graphical output (i.e. generate graphs and/or report if an optional flag is provided).

I think this would be a useful addition, but I don't think that ck2yaml is necessarily the right place, since the check is done in the C++ somewhere. Maybe a new mechanism validator script, to check thermo and reactions. I think that is one of the hopeful outcomes from this work.

A web portal would certainly be useful for new users; others may prefer the command line where a detour to a web interface would be disruptive to the work flow.

Yes, the command line scripts are never going to go away 😊

rwest commented 4 years ago

A summary of the above discussion and a conversation with @12Chao, if I got this right:

Suggesting fixes for syntax errors in CK files
- should exist in the ck2yaml script, with nice error handling. Eg. put the suggestions as an attribute in the Error objects, and allow the script (from command line) or web interface to display them appropriately.
Thermo discontinuities
- these errors may also exist in yaml that didn't come from chemkin, so don't put the checks in ck2yaml
- Cantera already checks this, when you try to create a Solution object
- but we can visualize this with plots, on web interface
- is a plot-generating command line tool also wanted?
Collision limit violations.
- these may also exist in yaml that didn't come from chemkin, so don't put the checks in ck2yaml
- command line tool could be helpful as well as web, but can be stand-alone tool not part of loading a cantera file.
- python module, can be run as script or imported into web UI ?
PLOG reactions with negative A factors sometimes lead to negative overall rate
- also applies to DUPLICATE Arrhenius reactions
- Cantera checks this at an array of temperatures, and dies at first instance of a problem
  - [ ] could be modified to list all problems, not crash on first problem?
  - [ ] could show which temperature ranges are valid and maybe let the model be used there?? (@cfgoldsmith has published rates like this that are valid for "combustion" but fail Cantera's checks)
- Should be part of web tool, and mechanism validator script/module
- Could use existing checks in Cantera and modify how they return results?
Diagnosing causes of ODE/DAE solver problems
- improving error handling from SUNDIALS
- has to be in cantera
- current progress: appending the component names to the error message
- goal: also suggest which reactions are causing the stiffness
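
The first item in the summary above, carrying fix suggestions as an attribute on the error object, could look roughly like this. It is only a sketch: `InputParseError`, `format_for_cli`, and `format_for_web` are hypothetical names and are not part of ck2yaml.

```python
class InputParseError(Exception):
    """Parse error that carries optional fix suggestions."""

    def __init__(self, message, line_number=None, suggestions=()):
        super().__init__(message)
        self.line_number = line_number
        self.suggestions = list(suggestions)


def format_for_cli(err):
    """Plain-text rendering for a command line script."""
    where = f" (line {err.line_number})" if err.line_number is not None else ""
    lines = [f"Error{where}: {err}"]
    lines += [f"  suggestion: {s}" for s in err.suggestions]
    return "\n".join(lines)


def format_for_web(err):
    """Dictionary rendering that a web view could serialize to JSON."""
    return {
        "message": str(err),
        "line": err.line_number,
        "suggestions": err.suggestions,
    }
```

The parser only has to raise the error once; each front end decides how to show the suggestions.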

One idea... I'm imagining something like this (pseudocode) so it's a python module:

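import sys
import cantera as ct

def check_collision_limits(model):
    pass  # placeholder: flag rates above the collision limit

def check_thermo_consistency(model):
    pass  # placeholder: flag discontinuous NASA polynomials

if __name__ == "__main__":
    # sys.argv[0] is the script name; the input file is sys.argv[1]
    model = ct.Solution(sys.argv[1])  # stands in for load_mechanism
    check_collision_limits(model)
    check_thermo_consistency(model)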

that can be run on the command line python validate.py input.yaml but can also be used in the web UI like this:

from validate import check_collision_limits, check_thermo_consistency

so the actual checks only need to be written once but we have a command line UI and a web UI, the latter having file upload, editing, and pretty graphics. ?

bryanwweber commented 3 years ago

@12Chao What's the status of this work right now? Should this issue be updated and/or closed?

12Chao commented 3 years ago

We are currently working on integrating CSPlib to analyze reactions with ultra-fast time scales, but it has not gone very far yet. We recently got a public IP address and will be deploying the website soon.

whitesides1 commented 3 years ago

In the past, we have avoided this automated change I think because it isn't easy to identify the appropriate change. Hopefully, if we provide an interface where people can explore and output the modified data, that will be good. I'd be curious to know how LLNL handles the automated changes though.

We describe the methodology here [1], and the actual implementation is here [2].

[1] https://doi.org/10.1016/j.combustflame.2020.06.010 [2] https://github.com/LLNL/zero-rk/blob/master/applications/thermo_check/thermo_fix.cpp

speth commented 2 days ago

Reaction rates that substantially violate the collision limits can even cause the sensitivity analysis algorithm to give nonsensical results. See this Users' Group post for an example.

Cantera / enhancements

Update of ChemCheck (Cantera debugging tool) #42