Use code generation to provide interfaces for C, Fortran, Matlab, Julia, etc.

speth commented 4 years ago

Abstract

The aim of this enhancement is to introduce a mechanism for generating the code needed to implement each of Cantera's language interfaces based on a language-agnostic description. This would make it easier to maintain the existing language interfaces, help keep their interfaces consistent and complete, and simplify the introduction of interfaces to additional languages.

Motivation

Currently, Cantera's interfaces to languages other than C++ are all implemented semi-independently. Any new feature in the C++ core which is meant to be part of the public interface needs to have wrappers manually added to the Python, C, Matlab, and Fortran interfaces. In many cases, these wrappers are either not added, or are added only to a subset of the interfaces, with the Matlab and Fortran interfaces being the least complete. Adding a new language interface (e.g. for Julia) would require writing a huge number of wrapper functions. Large portions of this code follow very simple patterns, which mostly amount to translating how different languages handle arguments and data types like strings or arrays, as well as different naming conventions. This suggests that some of this repetitive code could be generated automatically.

Possible Solutions

Introduce a YAML file providing descriptions of each class and method that would be implemented as part of a Cantera interface. This file would provide information about the input and output types for each function, documentation for the function, etc., that could be used to construct the necessary wrapper code. For example:

class:
  name: ThermoPhase
  prefix: thermo
  cabinet: ThermoCabinet
  methods:
  - name: nSpecies
    arguments: []
    returns: size_t
  - name: setMoleFractions
    arguments:
    - {type: double, dimensions: [nSpecies]}
    returns: void

This could then be used to fill in templates, using a standard Python templating library like Jinja. The easiest way to write the wrappers would probably be to do one for each distinct function signature, given how many functions in Cantera have the same signature, e.g. scalar getter/setter, array getter/setter, etc. For the above functions, the templates for the C wrappers might look like:

scalar getter:

// size_t thermo_nSpecies(int n) {
{{ method.returns }} {{ class.prefix }}_{{ method.name }} (int n) {
    try {
        // return ThermoCabinet::item(n).nSpecies();
        return {{ class.cabinet }}::item(n).{{ method.name }}();
    } catch (...) {
        return handleAllExceptions(npos, npos);
    }
}

array setter:

// int thermo_setMoleFractions(int n, double* arg0, size_t len_arg0) {
int {{ class.prefix }}_{{ method.name }} (
    int n,
    {{ method.arguments[0].type }}* arg0,
    size_t len_arg0) {
    try {
        // auto& p = ThermoCabinet::item(n);
        auto& p = {{ class.cabinet }}::item(n);
        // checkArraySize(len_arg0, p.nSpecies());
        checkArraySize(len_arg0, p.{{ method.arguments[0].dimensions[0] }}());
        // p.setMoleFractions(arg0);
        p.{{ method.name }}(arg0);
        return 0;
    } catch (...) {
        return handleAllExceptions(-1, ERR);
    }
}

This pseudo-code undoubtedly contains some errors, and I know there are many special cases that I haven't thought of yet in terms of what will need to be encoded in the YAML file, but I think this approach can be generalized to make maintaining a multitude of language interfaces for Cantera much easier.
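A minimal sketch of the template-filling step, assuming the YAML description has already been loaded into a plain dictionary. To stay free of third-party packages it uses Python's built-in `str.format` as a stand-in for Jinja, and the `classify` rule for picking a template is hypothetical; it only covers the two signatures discussed above.

```python
# The dictionary mirrors the YAML example; str.format stands in for Jinja here.
description = {
    "name": "ThermoPhase",
    "prefix": "thermo",
    "cabinet": "ThermoCabinet",
    "methods": [
        {"name": "nSpecies", "arguments": [], "returns": "size_t"},
        {"name": "setMoleFractions",
         "arguments": [{"type": "double", "dimensions": ["nSpecies"]}],
         "returns": "void"},
    ],
}

# Doubled braces are literal braces in the generated C code.
SCALAR_GETTER = """\
{returns} {prefix}_{name}(int n) {{
    try {{
        return {cabinet}::item(n).{name}();
    }} catch (...) {{
        return handleAllExceptions(npos, npos);
    }}
}}
"""

ARRAY_SETTER = """\
int {prefix}_{name}(int n, {arg_type}* arg0, size_t len_arg0) {{
    try {{
        auto& p = {cabinet}::item(n);
        checkArraySize(len_arg0, p.{dim}());
        p.{name}(arg0);
        return 0;
    }} catch (...) {{
        return handleAllExceptions(-1, ERR);
    }}
}}
"""


def classify(method):
    """Pick a template from the method signature (hypothetical rule set)."""
    args = method["arguments"]
    if not args and method["returns"] != "void":
        return "scalar_getter"
    if len(args) == 1 and args[0].get("dimensions") and method["returns"] == "void":
        return "array_setter"
    raise NotImplementedError(f"no template for {method['name']}")


def render(cls, method):
    """Fill in the template that matches the method's signature."""
    common = {"prefix": cls["prefix"], "cabinet": cls["cabinet"], "name": method["name"]}
    if classify(method) == "scalar_getter":
        return SCALAR_GETTER.format(returns=method["returns"], **common)
    arg = method["arguments"][0]
    return ARRAY_SETTER.format(arg_type=arg["type"], dim=arg["dimensions"][0], **common)


wrappers = [render(description, m) for m in description["methods"]]
print("\n".join(wrappers))
```

Grouping by signature keeps the number of templates small: a new method with an already-known signature only needs a new entry in the description.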

One benefit of using this approach for the Matlab toolbox in particular is that it would eliminate most of the pain associated with the magic numbers which are used to specify the correct method to call within the single entry point of the ctmethods mex file, since they could be determined automatically.
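As a rough illustration of determining those numbers automatically, the sketch below assigns each wrapped method an integer from its position in the description and emits matching C constants. The numbering scheme (a block of 100 per class), the `kin` prefix, and the function names are assumptions made for this example; they do not reproduce the numbers used by the existing ctmethods code.

```python
def assign_method_ids(methods_by_prefix, stride=100):
    """Give every wrapped method a unique integer: class block plus offset."""
    ids = {}
    for block, (prefix, names) in enumerate(methods_by_prefix.items(), start=1):
        if len(names) >= stride:
            raise ValueError(f"too many methods for prefix '{prefix}'")
        for offset, name in enumerate(names, start=1):
            ids[f"{prefix}_{name}"] = block * stride + offset
    return ids


def c_defines(ids):
    """Emit constants that the C side of a single-entry-point dispatcher could share."""
    return "\n".join(f"#define {key.upper()} {value}" for key, value in ids.items())


# Hypothetical method lists; in practice they would come from the YAML description.
method_ids = assign_method_ids({
    "thermo": ["nSpecies", "setMoleFractions"],
    "kin": ["nReactions"],
})
print(c_defines(method_ids))
```

Because both sides of the interface would be generated from the same table, the numbers never have to be kept in sync by hand.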

I think this approach would also be useful for implementing at least some portions of the Python interface.

A few issues in the descriptions that I know need some thought:

- Naming conventions between different languages
- Handling of values that are logically return values but appear as arguments in C++, e.g. getMoleFractions
- Transitioning from the existing interfaces to interfaces generated this way, given that there will probably be a few API changes
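The first two points could be handled by a small per-language naming rule in the generator. The sketch below is one hypothetical rule set, not a description of how any existing Cantera interface derives its names: the C target keeps the C++ name, while the Python target drops the `get` prefix of output-argument getters and converts to snake_case.

```python
import re


def to_snake_case(name):
    """Insert '_' at each lowercase/digit-to-uppercase boundary, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def target_name(cpp_name, language):
    """Map a C++ method name to a (hypothetical) target-language name."""
    # getMoleFractions-style methods fill an output argument in C++.
    is_out_getter = cpp_name.startswith("get") and cpp_name[3:4].isupper()
    if language == "c":
        return cpp_name
    if language == "python":
        base = cpp_name[3:] if is_out_getter else cpp_name
        return to_snake_case(base[0].lower() + base[1:])
    raise ValueError(f"unknown language '{language}'")


for cpp in ("nSpecies", "getMoleFractions", "setMoleFractions"):
    print(cpp, "->", target_name(cpp, "python"))
```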

Alternatives:

SWIG is a general tool for generating wrapper interfaces for a number of languages. I considered it when overhauling the Python interface, and decided not to use it in part because it wasn't able to produce a particularly idiomatic interface, i.e. using things like "properties" in Python, or adopting naming conventions appropriate for the target language. It also does not have an interface for producing Matlab wrappers. Edit: SWIG also does not currently support Julia.

ischoegl commented 4 years ago

This is certainly interesting, and I’m not sure that I’m completely wrapping my head around this at the moment.

One question about this would be code maintenance, debugging and sustainability: abstraction layers make things harder to follow and could potentially hamper user contributions. At the moment, C++ and Python are pretty standard (i.e. accessible); the code snippets above are clearly not for the faint of heart. Code readability is imho really important, as it helps ensure that the project can be maintained beyond the involvement of individual groups of developers. Just 2 cents of course ...

bryanwweber commented 4 years ago

We may want to investigate how CoolProp accomplishes this task, since they have wrappers in every language imaginable, including Excel!

ischoegl commented 4 years ago

Not saying that it doesn’t make sense, just that there’s a price for flexibility ... trying this on a small portion is definitely a good idea.

PS: CoolProp appears to require a Python installation for MATLAB, see their GitHub repo ...

jiweiqi commented 4 years ago

SWIG does not seem to support Julia. For Julia, Clang.jl might be a good solution. It has been used to generate a wrapper for Sundials. Is anyone working on this approach?

ischoegl commented 3 years ago

I wanted to briefly ask whether there are any updates on this matter (@rwest mentioned code generation in #102)? Also, @bryanwweber had originally mentioned that CoolProp's approach may be investigated to have wrappers "in every language imaginable". I'm not necessarily interested in the details, but it would be great to have an overview of current long-term thoughts.

rwest commented 3 years ago

It's mostly something we're bearing in mind as we work on the MATLAB interface. My sense of the general feeling is that code generation to reduce the burden on developers will hopefully help us keep the interfaces up to date and prevent what got us here (feature drift over a decade as nobody wants to maintain an interface they don't use). But it shouldn't come at the cost of making the interface much worse (a crappy interface that is auto-generated is not the goal). So I think our first step is to make the interface for a few methods/classes by hand, to figure out what it SHOULD look like, with auto-generation in mind, and then figure out if/how to auto-generate the rest.

For now @ssun30 is doing the work, and it's mostly being discussed on https://github.com/Cantera/enhancements/discussions/102

ischoegl commented 3 years ago

One thing that I came across recently (and finally spent some more time with) is include/cantera/cython/wrappers.h. While this is de facto labeled cython, much of the content doesn't appear to be tied to the Python interface specifically (with the exception of the PythonLogger part). The way I understand it, this is a way to expose/share C++ arrays directly and could be used for a more generic cpplib. Beyond that, the #defines used here provide a pretty straightforward vehicle to inject interface-specific code?

If I understand this correctly, would (could?) this be part of a broader solution or is it orthogonal to the current thinking?

speth commented 3 years ago

> One thing that I came across recently (and finally spent some more time with) is include/cantera/cython/wrappers.h. While this is de facto labeled cython, much of the content doesn't appear to be tied to the Python interface specifically (with the exception of the PythonLogger part). The way I understand it, this is a way to expose/share C++ arrays directly and could be used for a more generic cpplib. Beyond that, the #defines used here provide a pretty straightforward vehicle to inject interface-specific code?
>
> If I understand this correctly, would (could?) this be part of a broader solution or is it orthogonal to the current thinking?

The functions that are generated in wrappers.h using C macros are there because I wanted to consolidate translating C/C++ arrays to Numpy arrays in as few functions as possible, e.g. the _getArray1 method. At least at the time, or at least as far as I could figure out, there was no way of passing a pointer to a C++ member function as an argument to a Cython function, so I needed a way of creating a "plain" function pointer to pass to _getArray1, which is what the ARRAY_FUNC wrapper does.

I'm not sure how much utility it has outside its current use. It doesn't handle any of the other concerns with calling C++ code from a different language, like exception handling, and the argument types are still C++ objects like ThermoPhase*.

I guess C preprocessor macros warrant some mention in the area of code generation, but if we're not just generating C/C++ code, I think it would be simplest to use a different, more modern tool for all of the generated code.

ischoegl commented 1 month ago

@speth … I am currently working on a proof-of-concept for an autogenerated CLib API that emerged from Cantera/cantera#1777 (PR is closed but branch is active). I will create a new PR once generated code can be tested.

ischoegl commented 4 weeks ago

@speth / @bryanwweber ... here's an update to my doxygen-based code generation proof-of-concept on ischoegl:sourcegen-doxygen-comments. Things look quite promising, but I'll probably put this to the side for a while.

I added a new CLibSourceGenerator to sourcegen that builds on the Cantera.tag file (as well as the associated XML tree) and uses Jinja templates. At this point, I can produce CLib header files via:

% python interfaces/sourcegen/run.py clib clib
[INFO] Generating 'clib' source files...
[INFO] Parsing doxygen tags...
[INFO]   writing 'ctfunc_auto.h'
[INFO]   writing 'ctsoln_auto.h'
[INFO] Done.
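To give an idea of what reading the tag file involves, here is a small sketch using only the Python standard library. The embedded XML is a hand-written, abbreviated sample modeled on the layout of a doxygen tag file (real entries carry additional fields such as anchors), and `find_signatures` is a hypothetical helper, not part of sourcegen.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample; a real Cantera.tag file is far larger.
SAMPLE_TAGS = """\
<tagfile>
  <compound kind="class">
    <name>Cantera::Solution</name>
    <member kind="function">
      <type>string</type>
      <name>name</name>
      <arglist>() const</arglist>
    </member>
    <member kind="function">
      <type>void</type>
      <name>setName</name>
      <arglist>(const string &amp;name)</arglist>
    </member>
  </compound>
</tagfile>
"""


def find_signatures(tag_xml, class_name, method_name):
    """Return (return type, argument list) pairs for all matching overloads."""
    root = ET.fromstring(tag_xml)
    found = []
    for compound in root.iter("compound"):
        if compound.get("kind") != "class" or compound.findtext("name") != class_name:
            continue
        for member in compound.iter("member"):
            if member.get("kind") == "function" and member.findtext("name") == method_name:
                found.append((member.findtext("type"), member.findtext("arglist")))
    return found


print(find_signatures(SAMPLE_TAGS, "Cantera::Solution", "setName"))
```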

Configuration uses YAML markup that needs only minimal information for each entry, e.g.

  - name: newSolution
    implements: newSolution(const string&, const string&, const string&)
    what: constructor
  - name: name
    implements: Solution::name

which produces fully documented code:

    /**
     * Create and initialize a new Solution manager from an input file.
     * 
     * @param infile        name of the input file
     * @param name          name of the phase in the file. If this is blank, the first phase in the file is used.
     * @param transport     name of the transport model.
     * @returns             Handle to stored Solution object or -1 for exception handling.
     * 
     * @implements newSolution(const string&, const string&, const string&, const vector<shared_ptr<Solution>>&)
     */
    CANTERA_CAPI int soln_newSolution(const char* infile, const char* name, const char* transport);

    /**
     * Return the name of this Solution object.
     * 
     * @param handle        Handle to queried Solution object.
     * @param[in] lenBuf    Length of reserved array.
     * @param[out] charBuf  Returned string value.
     * @returns             Actual length of string or -1 for exception handling.
     * 
     * @implements Solution::name()
     */
    CANTERA_CAPI int soln_name(int handle, int lenBuf, char* charBuf);

Here is the full YAML input used for code generation:

soln_auto.yaml:

_Below is the current version of YAML configuration (the field `name` could potentially be replaced by a default to reduce most entries to a single line)_

```yaml
# Configuration for code generation.
# Implements portion of replacement for CLib "ct"

# This file is part of Cantera. See License.txt in the top-level directory or
# at https://cantera.org/license.txt for license and copyright information.

cabinet:
  prefix: soln
  base: Solution
  parents: []  # List of parent classes
  derived: [Interface]  # List of specializations
  uses: [ThermoPhase, Kinetics, Transport]  # List of referenced cabinets
  functions:
  - name: newSolution
    implements: newSolution(const string&, const string&, const string&)
    what: constructor
  - name: newInterface  # currently disabled in CLib's config.yaml
    implements: newInterface(const string&, const string&, const vector<shared_ptr<Solution>>&)
    what: constructor
  - name: del
    what: destructor
    relates:  # methods used to retrieve instances of managed objects
    - "Solution::thermo"
    - "Solution::kinetics"
    - "Solution::transport"
  - name: name
    implements: Solution::name
  - name: setName
    implements: Solution::setName
  - name: thermo
    implements: Solution::thermo
  - name: kinetics
    implements: Solution::kinetics
  - name: transport
    implements: Solution::transport
  - name: setTransport
    implements: Solution::setTransport
  - name: soln_nAdjacent
    implements: Solution::nAdjacent
  - name: adjacent
    implements: Solution::adjacent(size_t)
```

As well as the complete header file:

soln_auto.h:

```c++
/**
 * @file ctsoln_auto.h
 *
 * @warning  This module is an experimental part of the %Cantera API and
 *      may be changed or removed without notice.
 */

// This file is part of Cantera. See License.txt in the top-level directory or
// at https://cantera.org/license.txt for license and copyright information.

#ifndef __CTSOLN_AUTO_H__
#define __CTSOLN_AUTO_H__

#include "clib_defs.h"

#ifdef __cplusplus
extern "C" {
#endif

    /**
     * Create and initialize a new Solution manager from an input file.
     *
     * @param infile        name of the input file
     * @param name          name of the phase in the file. If this is blank, the first phase in the file is used.
     * @param transport     name of the transport model.
     * @returns             Handle to stored Solution object or -1 for exception handling.
     *
     * @implements newSolution(const string&, const string&, const string&, const vector<shared_ptr<Solution>>&)
     */
    CANTERA_CAPI int soln_newSolution(const char* infile, const char* name, const char* transport);

    /**
     * Delete Solution object.
     *
     * @param handle        Handle to Solution object.
     * @returns             Zero for success and -1 for exception handling.
     *
     * @relates Solution::thermo, Solution::kinetics, Solution::transport
     */
    CANTERA_CAPI int soln_del(int handle);

    /**
     * Return the name of this Solution object.
     *
     * @param handle        Handle to queried Solution object.
     * @param[in] lenBuf    Length of reserved array.
     * @param[out] charBuf  Returned string value.
     * @returns             Actual length of string or -1 for exception handling.
     *
     * @implements Solution::name()
     */
    CANTERA_CAPI int soln_name(int handle, int lenBuf, char* charBuf);

    /**
     * Set the name of this Solution object.
     *
     * @param handle        Handle to queried Solution object.
     * @param name          Undocumented.
     *
     * @implements Solution::setName(const string&)
     */
    CANTERA_CAPI int soln_setName(int handle, const char* name);

    /**
     * Accessor for the ThermoPhase pointer.
     *
     * @param handle        Handle to queried Solution object.
     * @returns             Handle to stored ThermoPhase object or -1 for exception handling.
     *
     * @implements Solution::thermo()
     */
    CANTERA_CAPI int soln_thermo(int handle);

    /**
     * Accessor for the Kinetics pointer.
     *
     * @param handle        Handle to queried Solution object.
     * @returns             Handle to stored Kinetics object or -1 for exception handling.
     *
     * @implements Solution::kinetics()
     */
    CANTERA_CAPI int soln_kinetics(int handle);

    /**
     * Accessor for the Transport pointer.
     *
     * @param handle        Handle to queried Solution object.
     * @returns             Handle to stored Transport object or -1 for exception handling.
     *
     * @implements Solution::transport()
     */
    CANTERA_CAPI int soln_transport(int handle);

    /**
     * Set the Transport object directly.
     *
     * @param handle        Handle to queried Solution object.
     * @param transport     Undocumented.
     *
     * @implements Solution::setTransport(shared_ptr<Transport>)
     */
    CANTERA_CAPI int soln_setTransport(int handle, int transport);

    /**
     * Get the number of adjacent phases.
     *
     * @param handle        Handle to queried Solution object.
     *
     * @implements Solution::nAdjacent()
     */
    CANTERA_CAPI int soln_soln_nAdjacent(int handle);

    /**
     * Get the Solution object for an adjacent phase by index.
     *
     * @param handle        Handle to queried Solution object.
     * @param i             Undocumented.
     * @returns             Handle to stored Solution object or -1 for exception handling.
     *
     * @implements Solution::adjacent(size_t)
     */
    CANTERA_CAPI int soln_adjacent(int handle, int i);

#ifdef __cplusplus
}
#endif

#endif // __CTSOLN_AUTO_H__
```

I honestly don't see any major obstacles for a full implementation (other than time spent).
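For readers following along, the core translation from an `implements` entry to a CLib declaration can be sketched as below. This is an illustrative reconstruction, not the code on the branch: `parse_implements`, `clib_declaration`, and the `C_TYPES` table are hypothetical, argument names are passed in by hand (the proof-of-concept takes them from the doxygen output), and the naive comma split would not cope with template types that contain commas.

```python
import re

# Hypothetical C++-to-C type map covering only the cases shown above.
C_TYPES = {"const string&": "const char*", "size_t": "int", "double": "double"}


def parse_implements(text):
    """Split 'Class::method(type, ...)' into (scope, name, [argument types])."""
    match = re.fullmatch(r"(?:(\w+)::)?(\w+)(?:\((.*)\))?", text.strip())
    if match is None:
        raise ValueError(f"cannot parse '{text}'")
    scope, name, args = match.groups()
    arg_types = [a.strip() for a in args.split(",")] if args else []
    return scope, name, arg_types


def clib_declaration(prefix, entry, arg_names=()):
    """Build a C declaration; class methods receive a leading object handle."""
    scope, _, arg_types = parse_implements(entry["implements"])
    params = [] if scope is None else ["int handle"]
    params += [f"{C_TYPES[t]} {n}" for t, n in zip(arg_types, arg_names)]
    return f"CANTERA_CAPI int {prefix}_{entry['name']}({', '.join(params)});"


constructor = {"name": "newSolution",
               "implements": "newSolution(const string&, const string&, const string&)"}
print(clib_declaration("soln", constructor, ["infile", "name", "transport"]))
print(clib_declaration("soln", {"name": "adjacent",
                                "implements": "Solution::adjacent(size_t)"}, ["i"]))
```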

speth commented 4 weeks ago

That's beautiful, @ischoegl. I'm impressed by what you've been able to do with deriving a new docstring from the corresponding C++ method's docstring. I'd been anticipating that that would end up more customized for each target interface in the YAML config file, but it seems that you can handle a fair number of cases without needing any extra input.

ischoegl commented 4 weeks ago

> That's beautiful, @ischoegl. I'm impressed by what you've been able to do with deriving a new docstring from the corresponding C++ method's docstring. I'd been anticipating that that would end up more customized for each target interface in the YAML config file, but it seems that you can handle a fair number of cases without needing any extra input.

Thanks, @speth! I was able to leverage some ideas from the experimental .NET API. I didn't implement every single case yet, but it seems relatively straight-forward. My code could probably benefit from a bit of cleanup, but the approach looks extremely workable. The Jinja portion was especially smooth.

Obviously, I'd expect the same YAML configuration to be sufficient for generating MATLAB, Fortran, etc. APIs that are likewise fully documented. Beyond that, it's just a matter of tweaking the C++ documentation to optimize the generated docs.

Cantera / enhancements

Use code generation to provide interfaces for C, Fortran, Matlab, Julia, etc. #39