Open speth opened 4 years ago
This is certainly interesting, and I’m not sure that I’m completely wrapping my head around this at the moment.
One question about this would be code maintenance, debugging and sustainability: abstraction layers make things harder to follow and could potentially hamper user contributions. At the moment, C++ and Python are pretty standard (i.e. accessible); the code snippets above are clearly not for the faint of heart. Code readability is imho really important, as it helps ensure that the project can be maintained beyond the involvement of individual groups of developers. Just 2 cents of course ...
We may want to investigate how CoolProp accomplishes this task, since they have wrappers in every language imaginable, including Excel!
Not saying that it doesn’t make sense, just that there’s a price for flexibility ... trying this on a small portion is definitely a good idea.
PS: CoolProp appears to require a python installation for MATLAB, see their GitHub repo ...
I wanted to briefly ask whether there are any updates on this matter (@rwest mentioned code generation in #102)? Also, @bryanwweber had originally mentioned that CoolProp's approach may be investigated to have wrappers "in every language imaginable". I'm not necessarily interested in the details, but it would be great to have an overview of current long-term thoughts.
It's mostly something we're bearing in mind as we work on the MATLAB interface. My sense of the general feeling is that code generation to reduce the burden on developers will hopefully help us keep the interfaces up to date and prevent what got us here (feature drift over a decade, as nobody wants to maintain an interface they don't use). But it shouldn't come at the cost of making the interface much worse (a crappy interface that is auto-generated is not the goal). So I think our first step is to make the interface for a few methods/classes by hand, to figure out what it SHOULD look like, with auto-generation in mind, and then figure out if/how to auto-generate the rest.
For now @ssun30 is doing the work, and it's mostly being discussed on https://github.com/Cantera/enhancements/discussions/102
One thing that I came across recently (and finally spent some more time with) is `include/cantera/cython/wrappers.h`. While this is de facto labeled `cython`, much of the content doesn't appear to be tied to the Python interface specifically (with the exception of the `PythonLogger` part). The way I understand it, this is a way to expose/share C++ arrays directly and could be used for a more generic `cpplib`. Beyond that, the `#define`s used here provide a pretty straightforward vehicle to inject interface-specific code?
If I understand this correctly, would (could?) this be part of a broader solution or is it orthogonal to the current thinking?
The functions that are generated in `wrappers.h` using C macros are there because I wanted to consolidate translating C/C++ arrays to Numpy arrays in as few functions as possible, e.g. the `_getArray1` method. At least at the time, or at least as far as I could figure out, there was no way of passing a pointer to a C++ member function as an argument to a Cython function, so I needed a way of creating a "plain" function pointer to pass to `_getArray1`, which is what the `ARRAY_FUNC` wrapper does.
I'm not sure how much utility it has outside its current use. It doesn't handle any of the other concerns with calling C++ code from a different language, like exception handling, and the argument types are still C++ objects like `ThermoPhase*`.
I guess C preprocessor macros warrant some mention in the area of code generation, but if we're not just generating C/C++ code, I think it would be simplest to use a different, more modern tool for all of the generated code.
@speth … I am currently working on a proof-of-concept for an autogenerated CLib API that emerged from Cantera/cantera#1777 (PR is closed but branch is active). I will create a new PR once generated code can be tested.
@speth / @bryanwweber ... here's an update to my doxygen-based code generation proof-of-concept on ischoegl:sourcegen-doxygen-comments. Things look quite promising, but I'll probably put this to the side for a while.
I added a new `CLibSourceGenerator` to `sourcegen` that builds on the `Cantera.tag` file (as well as the associated XML tree) and uses Jinja templates. At this point, I can produce CLib header files via:
```
% python interfaces/sourcegen/run.py clib clib
[INFO] Generating 'clib' source files...
[INFO] Parsing doxygen tags...
[INFO] writing 'ctfunc_auto.h'
[INFO] writing 'ctsoln_auto.h'
[INFO] Done.
```
Configuration uses YAML markup that requires only minimal information for each entry, e.g.
```yaml
- name: newSolution
  implements: newSolution(const string&, const string&, const string&)
  what: constructor
- name: name
  implements: Solution::name
```
which produces fully documented code:
```c
/**
 *  Create and initialize a new Solution manager from an input file.
 *
 *  @param infile      name of the input file
 *  @param name        name of the phase in the file. If this is blank, the first phase in the file is used.
 *  @param transport   name of the transport model.
 *  @returns           Handle to stored Solution object or -1 for exception handling.
 *
 *  @implements newSolution(const string&, const string&, const string&, const vector<shared_ptr<Solution>>&)
 */
CANTERA_CAPI int soln_newSolution(const char* infile, const char* name, const char* transport);

/**
 *  Return the name of this Solution object.
 *
 *  @param handle        Handle to queried Solution object.
 *  @param[in] lenBuf    Length of reserved array.
 *  @param[out] charBuf  Returned string value.
 *  @returns             Actual length of string or -1 for exception handling.
 *
 *  @implements Solution::name()
 */
CANTERA_CAPI int soln_name(int handle, int lenBuf, char* charBuf);
```
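For readers who want a feel for the mechanics, here is a minimal sketch of how a recipe entry plus documentation pulled from doxygen might be combined into a declaration like `soln_name` above. It is not the actual `sourcegen` code: it uses Python's standard-library `string.Template` instead of Jinja so that it runs without extra packages, and `DOXYGEN_INFO`, `RECIPES`, `STRING_GETTER` and `render` are hypothetical names standing in for the parsed tag file, the parsed YAML and the template machinery.

```python
from string import Template

# Hypothetical stand-in for information parsed from the doxygen tag file / XML.
DOXYGEN_INFO = {
    "Solution::name": {
        "brief": "Return the name of this Solution object.",
        "returns": "string",
    },
}

# Hypothetical stand-in for one parsed entry of the YAML configuration.
RECIPES = [{"name": "name", "implements": "Solution::name"}]

# Template for methods returning a string (assumed buffer-based CLib convention).
STRING_GETTER = Template('''\
/**
 *  $brief
 *
 *  @param handle        Handle to queried $cls object.
 *  @param[in] lenBuf    Length of reserved array.
 *  @param[out] charBuf  Returned string value.
 *  @returns             Actual length of string or -1 for exception handling.
 *
 *  @implements $implements()
 */
CANTERA_CAPI int ${prefix}_$name(int handle, int lenBuf, char* charBuf);
''')


def render(recipe, prefix):
    """Render one CLib declaration from a recipe and the doxygen information."""
    info = DOXYGEN_INFO[recipe["implements"]]
    if info["returns"] != "string":
        # Only the string-getter case is sketched here.
        raise NotImplementedError(f"no template for return type {info['returns']!r}")
    cls = recipe["implements"].split("::")[0]
    return STRING_GETTER.substitute(
        brief=info["brief"],
        cls=cls,
        implements=recipe["implements"],
        prefix=prefix,
        name=recipe["name"],
    )


header = "\n".join(render(recipe, "soln") for recipe in RECIPES)
print(header)
```

The point of the sketch is that the recipe only names the C++ method; the docstring and the choice of template follow from what doxygen already knows about it.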
Here is the full YAML input used for code generation
As well as the complete header file:
I honestly don't see any major obstacles for a full implementation (other than time spent).
That's beautiful, @ischoegl. I'm impressed by what you've been able to do with deriving a new docstring from the corresponding C++ method's docstring. I'd been anticipating that that would end up more customized for each target interface in the YAML config file, but it seems that you can handle a fair number of cases without needing any extra input.
Thanks, @speth! I was able to leverage some ideas from the experimental .NET API. I didn't implement every single case yet, but it seems relatively straightforward. My code could probably benefit from a bit of cleanup, but the approach looks extremely workable. The Jinja portion was especially smooth.
Obviously, I'd expect the same YAML configuration to be sufficient for generating MATLAB, Fortran, etc. APIs that are likewise fully documented. Beyond that, it's just a matter of tweaking the C++ documentation to optimize the generated docs.
Abstract
The aim of this enhancement is to introduce a mechanism for generating the code needed to implement each of Cantera's language interfaces based on a language-agnostic description. This would make it easier to maintain the existing language interfaces, help keep their interfaces consistent and complete, and simplify the introduction of interfaces to additional languages.
Motivation
Currently, Cantera's interfaces to languages other than C++ are all implemented semi-independently. Any new feature in the C++ core which is meant to be part of the public interface needs to have wrappers manually added to the Python, C, Matlab, and Fortran interfaces. In many cases, these wrappers are either not added, or are added only to a subset of the interfaces, with the Matlab and Fortran interfaces being the least complete. Adding a new language interface (e.g. for Julia) would require writing a huge number of wrapper functions. Large portions of this code follow very simple patterns, which mostly amount to translating how different languages handle arguments and data types like strings or arrays, as well as different naming conventions. This suggests that some of this repetitive code could be generated automatically.
Possible Solutions
Introduce a YAML file providing descriptions of each class and method that would be implemented as part of a Cantera interface. This file would provide information about the input and output types for the function, documentation for the function, etc., that could be used to construct the necessary wrapper code. For example:
This could then be used to fill in templates, using a standard Python templating library like Jinja. The easiest way to write the wrappers would probably be to do one for each distinct function signature, given how many functions in Cantera have the same signature, e.g. scalar getter/setter, array getter/setter, etc. For the above functions, the templates for the C wrappers might look like:
scalar getter:
_arraysetter:
This pseudo-code undoubtedly contains some errors, and I know there are many special cases that I haven't thought of yet in terms of what will need to be encoded in the YAML file, but I think this approach can be generalized to make maintaining a multitude of language interfaces for Cantera much easier.
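As a rough, runnable illustration of the "one template per distinct function signature" idea, the sketch below maps a signature category to a wrapper template and fills it in for each method description. Everything here is an assumption made for illustration: the field names (`class`, `name`, `kind`, `type`), the `lookup<...>(handle)` helper in the emitted C++, and the use of the standard-library `string.Template` in place of Jinja. The array setter template passes the length through without checking it, which a real wrapper would need to do.

```python
from string import Template

# Hypothetical method descriptions, as they might look after loading the YAML file.
METHODS = [
    {"class": "ThermoPhase", "name": "temperature",
     "kind": "scalar_getter", "type": "double"},
    {"class": "ThermoPhase", "name": "setMoleFractions",
     "kind": "array_setter", "type": "double"},
]

# One template per signature category; lookup<...>() is a hypothetical helper
# that would map an integer handle back to the stored C++ object.
TEMPLATES = {
    "scalar_getter": Template(
        "$type ${prefix}_$name(int handle)\n"
        "{\n"
        "    return lookup<$cls>(handle).$name();\n"
        "}\n"
    ),
    "array_setter": Template(
        "int ${prefix}_$name(int handle, size_t len, const $type* values)\n"
        "{\n"
        "    lookup<$cls>(handle).$name(values);\n"
        "    return 0;\n"
        "}\n"
    ),
}


def generate(methods, prefix):
    """Render wrapper source for every method, chosen by signature category."""
    chunks = []
    for method in methods:
        kind = method["kind"]
        if kind not in TEMPLATES:
            raise ValueError(f"no template for signature kind {kind!r}")
        chunks.append(TEMPLATES[kind].substitute(
            prefix=prefix,
            cls=method["class"],
            name=method["name"],
            type=method["type"],
        ))
    return "\n".join(chunks)


code = generate(METHODS, "thermo")
print(code)
```

Adding a method with an already-known signature then only requires a new entry in the description list, while a new signature category requires one new template per target language.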
One benefit of using this approach for the Matlab toolbox in particular is that it would eliminate most of the pain associated with the magic numbers which are used to specify the correct method to call within the single entry point of the `ctmethods` mex file, since they could be determined automatically.
I think this approach would also be useful for implementing at least some portions of the Python interface.
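To make the point about magic numbers concrete, here is a small sketch of how dispatch IDs could be derived from the method descriptions instead of being maintained by hand. The numbering scheme (one block of 100 IDs per class) and the names `assign_method_ids` and `METHODS_BY_CLASS` are purely hypothetical and are not taken from the existing `ctmethods` code.

```python
def assign_method_ids(methods_by_class, block_size=100):
    """Assign a unique integer ID to each (class, method) pair.

    Classes get consecutive blocks of `block_size` IDs in declaration order;
    methods are numbered within their class block in declaration order.
    """
    ids = {}
    for class_index, (cls, methods) in enumerate(methods_by_class.items(), start=1):
        if len(methods) > block_size:
            raise ValueError(f"{cls} has more than {block_size} methods")
        for method_index, method in enumerate(methods):
            ids[(cls, method)] = class_index * block_size + method_index
    return ids


# Hypothetical description of wrapped methods, e.g. loaded from the YAML file.
METHODS_BY_CLASS = {
    "ThermoPhase": ["temperature", "pressure"],
    "Kinetics": ["nReactions"],
}

ids = assign_method_ids(METHODS_BY_CLASS)
for (cls, method), number in ids.items():
    print(f"{number}: {cls}::{method}")
```

Because both the calling side and the dispatching side would be generated from the same table, the numbers never need to appear in hand-written code.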
A few issues in the descriptions that I know need some thought:
`getMoleFractions`.
Alternatives:
SWIG is a general tool for generating wrapper interfaces for a number of languages. I considered it when overhauling the Python interface, and decided not to use it in part because it wasn't able to produce a particularly idiomatic interface, i.e. using things like "properties" in Python, or adopting naming conventions appropriate for the target language. It also does not have an interface for producing Matlab wrappers. Edit: SWIG also does not currently support Julia.
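As an illustration of what "idiomatic" could mean for a generated Python interface, the sketch below turns getter/setter style methods into properties with snake_case names. `RawPhase` is a hypothetical stand-in for a low-level wrapper, and `to_snake_case` and `make_idiomatic` are illustrative helpers, not part of Cantera or SWIG.

```python
import re


def to_snake_case(name):
    """Convert a camelCase method name to snake_case."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


class RawPhase:
    """Hypothetical low-level wrapper exposing C++-style getters and setters."""

    def __init__(self):
        self._T = 300.0

    def temperature(self):
        return self._T

    def setTemperature(self, T):
        self._T = T

    def meanMolecularWeight(self):
        return 28.0


def make_idiomatic(raw_cls, getters):
    """Build a subclass exposing each getter as a snake_case property.

    A matching `setX` method, if present, becomes the property setter;
    otherwise the property is read-only.
    """
    namespace = {}
    for getter in getters:
        setter = "set" + getter[0].upper() + getter[1:]
        fget = getattr(raw_cls, getter)
        fset = getattr(raw_cls, setter, None)
        namespace[to_snake_case(getter)] = property(fget, fset)
    return type("Idiomatic" + raw_cls.__name__, (raw_cls,), namespace)


Phase = make_idiomatic(RawPhase, ["temperature", "meanMolecularWeight"])
phase = Phase()
phase.temperature = 500.0
print(phase.temperature, phase.mean_molecular_weight)
```

A generator driven by a language-agnostic description could apply conventions like these per target language, which is the kind of control that motivated not using SWIG.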