KatanaGraph / katana

Research upsides and downsides of different wrapping techniques for Python #179

Open arthurp opened 3 years ago

arthurp commented 3 years ago

Tools to look at:

JIRA: https://katanagraph.atlassian.net/browse/ENG-322

insertinterestingnamehere commented 3 years ago

Just to add on here, IMHO, the ideal for many numerical applications would revolve around a workflow that goes something like this: library -> wrappers using standardized (WRT a MOP) data structures that are available in each source language -> auto-generated wrapper exposed in any target language via a MOP. None of the existing solutions really do this, though, since there is no sufficiently general MOP for regular/irregular data in computational/data science.

insertinterestingnamehere commented 3 years ago

I think the XTensor people have a somewhat similar idea going: they bind stuff to their C++ data structures and then have facilities for quickly adapting those things to be usable in Python/Julia/R/etc. The main idea is just that autogenerating can work great if there's a unifying data layout/style with corresponding library data structures. Translating the idioms of a wrapped library to the desired library data structures and semantics can take place in whatever source language the wrapped library was made for; the export to other languages can then be mostly automated.

arthurp commented 3 years ago

I have been thinking about this on and off for a couple of months now and I have a specific plan at this point. I plan to test it before making a final decision, but I think it will work well.

Binding from C++ into Python is done with Pybind11. This supports a lot (see https://pybind11.readthedocs.io/en/stable/classes.html#). However, there will inevitably be API quality issues in the "raw" Pybind11 API exposed in Python that cannot easily be fixed from the C++ side (for instance, working around issues with unique_ptr arguments). Also, to integrate with Numba, we need some real Python code, since Numba compiles from Python bytecode and the functions it compiles must therefore have some.
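
To make the bytecode point concrete, here is a minimal sketch using only the standard library. `math.sqrt` stands in for a function exported from a compiled extension (an assumption for illustration; no Katana or Pybind11 code is involved), and `has_python_bytecode` and `sqrt_wrapper` are hypothetical names. It only shows that a C-implemented function carries no Python code object while a thin Python wrapper does; making the wrapped call itself compilable by Numba is a separate matter.

```python
import math
import types


def has_python_bytecode(func) -> bool:
    """Return True if func carries a Python code object (func.__code__)."""
    return isinstance(getattr(func, "__code__", None), types.CodeType)


def sqrt_wrapper(x: float) -> float:
    """Thin Python-level wrapper around the compiled function."""
    return math.sqrt(x)


# math.sqrt is implemented in C, so it has no __code__ attribute.
compiled_has_bytecode = has_python_bytecode(math.sqrt)
# The wrapper is ordinary Python, so it does.
wrapper_has_bytecode = has_python_bytecode(sqrt_wrapper)
```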

Many "raw" C++ bindings provided by Pybind11 will need another layer of wrapping at the Python level to provide any features that can only be effectively provided at the Python level. Hopefully this can be mostly automated with metaprogramming, but some custom wrapping may be needed for some classes, especially to handle types from Python libraries we want to interoperate with, like pandas.
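
One way the extra Python layer could be mostly automated is sketched below, purely as an illustration and not as the plan's actual design. `_RawGraph` is a pure-Python stand-in for a class exported by a compiled binding, and `wrap_raw`, `Graph`, and the method names are all hypothetical. A class decorator generates forwarding methods in Python, while any method written by hand on the wrapper class is left alone.

```python
class _RawGraph:
    """Hypothetical stand-in for a class exported by a compiled binding."""

    def __init__(self, edges):
        self._edges = list(edges)

    def num_edges(self):
        return len(self._edges)

    def edge_sources(self):
        return [src for src, _ in self._edges]


def _make_forwarder(raw_cls, method_name):
    """Build a Python function that forwards to the raw method."""
    raw_method = getattr(raw_cls, method_name)

    def forwarder(self, *args, **kwargs):
        return raw_method(self._raw, *args, **kwargs)

    forwarder.__name__ = method_name
    forwarder.__doc__ = raw_method.__doc__
    return forwarder


def wrap_raw(raw_cls, method_names):
    """Class decorator: add forwarding methods for the listed names."""

    def decorate(cls):
        for name in method_names:
            # Hand-written methods on the wrapper take precedence.
            if name not in cls.__dict__:
                setattr(cls, name, _make_forwarder(raw_cls, name))
        return cls

    return decorate


@wrap_raw(_RawGraph, ["num_edges", "edge_sources"])
class Graph:
    def __init__(self, edges):
        self._raw = _RawGraph(edges)

    # Custom wrapping: return a tuple instead of the raw list.
    def edge_sources(self):
        return tuple(self._raw.edge_sources())


g = Graph([(0, 1), (1, 2)])
```

Here `g.num_edges()` goes through a generated forwarder, while `g.edge_sources()` uses the hand-written override. The same override slot is where a conversion to a pandas type could live.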