RosettaCommons / binder

Binder, tool for automatic generation of Python bindings
MIT License

Multiple questions associated to usage of binder with a large heavily templated project #214

Closed kliegeois closed 2 years ago

kliegeois commented 2 years ago

First of all, thanks for developing this tool; I have been able to successfully apply it in the past week on some projects.

I have a couple of questions about applying it to the Trilinos project ( https://github.com/trilinos/trilinos ), a large C++ project that relies heavily on template parameters.

Quick context on project:

The installation of the library is as follows:

The goal of using Binder is to have an automatic way of creating and maintaining a Python interface for the project, without requiring the developers of the different packages to maintain the interface themselves.

I have the following questions:

  1. In the config.cfg file it is possible to disable classes or functions one by one; is it possible to disable all classes and functions and then enable some of them one at a time? If so, how can I do that?

  2. For our templated functions/classes, we use explicit template instantiation (ETI) strategies to build and link the library. Typically, we rely on CMake to generate some .cpp files in the build directory, which are then compiled and linked into the .so lib. How can Binder be made aware of those ETI? How can we instantiate the same class multiple times with different template values and generate different Python variable names?

  3. Currently, we use macros for the ETI such as:

    #  define TPETRA_MULTIVECTOR_INSTANT(SCALAR,LO,GO,NODE) \
    template class MultiVector< SCALAR , LO , GO , NODE >; \
    template MultiVector< SCALAR , LO , GO , NODE > createCopy( const MultiVector< SCALAR , LO , GO , NODE >& src);

    that we use as follows in a cpp file generated by CMake:

    TPETRA_DETAILS_MAKECOLMAP_INSTANT( int, longlong, Kokkos_Compat_KokkosSerialWrapperNode )

    Can binder work around the macros in this context and understand the fact that the .cpp files do the ETI?

  4. In the source codes, we have some classes, functions, and includes that are guarded by some preprocessing checks. For example, if some TPL is included some classes can be declared and defined, otherwise it is not the case. When calling binder, is it possible to process those #ifdef checks and drop the content of the .hpp file between the begin and the end of the check if not defined?

  5. How can we make the binding process iterative by enabling one package at a time using binder instead of using all the files at once and still have some dependencies between packages?

  6. In Trilinos, we use a class of the project as an implementation of the shared pointer. I successfully used this class with PYBIND11_DECLARE_HOLDER_TYPE and PyBind11 on a smaller project. Is there a way to specify a custom shared pointer class to Binder?

Thanks a lot for your time and help!

lyskov commented 2 years ago
  1. In the config.cfg file it is possible to disable classes or functions one by one; is it possible to disable all classes and functions and then enable some of them one at a time? If so, how can I do that?

-- Yes, this should be possible. If you need such fine-grained control, I think the best approach is to explicitly disable bindings for the target namespaces (say -namespace A) and then enable bindings for specific classes and functions within them with +class A::MyClass or +function A::my_function. Depending on how much control you need, it might be worth investigating both approaches: [a] explicitly disabling bindings for the project namespaces and [b] not requesting bindings for the project namespaces at all. Please see the detailed description of the options here: https://cppbinder.readthedocs.io/en/latest/config.html#config-file-options
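As a rough illustration of approach [a], a config file could combine the three directives mentioned in the reply. The namespace, class, and function names are the placeholders from the reply itself, not real Trilinos symbols, and how the directives interact in practice should be checked against the linked documentation:

```
-namespace A
+class A::MyClass
+function A::my_function
```

With a file like this, the intent is that everything in A is skipped by default and only A::MyClass and A::my_function are bound.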

  2. For our templated functions/classes, we use some ETI strategies to build and link the library. Typically, we rely on CMake to generate some .cpp files in the build directory which are then compiled and linked to the .so lib. How can binder be aware of those ETI?

-- Binder can only bind templates that have been fully instantiated (otherwise LLVM simply lacks the information needed to generate bindings). Please note that an explicit instantiation of a template (say template class MyClass<int>;) is usually not enough; instead, something like an inline function that takes your class by value is needed (inline void F(MyClass<int>) {}). Such instantiations need to be present in the all_include.hpp file(s) that Binder takes as its main argument. Depending on the structure of your generated .cpp files, it might be worth trying to simply include them directly from your project's all_include.hpp file. In my project I do the following: when such a special template instantiation is needed, I add it to the main project include files behind #ifdef MyPythonBindings guards so that it does not pollute the main .cpp builds. As an idea: since you already generate .cpp files with template instantiations, maybe CMake could also generate a set of .hpp files with the inline function definitions outlined above, which you could then include in the Binder input?
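The pattern described above might look roughly like the following sketch. MyClass, the guard macro MY_PYTHON_BINDINGS, and the helper name instantiate_for_bindings are all hypothetical names chosen for illustration; only the idea (an inline function taking the specializations by value, hidden behind a bindings-only guard) comes from the reply.

```cpp
#include <type_traits>

// Hypothetical stand-in for a heavily templated project class.
template <typename Scalar>
class MyClass {
public:
    explicit MyClass(Scalar v = Scalar{}) : value_(v) {}
    Scalar value() const { return value_; }

private:
    Scalar value_;
};

// MY_PYTHON_BINDINGS is an assumed macro name. It would be defined only when
// this header is parsed for binding generation, so regular library builds
// never see the helper below.
#ifdef MY_PYTHON_BINDINGS
// A by-value parameter in a function definition needs a complete type, which
// makes the compiler instantiate MyClass<int> and MyClass<double>.
inline void instantiate_for_bindings(MyClass<int>, MyClass<double>) {}
#endif
```

A header like this would then be included from the all_include.hpp file passed to Binder, with the guard macro defined for that run only.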

lyskov commented 2 years ago

How can we instantiate multiple times the same class with different template values and generate different Python variable names?

-- When you instantiate a C++ template class with a particular set of template parameters, Binder generates a unique name for that class. For example, std::binary_function<float,float,bool> will get a name like binary_function_float_float_bool_t, while std::binary_function<string,float,bool> will be bound as binary_function_string_float_bool_t. Does this answer your question?

  3. Currently, we use macros for the ETI such as:
    #  define TPETRA_MULTIVECTOR_INSTANT(SCALAR,LO,GO,NODE) \
    template class MultiVector< SCALAR , LO , GO , NODE >; \
    template MultiVector< SCALAR , LO , GO , NODE > createCopy( const MultiVector< SCALAR , LO , GO , NODE >& src);

    ... Can binder work around the macros in this context and understand the fact that the .cpp files do the ETI?

-- Yes, the Binder input (say all-includes.hpp) is treated as a regular C++ include file, so all the usual rules for processing C++ macros apply. As I mentioned above, you will probably need to add an inline function definition to such macros to make sure that your templates are fully instantiated (see the answer above for details).
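One way to read this suggestion is sketched below: the ETI macro is extended so that each expansion also emits an inline helper taking the specialization by value. This is an assumption-laden simplification, not Trilinos code: MultiVector is reduced to two template parameters, and the macro name DEMO_MULTIVECTOR_INSTANT, the extra TAG argument, and the demo_force_ prefix are invented for the example.

```cpp
#include <type_traits>

// Simplified, hypothetical stand-in for a templated class such as MultiVector.
template <typename Scalar, typename LO>
class MultiVector {
public:
    LO size() const { return LO{}; }
};

// Each expansion performs the explicit instantiation and also defines an
// inline helper that takes the specialization by value. TAG only serves to
// give every helper a unique name. Note that an explicit instantiation
// definition should appear once per program, so a header using this macro is
// meant to be seen by a single translation unit (such as the Binder input).
#define DEMO_MULTIVECTOR_INSTANT(SCALAR, LO, TAG)                 \
    template class MultiVector<SCALAR, LO>;                       \
    inline void demo_force_##TAG(MultiVector<SCALAR, LO>) {}

DEMO_MULTIVECTOR_INSTANT(double, int, double_int)
DEMO_MULTIVECTOR_INSTANT(float, long, float_long)
```

Because the preprocessor runs as for any other include file, both specializations are visible as ordinary declarations once the macro has been expanded.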

  4. In the source codes, we have some classes, functions, and includes that are guarded by some preprocessing checks. For example, if some TPL is included some classes can be declared and defined, otherwise it is not the case. When calling binder, is it possible to process those #ifdef checks and drop the content of the .hpp file between the begin and the end of the check if not defined?

-- Absolutely: all Binder input is processed as standard C++ include files, including C++ macros.
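A minimal sketch of what this means for guarded code follows. The macro DEMO_HAVE_TPL and both class names are hypothetical. Since the input is preprocessed like a normal compile, the guarded class simply does not exist when the macro is undefined, so the preprocessor definitions used for the Binder run should match the ones the library was built with.

```cpp
// Always part of the translation unit, whatever the configuration.
struct AlwaysAvailable {
    int id() const { return 1; }
};

// DEMO_HAVE_TPL is an assumed configuration macro standing in for an
// optional third-party library switch.
#ifdef DEMO_HAVE_TPL
// Only declared when the optional library is enabled; without the macro the
// preprocessor drops this block before any later stage sees it.
struct OnlyWithTpl {
    int id() const { return 2; }
};
constexpr bool demo_tpl_enabled = true;
#else
constexpr bool demo_tpl_enabled = false;
#endif
```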

  5. How can we make the binding process iterative by enabling one package at a time using binder instead of using all the files at once and still have some dependencies between packages?

-- Doing this "perfectly" will be tricky, since you will need to keep track of which types each project binds. I do not have a good solution for this (for my main project, about 3M LOC, I decided to just bind it all at once, though it is a huge library). But if your Python users do not mind a reasonable amount of memory overhead (and if most of the time they will only be importing some of the projects), then the easiest way might be to generate bindings for each project independently. Importing such projects together will lead to some memory overhead, since the binding code will be duplicated for all overlapping types. If this is acceptable, it might be the way to go. Before deciding on this, I would recommend double-checking with a minimal example (or maybe contacting the Pybind11 team) that binding the same type twice is treated as the "same type" at the Python level (it should be, assuming that the same compiler and Pybind11 version are used).

  6. In Trilinos, we use a class of the project as an implementation of the shared pointer. I successfully used this class with PYBIND11_DECLARE_HOLDER_TYPE and PyBind11 on a smaller project. Is there a way to specify a custom shared pointer class to Binder?

-- Right now this is not supported, and std::shared_ptr is hardcoded in the Binder code. It should be trivial to add a config option to make this configurable, and a PR that adds such functionality would be welcome!

Hope this helps,

kliegeois commented 2 years ago

Thanks @lyskov for your answers!

I have been able to make some progress using your comments. Now I will look at how to add a config option to make the shared pointer type configurable.

kliegeois commented 2 years ago

@lyskov is there a way to use Binder with multiple translation units?

Let's say that I have file_1.hpp, file_2.hpp, and file_3.hpp: can I call Binder on the different files one by one and then gather the outputs together? I am currently using input files that are automatically generated and were not designed to be included together in one translation unit. Processing the files one at a time works, but of course the bindings for one file then cannot access the code associated with the other files.

lyskov commented 2 years ago

@lyskov is there a way to use Binder with multiple translation units?

@kliegeois Currently Binder does not support such a workflow.

kliegeois commented 2 years ago

All the questions have been answered, and the PR https://github.com/RosettaCommons/binder/pull/217 for the custom shared pointer has been merged, so I am closing this issue.