[RFC] Replace current ffi with pybind11

VoVAllen commented 5 years ago

Proposal

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code.

Many projects now use this, including:

PyTorch
- PyTorch migrated from cffi to pybind11 in the 1.0 release
TensorFlow
- TensorFlow recently started migrating from SWIG to pybind11 and plans to finish by the end of 2020 Q1 (link)
Others
- nvidia/dali
- rapidsai
- Actually it's hard to find a project not using pybind11

Pros of pybind11 over current ffi:

Bigger community
- pybind11 is widely used, so it is easier to find solutions when we run into problems.
Better composability with other frameworks
- Since TensorFlow and PyTorch use (or will use) pybind11, this would make it easier to interact with them
Everything is compiled
- The bound functions are compiled and called directly. The current ffi needs extra Python calls through ctypes to dispatch and to convert arguments to pointers. This might give us better performance, though I am not sure.
Better container support
- pybind11's py::list has an API similar to Python's list, which is more powerful than the current List
External developer friendly
- pybind11 has detailed docs and friendlier interfaces. This would make it easier for external developers to contribute.
Better granularity control
- The current ctypes functions release and reacquire the GIL on every call. This causes a performance issue when a lightweight function is called many times and the GIL does not need to be released. pybind11 can release/acquire the GIL on demand in C++.
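
The ctypes side of the last point can be sketched in Python. This is a minimal illustration, not DGL's actual ffi code: it assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library, and it uses `abs()` as a stand-in for a lightweight function. `ctypes.CDLL` releases the GIL around each foreign call, whereas `ctypes.PyDLL` keeps it held. The helper names are hypothetical.

```python
import ctypes

# Assumption: a POSIX system, where loading `None` exposes the symbols of the
# running process, including the C library's abs().
releasing = ctypes.CDLL(None)   # CDLL releases the GIL around every foreign call
holding = ctypes.PyDLL(None)    # PyDLL keeps the GIL held during the call

for lib in (releasing, holding):
    # ctypes converts Python ints to C ints (and back) on each call.
    lib.abs.argtypes = [ctypes.c_int]
    lib.abs.restype = ctypes.c_int


def abs_releasing_gil(x):
    """Call C abs() through a handle that drops and retakes the GIL."""
    return releasing.abs(x)


def abs_holding_gil(x):
    """Call C abs() through a handle that never releases the GIL."""
    return holding.abs(x)


if __name__ == "__main__":
    print(abs_releasing_gil(-3), abs_holding_gil(-7))
```

Both helpers return the same result. The only difference is whether the GIL is dropped around the call, which is the per-call cost this point refers to.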

Cons of pybind11 over current ffi:

Needs extra work to change the ffi
Compilation time may increase, and the build depends on the Python API and OS
- pybind11 compiles code against a specific Python version so that it can be called directly. The current ffi does not depend on the Python API or OS, because ctypes handles those things
- We need to compile separately for each Python version at release time
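
The dependence on Python version and OS shows up in the file name a compiled extension module must have. A small sketch using only the standard library follows; `extension_filename` is a hypothetical helper and `example` is just a placeholder module name.

```python
import importlib.machinery
import sys

# File name suffixes this interpreter accepts for compiled extension modules,
# ordered from most specific to most generic.
suffixes = importlib.machinery.EXTENSION_SUFFIXES


def extension_filename(module_name):
    """Return the most specific file name this interpreter would look for."""
    return module_name + suffixes[0]


if __name__ == "__main__":
    print(sys.version_info[:2], extension_filename("example"))
```

On CPython the most specific suffix typically encodes the interpreter version and platform, so a module built for one Python version is generally not picked up by another. A plain shared library loaded through ctypes carries no such tag.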

Generally speaking, pybind11 is easier to use and could possibly bring us better performance, because it avoids the per-call GIL handling and the multiple Python calls of the current implementation.

BarclayII commented 5 years ago

I would say the major pro of pybind11 is composability with any package that uses pybind11, including custom-developed packages. This essentially allows users to develop a C routine and directly "plug it in" to DGL without compiling the full DGL source (like PyTorch-scatter plugging into PyTorch).

I think this is particularly important for customized neighborhood sampler implementations, as allowing easy integration of third-party C neighborhood samplers should be a good idea.

classicsong commented 5 years ago

> I would say the major pro of pybind11 is composability with any package that uses pybind11, including custom-developed packages. This essentially allows users to develop a C routine and directly "plug it in" to DGL without compiling the full DGL source (like PyTorch-scatter plugging into PyTorch).
>
> I think this is particularly important for customized neighborhood sampler implementations, as allowing easy integration of third-party C neighborhood samplers should be a good idea.

It should be good for any kind of third-party C samplers (neighborhood, negative, etc.)

jermainewang commented 5 years ago

Changing the FFI requires quite a bit of effort, so if it does not block any usability feature, I would turn it down at the moment. Could anyone describe the user experience of writing a custom C routine if DGL were to use pybind11? E.g., what are the dependencies? How does it plug into DGL?

VoVAllen commented 5 years ago

pybind11 has a registration API similar to the current one:


#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(example, m) {
    m.doc() = "pybind11 example plugin"; // optional module docstring

    m.def("add", &add, "A function which adds two numbers");
}

and it can be called from Python like this:

import example
example.add(1, 2)

It's a header-only library and can be added to the current CMake build easily. One main reason for this, as mentioned by @BarclayII, is that we hope to enable users to write their own C++ sampling algorithms when needed, without compiling all of the DGL code, as PyTorch does for C++ custom ops.

I think our code base is small enough that we can afford this transition. PyTorch and TensorFlow had much bigger code bases and still decided to do so. One thing that I think needs investigation is how to carry out this transition gradually, which would make it easier to debug and to ensure correctness.

jermainewang commented 5 years ago

> is that we hope to enable user to write their own C++ sampling algorithm when needed, without compiling all the DGL codes, as how pytorch did for c++ custom op.

Could you give a step-by-step example? For example, does it only require a DGL header library? Does it require linking against the DGL library during compilation? If DGL is installed by conda/pip, how does it work?

BarclayII commented 5 years ago

I guess a more appropriate example worth inspecting would be how to replace a current TVM-style binding (e.g. _CAPI_DGLGraphHasEdgesBetween) with pybind11, and whether doing so is worthwhile (perhaps in terms of performance overhead).

At first glance I found the above non-trivial, as it at least requires thinking about (1) how to expose Graph objects, and (2) how to deal with NDArray objects.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
