PaddlePaddle / Paddle

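PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)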
http://www.paddlepaddle.org/
Apache License 2.0

New Python/C++ interface #2750

Closed wangkuiyi closed 7 years ago

wangkuiyi commented 7 years ago

I read and followed this article http://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/c++-wrapping.html, which compares the following interfacing technologies:

  1. Manual wrapping. For more details and some example programs, I followed the official Python documentation: https://docs.python.org/2/extending/extending.html. It involves some complex boilerplate code: parsing arguments in each C function, building and returning a Python object in each C function, and maintaining the method list.

  2. SWIG. It seems to be a general method that can generate bindings for various client languages, but the result is not "native" enough. Also, it takes some time to learn the interface language (*.i files).

  3. ctypes. This requires us to respecify the return type and other metadata of each C function again on the Python side.

  4. SIP. This is the Qt community's version of SWIG. We would also need to learn an interface language.

  5. Boost.Python. This is a set of C++ templates that simplifies manual wrapping. We no longer need to write a C wrapper function for each C++ function. Only a few extra lines in addition to the original C++ code are required to build a .so that can be called from Python.

I personally prefer Boost.Python. Here is an example for your reference:

Suppose that we already have C++ functions like:

char const* greet() {
   return "hello, world";
}

only the following few lines are required to build the Python-callable .so file:

#include <boost/python.hpp>

BOOST_PYTHON_MODULE(hello_ext) {
  boost::python::def("greet", greet);
}

Superjomn commented 7 years ago

Boost.Python is simpler than SWIG, but it seems to support only Python, while SWIG supports wrappers for more languages.

Admittedly, considering time and workload, we may support only Python for a long time, but some choices still need to be considered:

In short: Boost.Python for creating better Python APIs, like PyTorch's, and SWIG for multi-language wrappers.

typhoonzero commented 7 years ago

Boost.Python. This is a set of C++ templates that simplifies manual wrapping. We no longer need to write a C wrapper function for each C++ function. Only a few extra lines in addition to the original C++ code are required to build a .so that can be called from Python.

  1. We need multi-language wrappers, at least for Python and Go, because we'll need to call Tensors and Ops on the pserver side to do remote parameter optimization.
  2. A C wrapper is also needed for high-performance online inference.

So I think we can't avoid writing a C wrapper in the end; once we have it, implementing the Python extension is simple.

wangkuiyi commented 7 years ago

What is supposed to be included in the Go API?

In my mind, the Go API differs significantly from the Python API. In particular, we don't need Go packages like paddle.op, paddle.layer, paddle.scope, paddle.variable. Is this correct? @typhoonzero

jacquesqiao commented 7 years ago

There is another library, https://github.com/pybind/pybind11, that works just like Boost.Python but is much more lightweight. If we are considering Boost.Python, we can also have a look at this library.

typhoonzero commented 7 years ago

What is supposed to be included in the Go API?

At least paddle.variable or paddle.tensor, I think. The current implementation of the Go pserver and the optimizer uses an independent tensor implementation, at paddle/optimizer/tensor.h. It would be better to use the new implementation so that we don't have two "tensor"s in the code base.

We don't need Go packages like paddle.op, paddle.layer, paddle.scope, paddle.variable indeed. Making the parameter server an "op", as TensorFlow does, isn't what we intend to do.

dzhwinter commented 7 years ago

Consider that we just write operators in C++ and generate them in Python. We have not found a suitable language-binding generator library or generation technique. Maybe it is too hard to generate ops for Go at present. I think we can just build a core system in C++ with a strong binding to Python; other languages invoke functions through a C-API binding. I agree with @jacquesqiao: according to pybind11's documentation, pybind11 is "similar to Boost.Python, but with a lean header-only implementation for C++11-capable compilers". It may be a better choice if we only consider the Python/C++ binding.

ctypes is not a good choice. Not only do we need to write every function again on the Python side, but it also makes the Python binding tedious to maintain and upgrade. For example, MXNet chose ctypes at the very beginning, but nowadays they maintain another code path in Cython.
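To make the "write every function again" point concrete, here is a minimal ctypes sketch. It is not Paddle code: it binds sin from the system C math library as a stand-in, and the fallback library name "libm.so.6" is an assumption that holds on typical Linux systems.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. The fallback name is an assumption for
# typical Linux systems where find_library cannot resolve "m".
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# ctypes assumes every foreign function returns int unless told otherwise,
# so the C prototype `double sin(double x)` has to be restated by hand.
libm.sin.argtypes = [ctypes.c_double]
libm.sin.restype = ctypes.c_double

result = libm.sin(math.pi / 2)
print(result)
```

Every exported function needs its own argtypes/restype declaration like this, and those declarations must be kept in sync with the C headers manually.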

reyoung commented 7 years ago

Whether to use a C-API depends on whether any other language needs to invoke the Paddle C++ core.

I am not sure whether a Python API alone is enough. There are at least several needs for us to provide a C-API.

Also, I think pybind11 is better than Boost.Python because Paddle is written in C++11.

But if we have a C-API for Paddle, wrapping that C-API for Python is extremely easy with Cython.

cdef extern from "math.h":
    double sin(double x)

reyoung commented 7 years ago

Also, pybind11 and Boost.Python have a major defect: they require the Python version used for compiling and the Python version used for running to be EXACTLY THE SAME. That is, if Paddle is compiled with Python 2.7.2 but run with Python 2.7.3, an error will be raised.

See video here
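If the constraint described above holds, a package could fail early with a clear message instead of a low-level error. The following is only a sketch of such a guard: BUILT_WITH and same_python are hypothetical names, and a real build script would have to record the compile-time version (here it is filled in from the running interpreter so the sketch runs on its own).

```python
import sys

# Hypothetical constant: a build script would write the version of the
# interpreter the extension was compiled against. It is taken from the
# running interpreter here only to keep the sketch self-contained.
BUILT_WITH = tuple(sys.version_info[:3])


def same_python(built_with, running=None):
    """Return True only if major, minor and micro versions all match."""
    if running is None:
        running = tuple(sys.version_info[:3])
    return tuple(built_with) == tuple(running)


if not same_python(BUILT_WITH):
    raise ImportError(
        "extension built with Python %d.%d.%d, running a different version"
        % BUILT_WITH
    )
```

For the example versions in the comment above, same_python((2, 7, 2), (2, 7, 3)) is False, so the import would be rejected.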

reyoung commented 7 years ago

@wangkuiyi and all,

I wrote two demos: one uses pybind11, the other uses Cython + C-API. They are:

The conclusion is:

reyoung commented 7 years ago

Fixed by #2793