cogent3 / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

Evaluate pybind11 versus Cython #26

Open khiron opened 1 year ago

khiron commented 1 year ago

From this blog post and this article.

khiron commented 1 year ago

The task is to extend a C++ project so that it is callable from Python.

graph TD;
  subgraph Application
  A[Python application]
  end
  subgraph Library
  Z[C++ library];
  end
  A--"cross language boundary"-->Z;

Cython compiles Python, and C/C++ code written in a special "Python-like" syntax, into C or C++. Cython does not automatically convert a Python str into the std::string used inside the C++ project, so strings have to be encoded as UTF-8 bytes first. Objects with no direct C++ counterpart have to be serialized, for example as JSON, at the call and deserialized on the C++ side of the boundary.
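The manual marshalling described above can be sketched in plain Python. This is only an illustration of the call-site work, not code from the project: `marshal_args`, `unmarshal_result`, the file name, and the option keys are all hypothetical.

```python
import json


def marshal_args(alignment_file, options):
    """Turn Python values into bytes that a thin wrapper can hand to C++."""
    # A std::string parameter is fed from bytes, so the str is encoded explicitly.
    alignment_bytes = alignment_file.encode("utf-8")
    # Structured data is flattened to a JSON document, then encoded the same way.
    options_bytes = json.dumps(options, sort_keys=True).encode("utf-8")
    return alignment_bytes, options_bytes


def unmarshal_result(raw):
    """Decode bytes coming back from the C++ side into Python objects."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical inputs, used only to show the round trip.
aln, opts = marshal_args("example.phy", {"num_threads": 4, "seed": 1234})
print(aln, opts, unmarshal_result(opts))
```

Every call that passes more than strings and integers would need a matching deserialization step on the C++ side, which is the extra layer shown in the Cython diagram.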

A Cython wrapper is also subject to name collisions: if the C++ shared library exports a function named phylogenetic_analysis, the Python-facing function needs a different name, such as phylogenetic_analysis_py.

Cython's primary role is to embed C++ code in a Python project. It can make calling a C++ shared library a little easier than doing the same thing from plain Python, but what it offers is minimal.

graph TD;
  subgraph Application
  A[Python application]
  B[manual marshalling and serialize objects]
  C[Cython code]
  A-->B
  B-->C
  end
  subgraph Library
    X[library interface]
    Y[deserialize objects]
    Z[C++ library];
    X-->Y
    Y-->Z
  end
  C--"cross language boundary"-->X;

pybind11 wraps C++ functions and types as Python-callable functions that take and return Python data types. pybind also can map

graph TD;
  subgraph Application
  A[Python application]
  B[pybind11 wrapper on classes and functions]
  A--pybind11 marshalling-->B
  end
  subgraph Library
  X[library interface]
  Z[C++ library];
  X-->Z
  end
  B--"cross language boundary"-->X;

Cython doesn't appear to provide assistance on the Python side for marshalling data structures across the Python/C++ boundary, and would potentially still require a solution like pybind11 to do that. pybind11 alone appears able to present a Python interface, inside a Python project, to an external library that requires native data structures.

khiron commented 1 year ago

Sample C++ mock iqtree2 library

#include "mock_iqtree.h"

std::string phylogenetic_analysis(
    const std::string& alignment_file, 
    const std::string& partition_file, 
    const std::string& tree_file, 
    const std::string& out_prefix, 
    int num_threads, 
    int seed
) {
    return "42";
}

Sample pybind11 application

Wrapper

#include <pybind11/pybind11.h>
#include "../Cpp_lib/mock_iqtree.h"

PYBIND11_MODULE(pybind_wrapper, m) {
        m.doc() = "mocked iqtree example";

        m.def("phylogenetic_analysis", &phylogenetic_analysis, "A function to perform phylogenetic analysis");
}

Setup.py

from setuptools import setup, Extension
import pybind11

setup(
    name='pybind_wrapper',
    ext_modules=[
        Extension(
            'pybind_wrapper',
            # mock_iqtree.cpp is compiled into the extension here, and the prebuilt
            # mock_iqtree library is also linked below; one of the two would normally suffice.
            sources=['pybind_wrapper.cpp', '../Cpp_lib/mock_iqtree.cpp'],
            include_dirs=['.', '../Cpp_lib', pybind11.get_include()],
            library_dirs=['../Cpp_lib/build/Release'],
            libraries=['mock_iqtree'],
            language='c++'
        ),
    ],
)
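Once the extension above is built and importable, the call site needs no encoding or serialization. The sketch below assumes the module name `pybind_wrapper` from the wrapper. The file names are placeholders, and the fallback stub is not part of the project: it exists only so the snippet runs where the extension has not been compiled.

```python
try:
    # The compiled extension from the wrapper above, if it has been built.
    from pybind_wrapper import phylogenetic_analysis
except ImportError:
    # Stand-in (an assumption for illustration) mirroring the mock's return value.
    def phylogenetic_analysis(alignment_file, partition_file, tree_file,
                              out_prefix, num_threads, seed):
        return "42"


def run_mock_analysis():
    # Plain str and int arguments; pybind11 converts them to std::string and int.
    # They are passed positionally because the binding declares no argument names.
    return phylogenetic_analysis(
        "example.phy",       # hypothetical alignment file
        "example.nex",       # hypothetical partition file
        "example.treefile",  # hypothetical tree file
        "example_run",       # hypothetical output prefix
        4,                   # num_threads
        1234,                # seed
    )


result = run_mock_analysis()
print(type(result).__name__, result)
```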

Sample Cython application

Wrapper

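# distutils: language = c++

from libcpp.string cimport string

# Declare the C++ function so Cython can call it directly.
cdef extern from "../cpp_lib/mock_iqtree.h":
    string phylogenetic_analysis(const string&, const string&, const string&, const string&, int, int)

# Python-facing wrapper; named differently from the C++ function it calls.
def phylogenetic_analysis_py(str aln_file, str partition_file, str tree_file,
                             str out_prefix, int num_threads, int seed):
    # Encode each str to UTF-8 bytes, which Cython converts to std::string.
    cdef string result = phylogenetic_analysis(
        aln_file.encode('utf-8'), partition_file.encode('utf-8'),
        tree_file.encode('utf-8'), out_prefix.encode('utf-8'),
        num_threads, seed)
    # Convert the returned std::string back to a Python str.
    return result.decode('utf-8')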

Setup.py

import os

from Cython.Build import cythonize
from setuptools import setup
from setuptools.extension import Extension

if os.name == 'nt':  # Windows (MSVC): /std:c++14 is the lowest standard flag it accepts
    compile_args = ['/std:c++14']
else:
    compile_args = ['-std=c++11']

extensions = [
    Extension(
        "cython_wrapper",
        sources=["cython_wrapper.pyx", "../cpp_lib/mock_iqtree.cpp"],
        language="c++",
        extra_compile_args=compile_args,
    )
]

setup(
    name='cython_wrapper',
    # language_level is a Cython directive, so it goes to cythonize rather than Extension.
    ext_modules=cythonize(extensions, compiler_directives={"language_level": "3"}),
)
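To compare the two wrappers from the Python side, a small harness could call whichever of them has been built with the same arguments. This is a sketch only: the module and function names are taken from the samples above, while `load_available_wrappers` and `compare_wrappers` are hypothetical helpers.

```python
import importlib

# (module name, function name) pairs from the two sample wrappers.
# The Cython entry point carries the _py suffix to avoid the name collision.
WRAPPERS = [
    ("pybind_wrapper", "phylogenetic_analysis"),
    ("cython_wrapper", "phylogenetic_analysis_py"),
]


def load_available_wrappers(candidates=WRAPPERS):
    """Return {module_name: callable} for every wrapper that imports cleanly."""
    found = {}
    for module_name, func_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # extension not built in this environment
        found[module_name] = getattr(module, func_name)
    return found


def compare_wrappers(args, wrappers=None):
    """Call each wrapper with the same positional arguments and collect the results."""
    if wrappers is None:
        wrappers = load_available_wrappers()
    return {name: func(*args) for name, func in wrappers.items()}


# Placeholder arguments; against the mock library both wrappers should agree.
print(compare_wrappers(("example.phy", "example.nex", "example.treefile", "example_run", 4, 1234)))
```

If neither extension is installed, the harness prints an empty dict rather than failing, which keeps it usable while only one wrapper is under evaluation.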