Fluorescence-Tools / LabelLib

Library for coarse-grained simulations of probes flexibly coupled to biomolecules.
Mozilla Public License 2.0
9 stars 3 forks source link

Mirate pybin11 to swig #4

Closed tpeulen closed 5 years ago

tpeulen commented 5 years ago

I would like to make the LabelLib compatible with the integrative modeling (IMP) toolkit. The IMP approach of wrapping C/C++ functions and classes is through SWIG. Hence, I'd like to migrate the pybind11 bindings to SWIG.

mdimura commented 5 years ago

I think, there is no need to absorb LabelLib in the IMP, LabelLib can and should stay available as a standalone library/module. For this reason I don't see, why LabelLib and IMP should use the same python bindings? Why not continue using pybind11 API for LabelLib and SWIG for IMP? Are they somehow incompatible? Is it necessary to change APIs/dependencies of all libraries IMP integrates with? Also, pybind11 is newer, more progressive and, in my opinion, more convenient and better technology for c++/python bindings. I considered SWIG at some point, but pybind11 appeared as more powerful and at the same time much easier to use, simpler, cleaner and "prettier" option. pybind11 doesn't provide bindings to non-python languages, but it is less important for LabelLib in my opinion. SWIG implements its own C++ parser which can not keep up with the new standards, STL and popular libraries like Boost or Eigen. Pybind11 is a header-only library, relying on C++ template programming. It is a much more lightweight and robust approach, which benefits from the sophisticated and powerful modern C++ compilers. With pybind11 there is not custom wrapper language, just normal C++. Pybind11 allows more flexibility in how the C++ API should be exported to python, i.e the way pointers and references are handled, numpy/list choice, names etc. are all in the hand of programmer and are easy to change without changing the C++ code. This allows for Pybind11 wrappers to be be more "pythonic" in some cases.

tpeulen commented 5 years ago

Having already programmed wrappers using both pybind11 and SWIG I am very well aware of the differences. I agree pybind11 is more elegant than SWIG.

However, we should aim for maximum compatibility and not beauty. Meaning, we should introduce as little dependencies as possible and prefer established libraries over new libraries and old C++ standard over new.

I moreover agree there is no need to absorb LabelLib in IMP. LabelLib should remain independent. However, as you know, maintaining an extensive software package with external dependencies is a tough task. Hence, dependencies on external libraries through linked libraries are suboptimal.

I plan to build labellib and IMP on the same platform with the minimum amount of linked libraries. Ideally only using SWIG, Boost, Eigen.

mdimura commented 5 years ago

That's the buty of pybind11, you don't need to link against it. Pybind11 is shipped as a part of labellib code completely transparent for users. Once labellib is compiled, libraries that depend on labellib do not care about pybind11, it's completely transparent. There's never a necessity to install pybind11. Pybind11 is mature enough to be considered established and not yet obsolete or outdated. Labellib doesn't depend on Boost at all. Pybind11 is not a transitional dependency.

With respect to new c++ standards the advantages they provide are too big. Since standards are backwards compatible there is usually no problem switching to new standard, unless you need to build for a platform where new compilers are not available.

tpeulen commented 5 years ago

Exactly, as C++ standards are backward compatible, we should stick to the oldest possible C++ standard that is still in productive use. This way, other software may include labellib without the use of precompiled libraries.

Clear, labellib does not depend on boost. However, by using the boost library, dependencies on "pcg-cpp", Eigen, and pybind11 could be removed. Maybe at the cost of performance (Boost random generator vs. pcg-cpp).

I prefer a more self-contained library over syntactic sugar.

mdimura commented 5 years ago

It seems, c++11 is the least common denominator, which is perfectly fine for LabelLib, we can stick to that. What I am saying is, that in case of pcg-cpp and pybind11, I see absolutely no disadvantage for anyone to have them in LabelLib. It does not make compilation/build/linkage any more complicated at all. LabelLib does not carry pcg-cpp, and pybind11 dependencies in any practical sense. In C++ the dependencies burden relates almost exclusively to shared/static "binary" libraries. Pcg-cpp, and pybind11 are header-only libraries. In terms of shipping code there is no difference between pcg-cpp and adding an extra class/function inside LabelLib. Even API does not change, there is no difference between building/shipping LabelLib and LabelLib with pcg-cpp inside it. If we simply copy pcg-cpp headers inside the LabeLib/src dir, technically, nothing will change, it will look like if we implemented our own rand(). No extra dependencies for third-party software, that uses LabelLib. For that reason I don't understand, what's the advantage to remove pybind11? What it improves practically? Boost on the other hand is hard to install and is a huge dependency, not to mention the Boost version conflicts problems. For this reason it is considered good practice nowadays to avoid Boost as dependency if possible, and use problem-free small contained header-only libraries. Boost is not a self contained library, some parts of it have to be linked against. LabelLib as it is now is fully self-contained library. In order to use LabelLib user only needs to include <FlexLabel.h> and link -lFlexLabel, no extra transitional dependencies, pcg-cpp is completely hidden within the FlexLabel.dll/so. The fact that there is header-only code in "thirdparty" folder instead of "src" folder does not really make LabelLib less contained. Now Eigen is a slightly different story. Eigen is also header-only, but in the case of LabelLib it can not be considered completely contained, because Eigen classes are exposed in public functions of LabelLib at the moment. This means third-party software like IMP would not have to link against Eigen in order to use LabeLib, but Eigen header files must be available whenever LabelLib C++ API is used. For python API no Eigen is required at all, even now. This is an easy requirement to fulfill, nothing more but Eigen folder in INCLUDE_DIR. Even that transitional Eigen dependency can be avoided if we switch from Eigen::MatrixXf in API to std::vector. I personally don't think it is the best solution, but we can discuss it if you think it's worth it. It is easy to implement. What we discussed before is the opposite: to remove what's left from the std::array and std::vector from the LabelLib API and switch to Eigen:: and its numpy counterparts everywhere. This is easy do implement, if we decide so. Boost version of linear algebra code (uBlas) is not good enough. Eigen on the other hand is close to an industry standard.

tpeulen commented 5 years ago

My point was only to reduce the number of external dependencies, as I wish to aim for "long-term" stability.

Right now we have:

As you suggested, a solution may be just to simply copy the headers to labellib. For "pcg-cpp" this should be no problem as "pcg-cpp" is under the Apache /MIT license.

mdimura commented 5 years ago

I consider git submodule as a more explicit version of copying headers. Licenses of all three dependencies allow copying of sources, so that is not a problem, as you said. Let me know if you prefer normal copy to git submodule.