glotzerlab / freud

Powerful, efficient particle trajectory analysis in scientific Python.
https://freud.readthedocs.io
BSD 3-Clause "New" or "Revised" License
282 stars 49 forks source link

Convert Box Module to Use nanobind #1256

Closed tommy-waltmann closed 5 months ago

tommy-waltmann commented 5 months ago

Description

This PR changes the box module to use pybind11 instead of Cython. It also establishes a new pattern in the cmake code which other modules will be able to follow for when they are converted.

Motivation and Context

Cython is old and causing us a lot of developer headaches, so we are converting freud to use pybind11 instead.

How Has This Been Tested?

All old tests pass. Please build freud from source and test the box module manually for now.

Types of changes

Checklist:

vyasr commented 5 months ago

If you're considering switching away from Cython you might want to consider nanobind instead. The benchmarks look quite good and the API is nearly identical.

joaander commented 5 months ago

If you're considering switching away from Cython you might want to consider nanobind instead. The benchmarks look quite good and the API is nearly identical.

Thanks for sharing, I wasn't aware of nanobind. It looks perfect for what we do.

... and let the endless cycle of rewriting our codes continue ...

tommy-waltmann commented 5 months ago

If you're considering switching away from Cython you might want to consider nanobind instead. The benchmarks look quite good and the API is nearly identical.

The ndarray class looks to be great for interoperatibility between python and C++, which will be perfect for freud!

DomFijan commented 5 months ago

I was only able to go through a small portion of PR. Will give it another go soon.

tommy-waltmann commented 5 months ago

This looks good. It seems that, besides changes on the python side, we only have to change the data types to the nanobind array on c++ side + some minor tweaks.

One thing that I don't understand is why there are several sets of cpp functions such as makefraction and makefractionalPython. I can see that the "python" functions are used for exporting to python, but why is there a need for the non "python" function? Couldn't the logic be condensed into a single function? it seems that the main difference between the two functions is that non-python functions operate on a single set of data while the python functions do that for the whole system. Could we have either a single function for both use cases or if that is not possible, better names would also help (ie calling them _single vs. whole system or something like that?)

Generally speaking, there are 3 kinds of functions:

1) Function that operates on a single particle 2) Function that operates on the whole system and take raw pointers as inputs 3) Function that operates on the whole system and takes nb::ndarray as inputs, name ends with Python suffix

Type (1) exists to increase code readability and reduce code duplication (they are re-used sometimes in multiple places).

Type (2) is largely leftover from cython-era freud, but could provide a C++ interface for the box class if at some point we decide to provide that. If we don't care about preserving something that could be easier to turn into a C++ API later, we could remove this type of function.

Type (3) is so C++ and python can talk to each other with nanobind. When exporting functions, both pybind11 and nanobind cannot resolve two functions with the same name, so the suffix Python is added to the name.

joaander commented 5 months ago

One solution is to implement the first 2 methods in Box.h and move then 3rd into the nanobind module export file. This compartmentalizes the C++-only code in libfreud.so for future use by C++ libraries yet makes the Python API available to Python users.

The 3rd function in this case wouldn't be a member function. It could either be a static function that behaves like a member function (taking a shared_ptr<Box> as the first argument) - or implemented in a lambda within the nanobind .def.

tommy-waltmann commented 5 months ago

One solution is to implement the first 2 methods in Box.h and move then 3rd into the nanobind module export file. This compartmentalizes the C++-only code in libfreud.so for future use by C++ libraries yet makes the Python API available to Python users.

The 3rd function in this case wouldn't be a member function. It could either be a static function that behaves like a member function (taking a shared_ptr<Box> as the first argument) - or implemented in a lambda within the nanobind .def.

Other than cluttering libfreud.so with methods that we don't recommend using, what is the downside of the 3rd function type existing in libfreud.so?

I haven't separated out the nanobind export modules (i.e. _box.cpython...) from the C++ library (libfreud.so) on this PR. I do this in the parallel module branch (which is also ready to go once this PR is merged), so it may be easier to wait until the next PR to make those changes.

joaander commented 5 months ago

One solution is to implement the first 2 methods in Box.h and move then 3rd into the nanobind module export file. This compartmentalizes the C++-only code in libfreud.so for future use by C++ libraries yet makes the Python API available to Python users. The 3rd function in this case wouldn't be a member function. It could either be a static function that behaves like a member function (taking a shared_ptr<Box> as the first argument) - or implemented in a lambda within the nanobind .def.

Other than cluttering libfreud.so with methods that we don't recommend using, what is the downside of the 3rd function type existing in libfreud.so?

I haven't separated out the nanobind export modules (i.e. _box.cpython...) from the C++ library (libfreud.so) on this PR. I do this in the parallel module branch (which is also ready to go once this PR is merged), so it may be easier to wait until the next PR to make those changes.

Certainly. If we decide to take this approach, this is a rearrange we could do after the next PR.

Other than clutter, one possible downside I see is that libfreud.so may need to be linked to the nanobind.so shared object (I'm not sure yet, as I haven't tried this yet). We would have to figure out how to do this without nanobind_add_module.

Even if the parts of nanobind needed by libfreud.so are header-only, this forces C++-only downstream users of freud to have a Python environment with nanobind installed for those headers.

It is really a question of how strongly do we want to support potential C++-only use-cases. If we think there is even a small chance users would like this in the future, it will be worth the small amount of extra effort during this rewrite. Refactoring the cluttered libfreud.so in the future will be a much larger project.

tommy-waltmann commented 5 months ago

It is really a question of how strongly do we want to support potential C++-only use-cases. If we think there is even a small chance users would like this in the future, it will be worth the small amount of extra effort during this rewrite. Refactoring the cluttered libfreud.so in the future will be a much larger project.

Ok, then let's do this on the next PR.