TopoToolbox / pytopotoolbox

Python interface to TopoToolbox
https://topotoolbox.github.io/pytopotoolbox/
GNU General Public License v3.0
1 stars 2 forks source link

Interface with libtopotoolbox through scikit-build-core and pybind11 #12

Closed wkearn closed 3 months ago

wkearn commented 3 months ago

The current approach using ctypes requires separately building and installing libtopotoolbox in an appropriate directory, which is tricky to get right in a cross-platform way without substantial intervention from users. This set of changes introduces a new build system using scikit-build-core and pybind11 to build an extension module that includes an interface to our library.

pyproject.toml is modified to switch the build backend to scikit-build-core to install pybind11 as a build-time dependency.

src/lib.cpp contains the bindings for libtopotoolbox, defined using pybind11. The PYBIND11_MODULE declaration exposes a Python module _lib within the topotoolbox package that contains the bindings. Right now, it only binds has_topotoolbox, which has a trivial binding because it requires no marshalling of Python data into C/C++ types. Other functions such as fillsinks will require more complex binding declarations, but pybind11 provides an interface to working with NumPy arrays that should be helpful.

CMakeLists.txt is used by scikit-build-core to build the package. It follows the guidance at https://scikit-build-core.readthedocs.io/en/latest/getting_started.html with the exception of adding libtopotoolbox using FetchContent. This will download the source from GitHub and then build a /static/ library for libtopotoolbox and link it into the /shared/ library that is built by pybind11.

src/topotoolbox/init.py shows how to access the binding from the _lib module. In practice, we probably don't want to expose the raw bindings from the topotoolbox module but instead use them to implement methods within the proposed GridObject class.

Next steps

It should be possible with these changes to run pip install . from the pytopotoolbox repository and have everything built and installed properly.

I am not committed to this approach being the way forward. On the positive site, it seems to work and simplify things a lot for the user. What I have read suggests that the C extension approach generally performs better than using ctypes, though I don't think we would really see the effects of performance problems yet. I also think using pybind11 gives us a lot more control over how data are passed into and out of the library that might come in handy as things start to get more complex.

On the negative side, it does require us to maintain the bindings in src/lib.cpp alongside our C library and the Python package: the bindings are not automatically generated from the libtopotoolbox header file or anything like that. It would be instructive to implement the pybind11 bindings for a more complicated function like fillsinks to see whether this becomes a real pain to develop and maintain.

@Teschl what are your thoughts?

If we decide to go this route, we should probably merge #11 first, then rebase this on those changes. Then we can add the fillsinks method to GridObject and create a binding for it in src/lib.cpp.

If merged, this solves #10.

Teschl commented 3 months ago

I agree, the next step should definitely be to implement the pybind for fillsinks. As of right now, I don't understand the functionality completely, especially how to pass the array from the python class to C++. I think we should merge #11 anyway, so that we have a better base to move forward. The name of Gridobject is something we should be able to change very quickly if the need arises.

Teschl commented 3 months ago

I have now built a version that includes fillsinks with pybind11. I think we should definitely use it over ctypes, just for the ease of building with pip. In order to pass arrays to our libtopotoolbox, we have to create a wrapper for every function, so that we can pass the pointer properly. That means the lib.cpp file will probably get quite big; I will look into separating it into a few (maybe one file for GRIDobj, FLOWobj, ...) files. Other than that, it's very simple to use, which is great.

@wkearn, I was thinking of adding the fillsinks function to the #11 pull request, since I based it on that file structure it would make things very easy. What do you think?

wkearn commented 3 months ago

Excellent work, Theo!

Let's keep the fillsinks stuff and #11 as separate pull requests for now just to keep things a little better organized. Sorry for not getting around to merging #11. EGU this week has thrown off my schedule. I will try to get it merged later today, and then merge the build system changes proposed here (#12), which requires a few changes to be compatible with #11. Then we can work on getting your fillsinks binding merged. It is probably better to hold off on submitting a pull request for fillsinks until I merge these.

If you separate out the files, does pybind11 create individual modules (i.e. grid, flow, etc.)? Or can you make submodules (_lib.grid, _lib.flow)? Or is there a way to expose everything under one namespace? Some of the C functions might not clearly belong to GRIDobj, FLOWobj, or whatever. Maybe it would be better to copy the file structure of the C library (so we start with fillsinks.cpp here). We can figure that out as we move forward, though.

On Thu, 18 Apr 2024 04:17:31 -0700 Theo @.***> wrote:

I have now built a version that includes fillsinks with pybind11. I think we should definitely use it over ctypes, just for the ease of building with pip. In order to pass arrays to our libtopotoolbox, we have to create a wrapper for every function, so that we can pass the pointer properly. That means the lib.cpp file will probably get quite big; I will look into separating it into a few (maybe one file for GRIDobj, FLOWobj, ...) files. Other than that, it's very simple to use, which is great.

@wkearn, I was thinking of adding the fillsinks function to the #11 pull request, since I based it on that file structure it would make things very easy. What do you think?

-- Reply to this email directly or view it on GitHub: https://github.com/TopoToolbox/pytopotoolbox/pull/12#issuecomment-2063624240 You are receiving this because you were mentioned.

Message ID: @.***>

wkearn commented 3 months ago

@Teschl: #11 and #12 are now merged, so feel free to make a pull request for your fillsinks binding whenever you are ready.

Teschl commented 3 months ago

Great, will do!