GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License
254 stars 65 forks source link

multiple instantiations of Simplex_tree hidden in SimplexTree #1003

Open mglisse opened 10 months ago

mglisse commented 10 months ago

From https://github.com/GUDHI/gudhi-devel/pull/978#issuecomment-1775428864 by @DavidLapous.

It would be possible to have the Python SimplexTree store a pointer (or contain a variant or...) to something that may, at runtime, be one of several instantiations of Simplex_tree, so we can choose if we want fast cofaces or not, etc, without creating many Python classes.

This would complicate the code and slow it down (although it might be negligible), but it is a possibility to keep in mind.

In a number of places we have a function create_simplex_tree which returns a new SimplexTree. We would then probably need to add an option to provide an existing SimplexTree or pass arguments to the constructor.

DavidLapous commented 10 months ago

A good option, that should not complicate the code, would be to use cython fused types, but they are not supported yet as class attribute cython/cython#3283

There may be some useful reading in the link above to achieve this

mglisse commented 10 months ago

cython fused types

IIUC, the goal with these would be to have classes SimplexTreeBasic, SimplexTreeFastCofaces, etc while writing only once the interface code in pxd/pyx files? That should also be doable in pybind11 if cython doesn't implement it. But it does have the drawback of having multiple python classes, so it is quite different from your previous suggestion of hiding everything under a single class (or I misunderstood one of the suggestions).

DavidLapous commented 10 months ago

From the link I mentioned, they link sklearn code that is able to do that ( by essentially generating pyx files for each template). This would indeed create a class for each template, but as they will have essentially the same methods, a wrapper class could interface this in python. A bonus point is that we wont need to do pyi files (signatures for python live documentation) for these simplextrees.

class SimplexTree:
    _backend
    def __init__(*args,**kwargs):
        # code to instanciate the good backend
        _backend = SomeSimplexTree
    def some_method(*stuff, **more_stuff):
        """ working doc """
        return _backend.method(*stuff, **more_stuff)

I'm not sure it's possible to hide this with c++ code only, unless all simplextree's method & data have exactly the same offset in memory (i.e. (c-style) decast recast to a different simplextree is valid); but this would involve very tricky-buggy c++ / assembly code to shift methods or jump in memory, so I'm not sure it's worth it. And no matter what, we will have to compile all of these functions, i.e. do something like my previous proposition (or use pybind11 if its possible there).

DavidLapous commented 3 months ago

Small update that I forgot to do: I've implemented this for my use case here. I didn't do the class thingy, but a function instead here; using Union types in the signature allows for documentation to be retrieved, so no major drawback.