GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License
246 stars 65 forks source link

Rework simplex tree options #978

Closed VincentRouvreau closed 8 months ago

VincentRouvreau commented 9 months ago

Fix #905

VincentRouvreau commented 9 months ago
@hschreiber @mglisse Some benchmarks to decide what options we do activate for the Python interface link_nodes_by_label stable_simplex_handles OFF filename Rips max_edge_length Rips construction in sec Rips construction memory in Mo SimplexTree construction in sec SimplexTree construction memory in Mo SimplexTree number of simplices Persistence computation in sec Persistence computation memory in Mo
false false tore3D_300.off 3.5 0.002 1.84 1.152 1859.43 57770908 62.260 14228.86
true false tore3D_300.off 3.5 0.002 1.85 1.878 2741.14 57770908 67.564 14228.80
false true tore3D_300.off 3.5 0.002 1.81 3.412 3622.38 57770908 78.363 14228.73
true true tore3D_300.off 3.5 0.002 1.80 4.342 4503.83 57770908 81.205 14228.78
link_nodes_by_label stable_simplex_handles OFF filename Rips construction in sec Rips construction memory in Mo SimplexTree construction in sec SimplexTree construction memory in Mo SimplexTree number of simplices edge collapse construction in sec edge collapse construction memory in Mo edge collapse number of simplices expansion(5) construction in sec expansion(5) construction memory in Mo expansion(5) number of simplices Persistence computation in sec Persistence computation memory in Mo
false false 5-forex-2500.off 0.151 150.76 0.076 103.18 3126250 14.341 6.08 9304 0.001 0.00 20925 0.005 0.00
true false 5-forex-2500.off 0.152 150.75 0.155 151.65 3126250 14.141 5.60 9304 0.001 0.00 20925 0.005 0.00
false true 5-forex-2500.off 0.153 150.73 0.245 190.84 3126250 14.407 4.74 9304 0.002 0.00 20925 0.006 0.00
true true 5-forex-2500.off 0.150 150.72 0.278 238.53 3126250 14.293 6.07 9304 0.002 0.00 20925 0.006 0.00
link_nodes_by_label stable_simplex_handles OFF filename Alpha precision Alpha construction in sec Alpha construction memory in Mo SimplexTree construction in sec SimplexTree construction memory in Mo SimplexTree number of simplices Persistence computation in sec Persistence computation memory in Mo
false false 5-forex-2500.off fast 2.041 48.97 3.364 212.66 2278347 1.615 17.99
false false 5-forex-2500.off safe 2.013 49.80 7.064 664.78 2278347 1.644 0.55
false false 5-forex-2500.off exact 1.954 49.73 23.352 720.63 2278347 1.585 14.16
true false 5-forex-2500.off fast 2.113 49.00 3.802 249.55 2278347 1.790 17.99
true false 5-forex-2500.off safe 1.831 49.76 7.330 701.88 2278347 1.925 0.55
true false 5-forex-2500.off exact 1.900 49.71 23.676 757.68 2278347 1.776 0.55
false true 5-forex-2500.off fast 2.056 48.96 4.041 278.04 2278347 1.991 17.92
false true 5-forex-2500.off safe 1.903 49.75 7.582 730.27 2278347 2.032 0.54
false true 5-forex-2500.off exact 1.834 49.78 23.621 786.18 2278347 1.966 13.43
true true 5-forex-2500.off fast 2.067 48.95 4.155 312.95 2278347 2.121 17.93
true true 5-forex-2500.off safe 1.820 49.74 7.648 765.29 2278347 2.121 0.55
true true 5-forex-2500.off exact 1.817 49.70 23.759 821.02 2278347 2.153 14.87
mglisse commented 9 months ago

For stable_simplex_handles, I think we said there was no point activating it in Python since for now it has 0 benefit. For link_nodes_by_label, I think I'd be in favor of keeping it false for now (same behavior as previous releases) and possibly revisiting later. My reasoning is that it increases the size of the SimplexTree significantly (+50%), and while it makes finding cofaces less ridiculously slow on sparse complexes, that operation is still way slower than finding faces, so people who need really fast cofaces would be better off using a different datastructure that explicitly stores the links of the Hasse diagram. I am open to counter-arguments though.

hschreiber commented 8 months ago

Yes, stable_simplex_handles should be false for python, as the user never accesses the simplex handles directly, so it slows down things for nothing. For link_nodes_by_label, I admit, I don't know how much python users need cofaces when using the simplex tree. Do we have feedback on this?

DavidLapous commented 8 months ago

A possibility, which is not headache-free, is to have both in the same simplextree : as the current python interface only stores the simplextree c++ pointer, we just need to add a private member in the python simplextree class, telling by run time how to reinterpret_cast the simplextree in c++. This may cost another python call for each method, but it should be fine. To avoid confusion of python users, it might also be useful to define __repr__ to differentiate them.

This can also apply to PR #976 .

mglisse commented 8 months ago

For link_nodes_by_label, I admit, I don't know how much python users need cofaces when using the simplex tree. Do we have feedback on this?

A search in the issues and gudhi-users finds surprisingly little. I know some people have used cofaces on examples small enough that they did not care about performance. The few cases I remember where people cared about the speed of cofaces were for C++.

A possibility, which is not headache-free, is to have both in the same simplextree : as the current python interface only stores the simplextree c++ pointer, we just need to add a private member in the python simplextree class, telling by run time how to reinterpret_cast the simplextree in c++.

So essentially a std::variant or a virtual base (though we may use other technical means to achieve it). That's a possibility. It has a cost though, both in speed (although the difference may be negligible depending on how we implement it) and maintenance (it increases and complicates the code), so it needs to be worth the trouble. But if we do decide that we want "fast" cofaces without forcing a slowdown on other operations and increasing the use of memory, that does look like a sensible approach.

For multi-persistence, while it still seems possible, my first reaction without thinking much about it is that the interfaces are likely to differ enough that a separate Python class makes more sense.

VincentRouvreau commented 8 months ago

We will need to say something in the release notes about it.

Ok, I will propose something