GUDHI / gudhi-devel

The GUDHI library is a generic open source C++ library, with a Python interface, for Topological Data Analysis (TDA) and Higher Dimensional Geometry Understanding.
https://gudhi.inria.fr/
MIT License

Tomato KDE underflow #1060

Closed mglisse closed 2 weeks ago

mglisse commented 1 month ago

In tomato, with 'logKDE', we use the values returned by KernelDensity.score_samples directly, and that works fine. With 'KDE', in order to give more importance to higher densities, we naively compute the exponential of these values. However, if the values are all around -1000 (which happens in high dimension), the exponential sends everything to 0 (underflow). I think it would make sense to renormalize first, i.e. add a constant to all scores before computing the exponential. That only multiplies the weights by a constant, which in theory (absent underflow) has no impact on tomato, except that it changes the numbers on the axes of the diagram plot. The question is then what that constant should be: the maximum score? Something closer to the average or median? And how do we document that?
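To make the problem concrete, here is a minimal sketch of the renormalization idea. It is not the tomato code: the scores are made-up numbers standing in for the output of KernelDensity.score_samples, `shifted_exp` is a hypothetical helper name, and using the maximum score as the constant is only one of the candidates raised above.

```python
import math

def shifted_exp(log_densities):
    """Exponentiate log-densities after subtracting their maximum.

    Hypothetical helper: the shift constant (here the max) is an assumption,
    one of the options under discussion, not a settled choice.
    """
    shift = max(log_densities)
    return [math.exp(s - shift) for s in log_densities]

# Made-up scores of the magnitude described above (around -1000).
log_densities = [-1000.0, -1001.0, -1003.5]

# Naive approach: every value underflows to exactly 0.0 in double precision.
naive = [math.exp(s) for s in log_densities]

# Shifted approach: the largest weight is 1.0 and the ordering is preserved.
weights = shifted_exp(log_densities)
```

With the maximum as the constant, the densest point gets weight 1 and all others fall in (0, 1]. Points whose score is more than roughly 745 below the maximum would still underflow to 0, but those are the lowest-density points.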