giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
858 stars 175 forks

Add normalization of entropy #450

Closed wreise closed 4 years ago

wreise commented 4 years ago

Reference issues/PRs

Types of changes

Description Add the option to normalize the entropy returned by PersistenceEntropy, according to the heuristic proposed by Myers et al. (2019) in eq. 4. Until now, the computed entropy was the one in eq. 3.
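For illustration, a minimal sketch of the two quantities, assuming eq. 3 is the base-2 Shannon entropy of the lifetimes rescaled to sum to one, and eq. 4 divides that value by the base-2 logarithm of the sum of lifetimes. The helper name `persistence_entropy` and its `normalize` argument are hypothetical and are not the code in this PR.

```python
import numpy as np


def persistence_entropy(lifetimes, normalize=False):
    """Hypothetical helper, not the gtda implementation.

    Assumes at least one positive lifetime and, when normalizing,
    a sum of lifetimes different from 1 (otherwise log2 of the sum is 0).
    """
    lifetimes = np.asarray(lifetimes, dtype=float)
    # Points with zero lifetime contribute nothing to the entropy.
    lifetimes = lifetimes[lifetimes > 0]
    total = lifetimes.sum()
    p = lifetimes / total
    # Assumed form of eq. 3: Shannon entropy in base 2.
    entropy = -np.sum(p * np.log2(p))
    if normalize:
        # Assumed form of eq. 4: divide by log2 of the sum of lifetimes.
        entropy /= np.log2(total)
    return entropy


unnormalized = persistence_entropy([1.0, 1.0, 1.0, 1.0])
normalized = persistence_entropy([1.0, 1.0, 1.0, 1.0], normalize=True)
```

With four lifetimes equal to 1, the unnormalized value is 2 and the sum of lifetimes is 4, so the normalized value is 1.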

Screenshots (if appropriate)

Any other comments? Fix indentation in mapper visualization test.

Checklist

wreise commented 4 years ago

I hope I addressed all the issues!

ulupo commented 4 years ago

I think we should edit the code here and anywhere else we use entropy (in particular gtda/mapper/filter.py for Entropy and gtda/time_series/features.py for PersistenceEntropy) to use scipy.stats.entropy. It is suitable for 2D arrays and arbitrary bases, handles zeros well (so we don't have to silence warnings), and is a one-liner!

There is one drawback in the particular case of PersistenceEntropy: we then have to calculate the quantity now called lifespan_sums a second time, even though it is already computed internally by scipy.stats.entropy.
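A rough sketch of what this could look like, assuming lifetimes are stored as a zero-padded 2D array with one row per diagram (the array below is made-up example data, and the variable names are only illustrative). The `axis` keyword of `scipy.stats.entropy` requires SciPy 1.4 or later.

```python
import numpy as np
from scipy.stats import entropy

# Made-up lifetimes: one row per diagram, padded with zeros.
lifespans = np.array([[1.0, 1.0, 2.0, 0.0],
                      [3.0, 0.0, 0.0, 0.0]])

# scipy rescales each row to sum to 1 and treats 0 * log(0) as 0,
# so no warnings need to be silenced.
ent = entropy(lifespans, base=2, axis=1)

# The row sums are computed inside scipy but not returned, so the
# normalization has to recompute them.
lifespan_sums = lifespans.sum(axis=1)
normalized = ent / np.log2(lifespan_sums)
```

Here `ent` is `[1.5, 0.0]` and `normalized` is `[0.75, 0.0]`. A diagram whose lifetimes sum to exactly 1 would make the denominator zero and would need separate handling.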

wreise commented 4 years ago

Oops, there is one failing test: gtda/tests/test_pipeline.py::test_grid_search_time_series. Not ready yet.

MonkeyBreaker commented 4 years ago

@wreise PR #451 resolves the current building issues on the pipeline.

You encountered issues because the pipeline downloads the latest master version of pybind11, and their latest changes broke compilation, even on my local machine. I'm currently looking for a solution, but in the meantime the compilation pipelines work again :)

ulupo commented 4 years ago

@wreise I think the failure in the tests exposes the following issues:

What do you think?