giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Fix hypothesis for mapper nerve tests #638

Closed. wreise closed this pull request 2 years ago.

wreise commented 2 years ago

Reference issues/PRs

Fixes the tests from #636.

Types of changes

Description

I reduced the sizes of the examples generated by hypothesis in the nerve tests, by lowering the min_side parameter and adding a max_side parameter. I also introduced a safeguard against Gaussian smoothing with sigma_pixel too large.

Checklist
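The effect of bounding example sizes can be illustrated with a toy hypothesis strategy. This is a sketch, not the actual nerve-test code: the dtype, element bounds, and side limits below are made up for illustration.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays, array_shapes

# Illustrative strategy: bounding min_side AND max_side stops hypothesis from
# generating degenerate or pathologically large arrays.
bounded_arrays = arrays(
    dtype=np.float64,
    shape=array_shapes(min_dims=2, max_dims=2, min_side=2, max_side=8),
    elements=st.floats(-1e3, 1e3),
)

@settings(max_examples=20, deadline=None)
@given(bounded_arrays)
def test_sizes_are_bounded(x):
    # Every generated example respects both side bounds.
    assert all(2 <= s <= 8 for s in x.shape)

test_sizes_are_bounded()  # raises if any generated example violates the bounds
```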

wreise commented 2 years ago

I have trouble reproducing the errors from the failing tests hk_shape and hk_positive. Both tests do fail, but running their examples line by line executes normally.

wreise commented 2 years ago

The HeatKernel tests fail because there is a dimension in which the diagram contains a single point, almost on the diagonal. Then, in heat_, step_size is very small and sampling covers a very narrow range. The resulting image is still correct and as expected. However, since sigma is set uniformly over dimensions, it can be orders of magnitude larger than that range, and the call to gaussian_filter fails.
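The mismatch can be made concrete with a small sketch. The names step_size and sigma_pixel mirror the discussion, but the diagram and the numbers are invented for illustration, not taken from the failing test:

```python
import numpy as np

# One homology dimension whose diagram contains a single point almost on the
# diagonal: the sampled filtration range is then extremely narrow.
births = np.array([0.5])
deaths = np.array([0.5 + 1e-8])
n_bins = 100
step_size = (deaths.max() - births.min()) / n_bins  # ~1e-10

# sigma is chosen once, in data units, across all dimensions; a value that is
# reasonable elsewhere is enormous relative to this dimension's range.
sigma = 0.1
sigma_pixel = sigma / step_size  # Gaussian width measured in pixels

# sigma_pixel is many orders of magnitude larger than the 100-pixel image,
# which is what breaks the subsequent gaussian_filter call.
print(sigma_pixel)
```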

It seems to be a bug in the logic: the bins are calculated per dimension, but a single sigma is used. @ulupo, what if we modify sampling to force its range to be at least of the order of sigma? For example, we could make sure that we set the range to [min(x) - c*sigma, max(x) + c*sigma], for some c?
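One way to realize this suggestion is to pad the sampled range by a few multiples of sigma, so the bin width can never be tiny relative to sigma. The helper below is hypothetical (padded_sampling and the default c are not giotto-tda code):

```python
import numpy as np

def padded_sampling(values, n_bins, sigma, c=3.0):
    """Sample n_bins points over a range widened by c*sigma on each side,
    so that step_size is never negligible compared to sigma.
    Hypothetical helper illustrating the suggestion, not giotto-tda code."""
    lo = np.min(values) - c * sigma
    hi = np.max(values) + c * sigma
    sampling, step_size = np.linspace(lo, hi, n_bins, retstep=True)
    return sampling, step_size

# The degenerate near-diagonal case from before is now harmless:
sampling, step = padded_sampling(np.array([0.5, 0.5 + 1e-8]), 100, sigma=0.1)
# sigma / step is now moderate, so the later Gaussian smoothing stays well-posed.
```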

ulupo commented 2 years ago

Thanks for putting so much energy into chasing this bug @wreise! Your explanation is very clear, thanks!

I see what you suggest, and it makes sense. An alternative would be to express sigma in units of the bin size, so that, in particular, the absolute value of sigma would automatically differ by dimension. As a user, I would then set the number of bins (easy to understand, since it is equivalent to setting the number of pixels) as well as the width of my Gaussians in those units, which should be fairly intuitive to grasp. What do you think?
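A minimal sketch of this alternative API, assuming sigma is specified directly in pixel units (smooth_in_pixel_units is a hypothetical name, not giotto-tda's actual interface):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_in_pixel_units(image, sigma_pixels):
    """Smooth with sigma given in units of the bin (pixel) size. Because the
    pixel size differs per homology dimension, the effective data-units sigma
    adapts to each dimension's range automatically.
    Hypothetical sketch of the alternative, not giotto-tda's actual API."""
    return gaussian_filter(image, sigma=sigma_pixels)

# A unit impulse is spread over a few pixels regardless of the data range
# that the 100x100 grid happens to represent.
image = np.zeros((100, 100))
image[50, 50] = 1.0
smoothed = smooth_in_pixel_units(image, sigma_pixels=2.0)
```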

wreise commented 2 years ago

That's an alternative I hadn't considered! It amounts to thinking of sigma in units of "number of pixels", no?

I have one example in which it is convenient to have sigma in the units of the data. Consider classification of noisy point clouds, using persistent homology in various dimensions: I think you would want to choose sigma of the order of the noise. Although the persistence images paper says that the variance doesn't matter...

I have a third option, which involves the fewest changes. Going back to the example I described in the previous comment, with sigma huge and the range small, what you would expect as a result of gaussian_filter is just a constant vector. I think I can come up with a numerically safe gaussian_filter method which does just that. It amounts to choosing a threshold for sigma_pixel above which gaussian_filter(x, sigma) is approximated by a constant vector. This threshold would be purely for numerical precision and, in my opinion, should be hard-coded. What would you say?
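The proposed safeguard might look roughly like the following. Both the wrapper name and the threshold value are hypothetical, chosen here only to illustrate the idea, not taken from the merged implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical hard-coded threshold: above it, the filtered signal is
# numerically indistinguishable from a constant, and building the (enormous)
# Gaussian kernel would be wasteful at best.
MAX_SIGMA_PIXEL = 1e4

def safe_gaussian_filter(x, sigma_pixel):
    """gaussian_filter that degrades gracefully for huge sigma_pixel by
    returning the constant (mean) vector the smoothing would converge to.
    Sketch of the proposed safeguard, not the actual giotto-tda code."""
    if sigma_pixel > MAX_SIGMA_PIXEL:
        return np.full_like(x, np.mean(x))
    return gaussian_filter(x, sigma=sigma_pixel)

x = np.arange(10.0)
flat = safe_gaussian_filter(x, 1e6)   # constant vector, no giant kernel built
mild = safe_gaussian_filter(x, 1.0)   # ordinary smoothing below the threshold
```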

ulupo commented 2 years ago

Thanks again! Some good thoughts there. I like the idea of making the least changes and your suggestion seems smart to me. Please feel free to try it!

ulupo commented 2 years ago

There's still an HK test that fails (I think this is one of those flaky tests that used to fail in the past already). Could it be easy to solve with your latest technology? https://dev.azure.com/maintainers/Giotto/_build/results?buildId=3619&view=logs&j=f1a0a6a1-f9c7-5c0b-02a6-90f84662480b&t=bf059a84-475e-57cb-5aef-ee9999aa7f8f&l=478

ulupo commented 2 years ago

@wreise an issue with the wheels for macOS was revealed by this PR, since we were not able to get to the install-wheels step in recent Azure runs! I hope my latest commit in this PR fixes things.