Closed: wreise closed this 2 years ago
I have trouble reproducing the errors from the failing tests `hk_shape` and `hk_positive`. Both tests do fail, but running the examples line by line executes normally.
The `HeatKernel` tests fail because there is a dimension in which the diagram contains a single point, almost on the diagonal. Then, in `heat_`, `step_size` is very small and `sampling` covers a very narrow range. `image` is still correct and as expected. However, since `sigma` is set uniformly over dimensions, it might be orders of magnitude larger than that range, and the call to `gaussian_filter` fails.
It seems to be a bug in the logic: the bins are calculated per dimension, but a single `sigma` is used. @ulupo, what if we modify `sampling` to force its range to be of the order of `sigma`? For example, we could make sure that we set the range to `[min(x) - c*sigma, max(x) + c*sigma]`, for some `c`?
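A minimal sketch of this suggestion (the helper name `sigma_padded_range` and the default `c` are illustrative, not giotto-tda's actual code): pad the per-dimension sampling range by `c*sigma` on each side, so the range is always of the order of `sigma` even when the data span is tiny.

```python
import numpy as np

def sigma_padded_range(x, sigma, c=3.0):
    """Return (lo, hi): the data range padded by c*sigma on each side."""
    return x.min() - c * sigma, x.max() + c * sigma

# A single point almost on the diagonal: the raw data span is ~1e-7,
# but the padded range is ~6*sigma wide, so the Gaussian is not truncated.
lo, hi = sigma_padded_range(np.array([0.1, 0.1000001]), sigma=0.5)
```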
Thanks for putting so much energy into chasing this bug, @wreise! Your explanation is very clear.
I see what you suggest, and it makes sense. An alternative would be to use units of the bin size to choose `sigma`, so that, in particular, the absolute value of `sigma` would differ by dimension automatically. So, as a user, I would set the number of bins (easy to understand, since it's equivalent to setting the number of pixels) as well as the width of my Gaussians in those units, which should be fairly intuitive to grasp. What do you think?
That's an alternative I hadn't considered! It amounts to thinking of `sigma` in units of "number of pixels", no?
I have one example in which it is convenient to have `sigma` in the units of the data. Consider classification of noisy point clouds, using persistent homology in various dimensions. I think you'd want to choose `sigma` of the order of the noise. Although, the persistence images paper says that the variance doesn't matter...
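For concreteness, a sketch of what expressing `sigma` in pixel units would mean (the function name is illustrative, not giotto-tda's API): the user-facing value is multiplied by each dimension's bin width, so the absolute `sigma` differs by dimension automatically.

```python
import numpy as np

def sigma_in_data_units(sampling, sigma_pixels):
    """sampling: 1d array of bin centres for one homology dimension."""
    step_size = sampling[1] - sampling[0]  # bin width in data units
    return sigma_pixels * step_size        # absolute sigma for this dimension

# A dimension sampled over [0, 1] with 101 bins has step_size 0.01,
# so sigma_pixels=2 corresponds to an absolute sigma of 0.02.
sigma = sigma_in_data_units(np.linspace(0.0, 1.0, 101), sigma_pixels=2.0)
```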
I have a third option, which involves the fewest changes. Going back to the example I described in the previous comment, with `sigma` huge and the range small, what you'd expect as a result of `gaussian_filter` is just a constant vector. I think I can come up with a numerically safe `gaussian_filter` method which would do just that. It amounts to choosing a threshold for `sigma_pixel` above which you approximate `gaussian_filter(x, sigma)` with a constant vector. This threshold would be purely for numerical precision and, imo, should be hard-coded. What would you say?
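A rough sketch of that third option (the cutoff value and wrapper name are hypothetical, chosen only for illustration): when `sigma_pixel` exceeds a hard-coded numerical threshold, skip `scipy.ndimage.gaussian_filter` entirely and return the constant (mean) image a pathologically wide Gaussian would flatten everything to.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

SIGMA_PIXEL_MAX = 1e3  # hypothetical cutoff, purely for numerical precision

def safe_gaussian_filter(image, sigma_pixel):
    """gaussian_filter, approximated by a constant for huge sigma_pixel."""
    if sigma_pixel > SIGMA_PIXEL_MAX:
        # A Gaussian far wider than the image flattens it completely:
        # return a constant array with the same mean.
        return np.full_like(image, np.mean(image))
    return gaussian_filter(image, sigma=sigma_pixel)
```

Below the cutoff this is exactly `gaussian_filter`; above it, the constant result is returned without ever evaluating a kernel orders of magnitude wider than the sampled range.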
Thanks again! Some good thoughts there. I like the idea of making the least changes and your suggestion seems smart to me. Please feel free to try it!
There's still an HK test that fails (I think this is one of those flaky tests that used to fail in the past already). Could it be easy to solve with your latest technology? https://dev.azure.com/maintainers/Giotto/_build/results?buildId=3619&view=logs&j=f1a0a6a1-f9c7-5c0b-02a6-90f84662480b&t=bf059a84-475e-57cb-5aef-ee9999aa7f8f&l=478
@wreise an issue with wheels for macOS was revealed by this PR since we were not able to get to the install wheels step in recent Azure runs! I hope my latest commit in this PR fixes things.
Reference issues/PRs
Fixes the tests from #636.
Types of changes
Description
I reduced the sizes of the examples generated by `hypothesis` in the nerve tests, by lowering the `min_side` parameter and adding a `max_side` parameter. I also introduced a safeguard against Gaussian smoothing with `sigma_pixel` too large.
Checklist
- I ran `flake8` to check my Python changes.
- I ran `pytest` to check this on the Python tests.