giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
Other
858 stars 175 forks source link

Refactor GraphGeodesicDistance #422

Closed ulupo closed 4 years ago

ulupo commented 4 years ago

Types of changes

Description There are a couple of major problems with the current implementation of GraphGeodesicDistance:

  1. The return type of transform is always ndarray, even when different distance matrix shapes imply that the ndarray is 1D. But in this case, output fed to the homology transformers will fail due to the current implementation of check_point_cloud, see https://github.com/giotto-ai/giotto-tda/blob/188b6755b7f567a49d0a15cb63492de29a81ab45/gtda/utils/validation.py#L260
  2. The handling of zero entries, infinity entries and non-stored entries in the sparse case was quite opaque and some non-generic shortcuts were taken in the code to address most, but not all, cases.

This PR fixes both problems and introduces additional changes. In particular:

Checklist

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: ulupo
:x: Umberto


Umberto seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

ulupo commented 4 years ago

@wreise as per our discussion, there is a SciPy inconsistency which I have now signalled in https://github.com/scipy/scipy/issues/12424. It is basically impossible to have the wanted results using the Floyd-Warshall algorithm (option 'FW', which could also be selected when method='auto') when some edges have zero weight. In 50b6c1b, I introduced a check for this which overrides the user selection if necessary and warns the user of the situation.

Notice that the test ground truths were incorrect (!): if one node has zero distance from every other node, then all nodes have zero distance from all other nodes. I have fixed this.

ulupo commented 4 years ago

@wreise I've implemented essentially all the tests in the above gist as unit tests. All algorithms are also tested to give the same results.