ulupo closed this pull request 4 years ago.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers has signed the CLA.
:white_check_mark: ulupo
:x: Umberto
@wreise As per our discussion, there is a SciPy inconsistency, which I have now reported in https://github.com/scipy/scipy/issues/12424. It is basically impossible to obtain the desired results with the Floyd-Warshall algorithm (option `'FW'`, which may also be selected when `method='auto'`) when some edges have zero weight. In 50b6c1b, I introduced a check for this which overrides the user's selection if necessary and warns the user about the situation.
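A check of this kind could look roughly like the sketch below. It assumes SciPy's `shortest_path` method codes (`'auto'`, `'FW'`, `'D'`, `'BF'`, `'J'`) and the input conventions of this PR (in dense input, `numpy.inf`, masked entries or `False` in Boolean arrays mark absent edges; in sparse input, every stored entry, explicit zeros included, is an edge). The helper name `choose_method` and the fallback rule (Johnson's algorithm if any weight is negative, Dijkstra otherwise) are hypothetical and not necessarily what 50b6c1b does.

```python
import warnings

import numpy as np
import scipy.sparse as sp


def choose_method(adjacency, method="auto"):
    """Hypothetical helper: steer away from Floyd-Warshall when zero-weight edges exist."""
    if sp.issparse(adjacency):
        coo = adjacency.tocoo()
        # Every stored entry is an edge, explicit zeros included; self-loops are ignored.
        weights = coo.data[coo.row != coo.col]
    else:
        raw = np.ma.getdata(adjacency)
        # Copy, so that the caller's mask (if any) is never modified in place.
        absent = np.ma.getmaskarray(adjacency).copy()
        if raw.dtype == bool:
            absent |= ~raw  # False entries in Boolean arrays denote absent edges
        data = raw.astype(float)
        absent |= ~np.isfinite(data)  # numpy.inf (or NaN) denotes an absent edge
        np.fill_diagonal(absent, True)  # diagonal entries (self-loops) are ignored
        weights = data[~absent]

    if method in ("FW", "auto") and np.any(weights == 0):
        # Hypothetical fallback: Johnson copes with negative weights, Dijkstra does not.
        fallback = "J" if np.any(weights < 0) else "D"
        if method == "FW":
            warnings.warn(
                f"Zero-weight edges detected: using method={fallback!r} instead of 'FW'.",
                UserWarning,
            )
        return fallback
    return method
```

Overriding `'auto'` as well is a conservative choice in this sketch, since whether SciPy's automatic selection would pick Floyd-Warshall depends on the graph.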
Notice that the test ground truths were incorrect (!): if one node has zero distance from every other node, then all nodes have zero distance from all other nodes. I have fixed this.
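The corrected ground truth is a direct consequence of the triangle inequality for shortest-path distances. Assuming non-negative edge weights and an undirected graph (so that $d$ is symmetric), let $v$ be a node with $d(v, u) = 0$ for every node $u$. Then, for any two nodes $a$ and $b$,

$$
0 \le d(a, b) \le d(a, v) + d(v, b) = 0 + 0 = 0 .
$$

In the directed case, the same argument applies to every pair $(a, b)$ with $d(a, v) = 0$ and $d(v, b) = 0$.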
@wreise I've implemented essentially all the tests in the gist linked in the PR description as unit tests. The tests also check that all algorithms give the same results.
Description

There are a couple of major problems with the current implementation of `GraphGeodesicDistance`: the output of `transform` is always an ndarray, even when the distance matrices have different shapes and the ndarray is therefore 1D (with object dtype). In this case, feeding the output to the homology transformers fails because of the current implementation of `check_point_cloud`, see https://github.com/giotto-ai/giotto-tda/blob/188b6755b7f567a49d0a15cb63492de29a81ab45/gtda/utils/validation.py#L260.

This PR fixes both problems and introduces additional changes. In particular:
- `transform` returns an ndarray only if its output can be turned into a 3D ndarray; otherwise it returns a list.
- SciPy's `shortest_path` replaces scikit-learn's `graph_shortest_path`, because it supports a wider range of algorithms, supports masked arrays, is better maintained, and has "better" behaviour in the sparse case (see below). I have created a gist to exhibit the difference in behaviour and the new behaviour of `GraphGeodesicDistance`: https://gist.github.com/ulupo/83cc82ce83379ebda8fdfe846d0c06a5.
- `numpy.inf`; `False` edges in Boolean arrays do denote absent edges;
- `directed`, `unweighted` and `method` are made available, with the obvious meanings.

Checklist
- I used `flake8` to check my Python changes.
- I used `pytest` to check this on Python tests.
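The first change described above, returning a 3D ndarray only when possible, can be sketched as follows. `stack_if_possible` is a hypothetical name, not the function used in the PR. The idea is that `np.stack` requires all matrices to share one shape, whereas collecting matrices of different shapes into a single ndarray could only give a 1D object array, which is what trips up `check_point_cloud`.

```python
import numpy as np


def stack_if_possible(distance_matrices):
    """Hypothetical helper: a 3D ndarray if all matrices share one 2D shape, else a list."""
    matrices = [np.asarray(matrix) for matrix in distance_matrices]
    shapes = {matrix.shape for matrix in matrices}
    if len(shapes) == 1 and len(next(iter(shapes))) == 2:
        return np.stack(matrices)  # shape (n_samples, n_nodes, n_nodes)
    return matrices  # ragged collection: keep a list of 2D arrays


same = stack_if_possible([np.zeros((3, 3)), np.ones((3, 3))])
ragged = stack_if_possible([np.zeros((2, 2)), np.zeros((3, 3))])
print(type(same).__name__, same.shape)  # ndarray (2, 3, 3)
print(type(ragged).__name__, len(ragged))  # list 2
```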
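The input conventions and the `directed`, `unweighted` and `method` parameters described above map onto SciPy's `shortest_path` roughly as in this sketch. Assumptions: `geodesic_distances` is a hypothetical single-graph function, not the `GraphGeodesicDistance` API; dense input uses `numpy.inf` (or `False` in Boolean arrays) for absent edges; and dense matrices are converted with `csgraph_from_dense(..., null_value=np.inf)` so that zero entries survive as zero-weight edges, instead of being read as absent edges as SciPy does by default for dense arrays.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import csgraph_from_dense, shortest_path


def geodesic_distances(adjacency, directed=True, unweighted=False, method="auto"):
    """Hypothetical single-graph wrapper around SciPy's shortest_path (not the gtda API)."""
    if not sp.issparse(adjacency):
        adjacency = np.asarray(adjacency)
        if adjacency.dtype == bool:
            # False entries denote absent edges; True entries become unit-weight edges.
            adjacency = np.where(adjacency, 1.0, np.inf)
        # Treat only numpy.inf (and NaN) as absent, so zeros remain zero-weight edges.
        adjacency = csgraph_from_dense(adjacency.astype(float), null_value=np.inf)
    return shortest_path(
        adjacency, method=method, directed=directed, unweighted=unweighted
    )


path = np.array([[0.0, 1.0, np.inf], [np.inf, 0.0, 2.0], [np.inf, np.inf, 0.0]])
print(geodesic_distances(path)[0, 2])  # 3.0
print(geodesic_distances(path, unweighted=True)[0, 2])  # 2.0
```

This sketch leaves out the zero-weight check introduced in 50b6c1b; with zero-weight edges, it is safest to pass `method="D"` explicitly.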