epfl-lts2 / pygsp

Graph Signal Processing in Python
https://pygsp.rtfd.io
BSD 3-Clause "New" or "Revised" License

Implement partial Fourier basis (fixes #27) #28

Closed scottgigante closed 6 years ago

scottgigante commented 6 years ago

Implements partial computation of the Fourier basis, allowing Laplacian eigenmap plotting and/or spectral clustering without the need for a full eigendecomposition. API discussed in #27.
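For readers unfamiliar with the idea, here is a minimal sketch of what "partial" means here, using `scipy.sparse.linalg.eigsh` directly rather than the PyGSP API (the graph construction below is just an illustrative example, not code from this PR):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

# Build the combinatorial Laplacian of a small ring graph.
N, k = 100, 5
W = sparse.lil_matrix((N, N))
for i in range(N):
    W[i, (i + 1) % N] = 1
    W[(i + 1) % N, i] = 1
W = W.tocsr()
L = sparse.diags(np.ravel(W.sum(axis=0))) - W

# Compute only the k smallest-magnitude eigenpairs instead of all N.
e, U = eigsh(L, k=k, which='SM')
print(e.shape, U.shape)  # (5,) (100, 5)
```

For embedding or clustering only a handful of eigenvectors are needed, so this avoids the O(N^3) cost of a dense eigendecomposition.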

nperraud commented 6 years ago

Thanks a lot for the PR. I left two small comments. I think you did a good job, but I guess @mdeff should be the one to do the merging.

nperraud commented 6 years ago

The tests are failing with errors like this:

File "pygsp/utils.py", line 255, in utils.py
Failed example:
    utils.symmetrize(W, method='tril')
Expected:
    array([[0., 3., 4.],
           [3., 1., 2.],
           [4., 2., 3.]])
Got:
    array([[ 0.,  3.,  4.],
           [ 3.,  1.,  2.],
           [ 4.,  2.,  3.]])

Any idea what is going on?

scottgigante commented 6 years ago

Both valid comments. I've given it a slightly different warning message depending on whether the user asks for a full eigendecomposition or a partial one with many eigenvectors, but if you have a preferred message I can merge them into one.

The test failures are weird: doctest does require string equivalence of results, but I didn't change utils, and the default behaviour shouldn't have changed. Looks like a NumPy or Python version change to me.
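For the record, this whitespace difference matches the array repr change in NumPy 1.14 (extra sign padding was dropped), which breaks doctests that compare output strings. If that is the cause, NumPy provides a print option to restore the old format:

```python
import numpy as np

A = np.array([[0., 3., 4.],
              [3., 1., 2.],
              [4., 2., 3.]])

# NumPy >= 1.14 prints without the extra sign padding:
print(repr(A))

# Restoring the pre-1.14 format makes old doctests pass again:
np.set_printoptions(legacy='1.13')
print(repr(A))
```

The cleaner long-term fix is updating the expected doctest output to the new format.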

scottgigante commented 6 years ago

Just an update on those failing tests: one of them was actually my test, hidden among the others. Should be fixed now. The others are currently also failing on master: https://travis-ci.org/epfl-lts2/pygsp/jobs/382958006

scottgigante commented 6 years ago

Actually it still doesn't pass:

>>> np.testing.assert_allclose(np.abs(U), np.abs(G.U[:,-n:]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scottgigante/.local/lib/python2.7/site-packages/numpy/testing/utils.py", line 1395, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/home/scottgigante/.local/lib/python2.7/site-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 0.0722061242071%)
 x: array([[  2.714909e-02,   1.355861e-02,   5.760985e-02, ...,
          2.945387e-08,   1.336955e-08,   2.885479e-08],
       [  3.213260e-02,   1.889188e-02,   5.686249e-02, ...,...
 y: array([[  2.714909e-02,   1.355861e-02,   5.760985e-02, ...,
          2.945387e-08,   1.336955e-08,   2.885479e-08],
       [  3.213260e-02,   1.889188e-02,   5.686249e-02, ...,...
>>> np.testing.assert_allclose(np.abs(U), np.abs(G.U[:,-n:]), atol=1e-13)
>>> np.testing.assert_allclose(np.abs(U), np.abs(G.U[:,-n:]), rtol=1e-2)

Setting either the absolute or the relative tolerance a little higher works. Absolute tolerance makes more sense to me, but I'm open to suggestions here.
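The reason `rtol` alone fails here is that `assert_allclose` checks `|x - y| <= atol + rtol * |y|`, so near-zero eigenvector entries (like the ~1e-8 values in the traceback above) are compared relative to themselves, and any numerical noise blows past a 1e-7 relative tolerance. A toy illustration (the values are made up, not taken from the actual test):

```python
import numpy as np

x = np.array([1.0, 2.945387e-08])
y = np.array([1.0, 2.945390e-08])  # tiny absolute difference, ~1e-6 relative

# Default rtol=1e-7, atol=0 fails on the near-zero entry:
try:
    np.testing.assert_allclose(x, y)
    result = 'passed'
except AssertionError:
    result = 'failed with default rtol'
print(result)

# A small absolute tolerance absorbs the noise in near-zero entries:
np.testing.assert_allclose(x, y, atol=1e-13)
print('passed with atol=1e-13')
```

This is why a small `atol` is the natural fix when the arrays contain entries that are essentially zero up to floating-point noise.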

mdeff commented 6 years ago

I've fixed the issues (361f67e, 57f370a) in master. You can rebase for the tests to pass.

scottgigante commented 6 years ago

Thanks for the review @mdeff ! I've implemented those changes and rebased from master so the tests should pass, I think. Re your comment: "give a potential solution: do a partial decomposition, or use polynomial filters if the goal is filtering. Maybe mention the filtering of random signals as well." - I'm not quite sure what I would say about filtering random signals. If you have better suggested wording than the current warning I'm happy to put it in.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.1%) to 82.129% when pulling 908848f4c3d54d57a889964847b7f76d06056a29 on scottgigante:partial_eigendecomposition into e696d9fbe2744c8e6c8ffcdd692db411cf34ed7c on epfl-lts2:master.

scottgigante commented 6 years ago

For reference, the decrease in coverage is due to the slow eigendecomposition warning not being tested. We could add a simple large problem (eigendecomposition of 3000x3000 identity?) if this is necessary, but I don't think we need to test that warning.
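If testing that warning ever becomes necessary, one cheap pattern is to exercise the warning machinery directly rather than through a genuinely slow eigendecomposition. A generic sketch (the function and message below are hypothetical stand-ins, not the PR's actual code):

```python
import warnings

def maybe_warn(n):
    # Hypothetical stand-in for the slow-eigendecomposition warning.
    if n > 3000:
        warnings.warn('Computing the full eigendecomposition of a large '
                      'matrix may take a long time.', UserWarning)

# Assert that the warning fires without doing any heavy computation.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    maybe_warn(5000)

assert len(caught) == 1
assert issubclass(caught[0].category, UserWarning)
```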

scottgigante commented 6 years ago

Am I right in saying that a partial eigendecomposition wouldn't return the eigenvector associated with the smallest eigenvalue? If so, those checks will fail when n_eigenvectors < N.

mdeff commented 6 years ago

We want to compute the eigenvectors associated with the smallest eigenvalues. For example for spectral clustering, we often use the first k. For spectral embedding, we use 2 or 3. So you need to pass which='SM' (smallest magnitude) to eigsh. Moreover, in set_coordinates, we have to discard the constant eigenvector (index 0) and use the first 2-3.
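The approach described above can be sketched with `scipy.sparse.linalg.eigsh` (the helper function name and the graph below are illustrative, not PyGSP code):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def laplacian_embedding(L, dim=2):
    """Spectral embedding from the smallest Laplacian eigenvectors,
    skipping the constant eigenvector at index 0."""
    # Ask for dim + 1 eigenpairs so the trivial one can be dropped.
    e, U = eigsh(L, k=dim + 1, which='SM')
    order = np.argsort(e)          # eigsh does not guarantee sorted output
    return U[:, order[1:dim + 1]]  # discard the constant eigenvector

# Path graph on 10 nodes.
N = 10
W = sparse.diags([np.ones(N - 1)] * 2, offsets=[-1, 1])
L = sparse.diags(np.ravel(W.sum(axis=0))) - W
coords = laplacian_embedding(L.tocsc(), dim=2)
print(coords.shape)  # (10, 2)
```

One caveat: `which='SM'` can converge slowly for large Laplacians (the smallest eigenvalues are clustered near zero); shift-invert via `sigma=0` is a common alternative.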

scottgigante commented 6 years ago

Ah, good point. Should we allow the user to pass which with 'SM' as default?

nperraud commented 6 years ago

My feeling is that for the coordinates, the user always wants the smallest ones. So I am not sure that an option makes sense here.

scottgigante commented 6 years ago

For coordinates, I agree - but for compute_fourier_basis(n_eigenvectors=k) in general, would the user ever want the eigenvectors associated with the largest eigenvalues?

mdeff commented 6 years ago

I don't see any use for the largest ones.

scottgigante commented 6 years ago

Done. One test is failing on Python 2.7, but I don't think it's my fault.

mdeff commented 6 years ago

Not your fault, that was a transient fail. Looks good to me now. Thanks for the PR, and don't hesitate to come back for another one! :)

scottgigante commented 6 years ago

Thanks!