epfl-lts2 / pygsp

Graph Signal Processing in Python
https://pygsp.rtfd.io
BSD 3-Clause "New" or "Revised" License
483 stars 93 forks source link

Speed and scalability improvements for graph multiresolution #3

Open EiffL opened 7 years ago

EiffL commented 7 years ago

The motivation for these modification is to improve the scalability of the code to be able to handle much larger graphs.

With the baseline code, I couldn't compute graph pyramids for graphs with about 40,000 nodes, the peak memory consumption went above 500GB ! Turns out this was due to 2 main problems:

With these modifications, I can now compute graph multiresolutions in practice for graphs with about 50,000 nodes and 1e6 edges without running into memory issues on a standard desktop. The slowest point in the process is the sampling time for the edges in graph_sparsify, scipy.stats.rv_discrete takes about 1s to sample 10,000 deviates, in my case the algorithm needed 16e6 which takes about 30min just to sample random numbers whereas the rest of the algorithm only takes minutes, so it's really stupid, but at least it eventually works.

I tried to comment and document these modifications in the hope that they might be useful to others, everything seems to work for me but someone should take a close look to check that it doesn't break anything.

rodrigo-pena commented 7 years ago

Hi, thanks for the contribution. Could you check and correct the errors that were raised in the TravisCI checks? They seem to be mostly errors in the code examples in the documentation, raising errors such as "NameError: name 'sparse' is not defined", "NameError: name 'extract_submatrix' is not defined", or "NameError: name 'block' is not defined".

rodrigo-pena commented 7 years ago

There's still a silly error in the doctest (it's one that I often forget is a problem):

pygsp/pygsp/utils.py", line 261, in utils.py Expected: (8, 8) """ M = M.tocoo() Got: (8, 8)

There should be a newline after (8,8) in the documentation

EiffL commented 7 years ago

My bad, it should be fine now.

mdeff commented 7 years ago

Thanks for the contribution. :) Could you look at @rodrigo-pena's comments so that we can merge it ?

coveralls commented 6 years ago

Coverage Status

Coverage increased (+1.7%) to 81.844% when pulling bf2183e29c17e28bbdd2595d629e3a445340649b on EiffL:faster into ef8823a410697d587227e5d2eaf376f903a577d8 on epfl-lts2:master.