giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Improve ripser bindings performance #501

Closed reds-heig closed 4 years ago

reds-heig commented 4 years ago

Reference issues/PRs

None

Types of changes

Description

Benchmarking showed that with big datasets (e.g. Torus 50000 point-cloud), the copy performed when calling the bindings becomes costly in terms of performance (runtime and memory).

To prevent this, the bindings were updated to accept NumPy matrices/arrays by reference.
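From the Python side, passing "by reference" only avoids a copy when the array's dtype and memory layout already match what the C++ binding expects. A minimal sketch, assuming the binding wants a C-contiguous `float32` array (`as_binding_input` is a hypothetical helper, not part of ripser_interface):

```python
import numpy as np

def as_binding_input(X, dtype=np.float32):
    # np.ascontiguousarray returns X itself when no conversion is needed,
    # and a converted copy otherwise -- so a matching input incurs no copy.
    return np.ascontiguousarray(X, dtype=dtype)

X = np.random.rand(1000, 3).astype(np.float32)   # already matches: no copy
Y = np.random.rand(1000, 3)                      # float64: forces a copy

assert np.shares_memory(X, as_binding_input(X))
assert not np.shares_memory(Y, as_binding_input(Y))
```

This is why accepting arrays by reference in the bindings only pays off if the Python layer hands over data in the expected dtype and layout in the first place.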

Some clean-up and improvements were done inside ripser_interface.

Screenshots (if appropriate)

Any other comments?

Checklist

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

MonkeyBreaker commented 4 years ago

About this PR: as observed in the benchmarking, too much memory is used on the Python side when the code calls the ripser implementation.

Let me give more details. If I measure the memory consumed by ripser alone when computing the torus dataset, I obtain a peak of around 8 GB of RAM. But if I do the same through our ripser_interface without the collapser, I run out of memory on a machine with 32 GB of RAM; with the collapser, the peak is around 20 GB during the computation.
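The kind of per-call peak-memory measurement described above can be sketched with the standard library's `tracemalloc`; here `compute` is only a stand-in for the actual call into ripser_interface, not the real benchmark:

```python
import tracemalloc

def compute(n_points):
    # stand-in for allocating an n x n float64 distance matrix
    # (8 bytes per entry), the dominant allocation in the dense case
    dm = bytearray(n_points * n_points * 8)
    return len(dm)

tracemalloc.start()
compute(1000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced allocation: {peak / 1e6:.1f} MB")
```

In practice one would also watch the process RSS, since allocations made inside the C++ extension are not traced by `tracemalloc`.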

After some investigation, this is because some data is kept alive:

  • X : input data
  • dm : also kept alive in the enclosing scope, but we don't require it for the computation inside ripser
  • dperm2all : should this value really be returned from the Python ripser? (we had a discussion on this with @ulupo)

We need to think of a way to manage this better, because it can become a limitation when working on "big" datasets. Off the top of my head, I can think of the following solutions:

I don't think there's a "best" way of achieving this, but let's discuss it.

@ulupo, do you think it would be a good idea to integrate into this PR a first version of a heuristic to automatically decide whether or not to use the collapser?

ulupo commented 4 years ago

After some investigation, this is because some data is kept alive:

  • X : input data
  • dm : also kept alive in the enclosing scope, but we don't require it for the computation inside ripser
  • dperm2all : should this value really be returned from the Python ripser? (we had a discussion on this with @ulupo)

I think this issue only affects the case in which metric != 'precomputed'. If metric == 'precomputed' and n_perm=0 (default) then both dm and dperm2all are simply references to X (no new data created -- but please check if I'm wrong here), and we can never delete X from memory as it is user input which he/she might need to use for other tasks.
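The no-copy claim in the precomputed case can be checked directly with NumPy; this is just an illustration of the aliasing, not the actual interface code:

```python
import numpy as np

# With metric == 'precomputed' and n_perm == 0, dm and dperm2all are
# just extra names bound to the same buffer as X: no new data is created.
X = np.random.rand(5, 5)
dm = X
dperm2all = X
assert np.shares_memory(X, dm) and np.shares_memory(X, dperm2all)
```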

So the question is what to do with dm and dperm2all when metric != 'precomputed'. Starting with dm: we probably don't have to worry in the sparse case (if the input is sparse, it is probably light-weight also). So we only need to worry about the dense case. We could run del dm immediately before the call to DRFDM in line 300. About dperm2all, I'm happy for it to be None when n_perm = 0 for now (so you can just change line 240 to dperm2all = None).
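The suggested clean-up for the dense case can be sketched as follows. All names here are assumptions for illustration: `persistence_stub` stands in for the DRFDM binding, and this is not the actual ripser_interface code:

```python
import numpy as np

def persistence_stub(flat):
    # placeholder for the C++ persistence computation
    return float(flat.sum())

def ripser_dense(dm, n_perm=0):
    # flatten the upper triangle, which is all the C++ routine consumes
    flat = dm[np.triu_indices_from(dm, k=1)]
    del dm          # drop this reference to the n x n matrix before the
                    # heavy call, so it can be freed if no caller holds it
    dperm2all = None  # nothing meaningful to return when n_perm == 0
    return persistence_stub(flat), dperm2all

res, dperm2all = ripser_dense(np.ones((4, 4)))
assert res == 6.0 and dperm2all is None
```

Note that `del dm` only removes the local reference; the memory is actually released only when no other name (e.g. in the caller) still points at the matrix, which is why where the `del` happens matters.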

@ulupo, do you think it could be good idea to integrate to this PR a first version of an heuristic to automatically determine into using or not the collapser ?

Sure, if you have one, why not. But this PR needs to be merged quite soon (at the very latest end of next week).

MonkeyBreaker commented 4 years ago

I think this issue only affects the case in which metric != 'precomputed'. If metric == 'precomputed' and n_perm=0 (default) then both dm and dperm2all are simply references to X (no new data created -- but please check if I'm wrong here), and we can never delete X from memory as it is user input which he/she might need to use for other tasks.

You're absolutely right!

So the question is what to do with dm and dperm2all when metric != 'precomputed'. Starting with dm: we probably don't have to worry in the sparse case (if the input is sparse, it is probably light-weight also). So we only need to worry about the dense case. We could run del dm immediately before the call to DRFDM in line 300. About dperm2all, I'm happy for it to be None when n_perm = 0 for now (so you can just change line 240 to dperm2all = None).

I'll run some tests to see if it does free memory.

Sure, if you have one, why not. But this PR needs to be merged quite soon (at the very latest end of next week).

I'll try to run a detailed benchmark with the dataset o3_1024 (1024 points):

What I would like is to see whether we can observe a threshold, something like dim == 2 && nb_points > 300, above which we use the Collapser. Does that make sense to you, or would you have a different approach?
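A hypothetical first version of such a heuristic could look like this; the `dim == 2` and 300-point thresholds are placeholders to be calibrated by benchmarking, not measured values:

```python
def should_use_collapser(dim, n_points, dim_threshold=2, point_threshold=300):
    # Enable the edge collapser only when the homology dimension and the
    # point-cloud size are both large enough for it to pay off.
    return dim >= dim_threshold and n_points > point_threshold

assert should_use_collapser(2, 1024)        # e.g. o3_1024 at dimension 2
assert not should_use_collapser(1, 1024)    # low dimension: skip collapser
```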

MonkeyBreaker commented 4 years ago

Well, right now I'm really too busy with other tasks, unfortunately.

IMO, we could already merge this PR, and hopefully, in the not-too-distant future, I will add the missing parts:

Julián

ulupo commented 4 years ago

@MonkeyBreaker, I understand.

Concerning

Better manage the memory on the Python side in some cases

If you like, I can merge this PR and then quickly open another with the changes I suggested in my previous messages. They should be easy to implement pretty much exactly as I said then.

MonkeyBreaker commented 4 years ago

Sure, it's a good idea!