giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Improve ripser bindings performance #501

Closed reds-heig closed 4 years ago

reds-heig commented 4 years ago

Reference issues/PRs

None

Types of changes

Description

Benchmarking showed that with big datasets (e.g. Torus 50000 point-cloud), the copy performed when calling the bindings becomes costly in terms of performance (runtime and memory).

To prevent this, the bindings were updated to accept NumPy matrices/arrays by reference.
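From the Python side, passing "by reference" only avoids a copy when the array's dtype and memory layout already match what the C++ binding expects. A minimal sketch, assuming the binding wants a C-contiguous `float32` array (`as_binding_input` is a hypothetical helper, not part of ripser_interface):

```python
import numpy as np

def as_binding_input(X, dtype=np.float32):
    # np.ascontiguousarray returns X itself when no conversion is needed,
    # and a converted copy otherwise -- so a matching input incurs no copy.
    return np.ascontiguousarray(X, dtype=dtype)

X = np.random.rand(1000, 3).astype(np.float32)   # already matches: no copy
Y = np.random.rand(1000, 3)                      # float64: forces a copy

assert np.shares_memory(X, as_binding_input(X))
assert not np.shares_memory(Y, as_binding_input(Y))
```

This is why accepting arrays by reference in the bindings only pays off if the Python layer hands over data in the expected dtype and layout in the first place.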

Some clean-up and improvements were done inside ripser_interface.

Screenshots (if appropriate)

Any other comments?

Checklist

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

MonkeyBreaker commented 4 years ago

About this PR: as observed in the benchmarking, too much memory is used on the Python side when the code calls the ripser implementation.

Let me give more details. If I measure the memory consumed by ripser alone when computing the torus dataset, I obtain a peak of around 8 GB of RAM. But if I do the same through our ripser_interface without the collapser, I run out of memory on a machine with 32 GB of RAM; with the collapser, the peak is around 20 GB during the computation.
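The kind of per-call peak-memory measurement described above can be sketched with the standard library's `tracemalloc`; here `compute` is only a stand-in for the actual call into ripser_interface, not the real benchmark:

```python
import tracemalloc

def compute(n_points):
    # stand-in for allocating an n x n float64 distance matrix
    # (8 bytes per entry), the dominant allocation in the dense case
    dm = bytearray(n_points * n_points * 8)
    return len(dm)

tracemalloc.start()
compute(1000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced allocation: {peak / 1e6:.1f} MB")
```

In practice one would also watch the process RSS, since allocations made inside the C++ extension are not traced by `tracemalloc`.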

After some investigation, this is because some data is kept alive:

  • X : input data
  • dm : also kept alive in the enclosing scope, but we don't require it for the computation inside ripser
  • dperm2all : should this value really be returned from the Python ripser? (we had a discussion on this with @ulupo)

We need to think of a way to manage this better, because it can become a limitation when working on "big" datasets. Off the top of my head, I can think of the following solutions:

I don't think there's a "best" way of achieving this, but let's discuss it.

@ulupo, do you think it would be a good idea to integrate into this PR a first version of a heuristic to automatically decide whether or not to use the collapser?

ulupo commented 4 years ago

After some investigation, this is because some data is kept alive:

  • X : input data
  • dm : also kept alive in the enclosing scope, but we don't require it for the computation inside ripser
  • dperm2all : should this value really be returned from the Python ripser? (we had a discussion on this with @ulupo)

I think this issue only affects the case in which metric != 'precomputed'. If metric == 'precomputed' and n_perm=0 (default) then both dm and dperm2all are simply references to X (no new data created -- but please check if I'm wrong here), and we can never delete X from memory as it is user input which he/she might need to use for other tasks.
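The no-copy claim in the precomputed case can be checked directly with NumPy; this is just an illustration of the aliasing, not the actual interface code:

```python
import numpy as np

# With metric == 'precomputed' and n_perm == 0, dm and dperm2all are
# just extra names bound to the same buffer as X: no new data is created.
X = np.random.rand(5, 5)
dm = X
dperm2all = X
assert np.shares_memory(X, dm) and np.shares_memory(X, dperm2all)
```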

So the question is what to do with dm and dperm2all when metric != 'precomputed'. Starting with dm: we probably don't have to worry in the sparse case (if the input is sparse, it is probably light-weight also). So we only need to worry about the dense case. We could run del dm immediately before the call to DRFDM in line 300. About dperm2all, I'm happy for it to be None when n_perm = 0 for now (so you can just change line 240 to dperm2all = None).
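The suggested clean-up for the dense case can be sketched as follows. All names here are assumptions for illustration: `persistence_stub` stands in for the DRFDM binding, and this is not the actual ripser_interface code:

```python
import numpy as np

def persistence_stub(flat):
    # placeholder for the C++ persistence computation
    return float(flat.sum())

def ripser_dense(dm, n_perm=0):
    # flatten the upper triangle, which is all the C++ routine consumes
    flat = dm[np.triu_indices_from(dm, k=1)]
    del dm          # drop this reference to the n x n matrix before the
                    # heavy call, so it can be freed if no caller holds it
    dperm2all = None  # nothing meaningful to return when n_perm == 0
    return persistence_stub(flat), dperm2all

res, dperm2all = ripser_dense(np.ones((4, 4)))
assert res == 6.0 and dperm2all is None
```

Note that `del dm` only removes the local reference; the memory is actually released only when no other name (e.g. in the caller) still points at the matrix, which is why where the `del` happens matters.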

@ulupo, do you think it could be good idea to integrate to this PR a first version of an heuristic to automatically determine into using or not the collapser ?

Sure, if you have one, why not. But this PR needs to be merged quite soon (at the very latest end of next week).

MonkeyBreaker commented 4 years ago

I think this issue only affects the case in which metric != 'precomputed'. If metric == 'precomputed' and n_perm=0 (default) then both dm and dperm2all are simply references to X (no new data created -- but please check if I'm wrong here), and we can never delete X from memory as it is user input which he/she might need to use for other tasks.

You're absolutely right!

So the question is what to do with dm and dperm2all when metric != 'precomputed'. Starting with dm: we probably don't have to worry in the sparse case (if the input is sparse, it is probably light-weight also). So we only need to worry about the dense case. We could run del dm immediately before the call to DRFDM in line 300. About dperm2all, I'm happy for it to be None when n_perm = 0 for now (so you can just change line 240 to dperm2all = None).

I'll run some tests to see if it does free memory.

Sure, if you have one, why not. But this PR needs to be merged quite soon (at the very latest end of next week).

I'll try to run a detailed benchmark with the dataset o3_1024 (1024 points):

What I would like is to see whether we can observe a threshold, something like dim == 2 && nb_points > 300, above which we use the Collapser. Does that make sense to you, or would you have a different approach?
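A hypothetical first version of such a heuristic could look like this; the `dim == 2` and 300-point thresholds are placeholders to be calibrated by benchmarking, not measured values:

```python
def should_use_collapser(dim, n_points, dim_threshold=2, point_threshold=300):
    # Enable the edge collapser only when the homology dimension and the
    # point-cloud size are both large enough for it to pay off.
    return dim >= dim_threshold and n_points > point_threshold

assert should_use_collapser(2, 1024)        # e.g. o3_1024 at dimension 2
assert not should_use_collapser(1, 1024)    # low dimension: skip collapser
```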

MonkeyBreaker commented 4 years ago

Well, right now I'm really too busy with other tasks, unfortunately.

IMO, we could already merge this PR, and hopefully, in the not-too-distant future, I will add the missing parts:

Julián

ulupo commented 4 years ago

@MonkeyBreaker, I understand.

Concerning

Better manage the memory on the Python side in some cases

If you like, I can merge this PR and then quickly open another with the changes I suggested in my previous messages. They should be easy to implement pretty much exactly as I said then.

MonkeyBreaker commented 4 years ago

Sure, it's a good idea!