giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs
848 stars · 173 forks

huge memory needs of TakensEmbedding if n_jobs=-1. Any mitigation? #150

Closed flamby closed 2 years ago

flamby commented 4 years ago

Hello,

I realized that TakensEmbedding's memory usage with more than 10k rows is quite huge (up to 96 GB of RAM in my case) when n_jobs=-1 and the machine has many cores (10 cores in my case).

Is there another way to mitigate this besides setting n_jobs=1, or are there plans to address it? For instance, batching with dask.

Thanks, and keep up the good work!

ulupo commented 4 years ago

Thanks @flamby. Could you provide us with a setup and steps to reproduce as faithfully as possible? We are currently studying options for mitigating this and other performance bottlenecks for v0.1.5.

flamby commented 4 years ago

Hi @ulupo

Sure. Here it is:

import numpy as np
import giotto.time_series as ts

s = np.random.rand(39000, 1)

# Warning: with n_jobs=-1 on a many-core machine, this can use tens of GB of RAM
embedder = ts.TakensEmbedding(parameters_type='search', dimension=4,
                              time_delay=5, n_jobs=-1)
embedder.fit(s)

On a 12-core server, I saw memory usage of up to 65 GB.

Using n_jobs=1 fixes the issue, and I must confess the runtime is not even an order of magnitude longer. n_jobs=2 requires around 16 GB of RAM.

I opened this issue mainly because it took two or three freezes of my laptop OS (macOS Catalina, on a MBP with 16 GB of RAM) before I figured out it was an out-of-memory issue. Others might have experienced the same.

So be careful not to run the snippet above on your desktop/laptop ;-) unless you have around 64 GB of RAM.

Until this is fixed or mitigated (e.g. by defaulting to n_jobs=1), I would suggest changing the notebooks and documentation so that users don't freeze their machine learning desktop/laptop ;-)
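As a stopgap while the parameter search is memory-hungry, a Takens (time-delay) embedding with *fixed* dimension and time delay can be computed directly in NumPy with negligible memory overhead. This is a minimal sketch of the delay-embedding construction itself, not giotto's implementation, and it skips the automatic parameter search that parameters_type='search' performs:

```python
import numpy as np

def takens_embedding(x, dimension, time_delay):
    """Delay embedding of a 1-D series: row i is
    [x[i], x[i + tau], ..., x[i + (dimension - 1) * tau]]."""
    x = np.asarray(x).ravel()
    n_points = len(x) - time_delay * (dimension - 1)
    # Build an index matrix of shape (n_points, dimension) and gather.
    indices = np.arange(n_points)[:, None] + time_delay * np.arange(dimension)
    return x[indices]

s = np.random.rand(39000, 1)
emb = takens_embedding(s, dimension=4, time_delay=5)
print(emb.shape)  # (38985, 4)
```

The result uses only O(n_points * dimension) memory, so it stays in the low megabytes for a 39k-row series; the fixed parameters could first be estimated once on a small subsample.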

ulupo commented 2 years ago

Closing as this has not been reported again by other users.