Closed flamby closed 2 years ago
Thanks @flamby. Could you provide us with a setup and steps to reproduce as faithfully as possible? We are currently studying options for mitigating this and other performance bottlenecks for v0.1.5.
Hi @ulupo
Sure. Here it is:
import numpy as np
import giotto.time_series as ts
s=np.random.rand(39000, 1)
embedder = ts.TakensEmbedding(parameters_type='search', dimension=4,
time_delay=5, n_jobs=-1)
embedder.fit(s)
On a 12 cores server, I see memory usage up to 65GB.
Using n_jobs=1 fixes the issue, and I must confess that the run time with all cores is not an order of magnitude lower anyway.
n_jobs=2 requires around 16GB of RAM.
I opened the issue mainly because it took 2 or 3 freezes of my laptop's OS (macOS Catalina, on a MBP with 16GB of RAM) before I figured out it was an OOM issue. So others might have experienced it too.
So be careful not to run the snippet above on your desktop/laptop if you don't have around 64GB of RAM ;-)
Until it's fixed or mitigated (e.g. by defaulting to n_jobs=1), I would suggest that the notebooks and documentation be changed to avoid freezing your machine-learning desktop/laptop ;-)
Closing as this has not been reported again by other users.
Hello,
I realized that TakensEmbedding's memory needs are quite huge with more than 10k rows (up to 96GB of RAM in my case) if n_jobs=-1 and one has lots of cores (10 cores in my case). Is there another way to mitigate this than n_jobs=1, or any plan to add one, for instance batching with dask? Thanks, and keep up the good work!