giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Memory error running 3 dimensions persistence computation #167

Closed holmbuar closed 2 years ago

holmbuar commented 4 years ago

Description

Memory error when running a persistence calculation with 3 homology dimensions. The first 2 dimensions work fine.

Steps/Code to Reproduce

Load a dataset of 1920 9-dimensional points and compute the first 3 principal components with PCA. Run a persistence computation for homology dimensions (0, 1, 2) in a Jupyter notebook copied from the classifying shapes notebook.
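For reference, the relevant part of the pipeline looks roughly like this; `points` stands in for the CSV data and the import path assumes the current gtda layout (giotto-learn 0.1.3 exposed it under giotto.homology), so treat this as a sketch rather than the exact notebook code:

import numpy as np
from sklearn.decomposition import PCA
from gtda.homology import VietorisRipsPersistence

points = np.random.random((1920, 9))  # placeholder for the real 1920 x 9 dataset
point_cloud = PCA(n_components=3).fit_transform(points)

# gtda expects a collection of point clouds, shape (n_samples, n_points, n_dims)
vr = VietorisRipsPersistence(homology_dimensions=(0, 1, 2))
diagrams = vr.fit_transform(point_cloud[None, :, :])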

Expected Results

Persistence diagrams for the first 3 homology dimensions.

Actual Results

https://gist.github.com/torlarse/1e65988b08f685f09ece38b11e8fa496

Versions

Windows-10-10.0.18362-SP0
Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
NumPy 1.17.4
SciPy 1.3.2
joblib 0.14.1
Scikit-Learn 0.22.1
giotto-Learn 0.1.3

lewtun commented 4 years ago

thanks @torlarse! i'll see if i can reproduce the error and get back to you soon

holmbuar commented 4 years ago

The memory error seems to persist in the latest release (pun intended). This time it fails for samples with 1000 points, e.g.

import numpy as np

def tori_sampler(r_i, r_o, num_samples):
    """
    arguments:
    r_i: inner (tube) radius of torus
    r_o: outer (central) radius of torus
    num_samples: number of sampling points
    """
    # draw the two angles uniformly from [0, 2 * pi)
    theta = 2 * np.pi * np.random.random((num_samples, 1))
    ksi = 2 * np.pi * np.random.random((num_samples, 1))
    # standard torus parametrisation
    x = (r_o + r_i * np.cos(theta)) * np.cos(ksi)
    y = (r_o + r_i * np.cos(theta)) * np.sin(ksi)
    z = r_i * np.sin(theta)
    torus_points = np.column_stack((x, y, z))
    return torus_points

torus = tori_sampler(1.0, 2.0, 1000)
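
The persistence computation that then runs out of memory is roughly the following (again assuming the gtda import path rather than giotto.homology):

from gtda.homology import VietorisRipsPersistence

vr = VietorisRipsPersistence(homology_dimensions=(0, 1, 2))
diagrams = vr.fit_transform(torus[None, :, :])  # input shape (1, 1000, 3)
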
lewtun commented 4 years ago

thanks for the reminder @torlarse and sorry i have not addressed this issue yet ;(

we will try to have a look at this asap

MonkeyBreaker commented 4 years ago

I took a look at the matter.

I wasn't able to reproduce the problem; I tried with the csv file and with the data generated by tori_sampler. What I could observe is that the task needed 8 GB of RAM to complete on my machine. I don't know whether this comes from the C++ or the Python side.

In my opinion there's an out-of-memory error behind the problem; maybe catch the exception in Python or add checks in C++, I don't know. This issue should be explored in more detail to get better insights.

At the moment it seems more related to the resources available on the computer than to an issue in the library. But maybe memory could be better managed? (A rough way to compare the Python and C++ sides is sketched after this comment.)

Julián
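
A rough way to check which side the peak comes from, assuming psutil is installed (a sketch, not part of the original investigation): tracemalloc only records Python-level allocations, while the process RSS also grows when the C++ backend allocates, so comparing the two gives a hint.

import os
import tracemalloc
import psutil
from gtda.homology import VietorisRipsPersistence

# `torus` as produced by the tori_sampler snippet above
proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss

tracemalloc.start()
vr = VietorisRipsPersistence(homology_dimensions=(0, 1, 2))
diagrams = vr.fit_transform(torus[None, :, :])
_, python_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

rss_growth = proc.memory_info().rss - rss_before
print(f"Python-side peak:   {python_peak / 1e9:.2f} GB")
# RSS growth includes C++ allocations, but is measured after the call,
# so the true peak during the computation may be higher.
print(f"Process RSS growth: {rss_growth / 1e9:.2f} GB")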

ulupo commented 2 years ago

Closing as we should be in a much better position re memory consumption with giotto-ph.