LBL-EESA / fastkde


RAM error for high dimensions #5

Closed Rhyst223 closed 3 years ago

Rhyst223 commented 3 years ago

Hi,

Thanks for this package, it is great! However, I've been having some issues when applying it to my own data. For anything more than 3 dimensions I keep getting RAM crashes (I am using Google Colab), and I was wondering whether your package is feasible for my use case.

My goal: I have a data frame with 16 dimensions. I want to fit a KDE that encodes the covariance between dimensions and then draw N samples from that KDE, giving an array of shape (N, 16). Do you think this is possible with your package, or is my number of dimensions just too large?

Reproduction with random data:

import numpy as np
from fastkde import fastKDE
from sklearn.datasets import make_spd_matrix

num_samples = 400

# The desired mean values of the sample.
mu = np.array([5.0, 0.0, 10.0, 7.0])

# The desired covariance matrix.
r = make_spd_matrix(4, random_state=0)

# Generate the random samples.
y = np.random.multivariate_normal(mu, r, size=num_samples)

# Transpose so the array has shape (variables, samples) = (4, 400).
KDE = fastKDE.fastKDE(y.T)

Running the above crashes my Colab session.

Thanks!

taobrienlbl commented 3 years ago

Hi @Rhyst223, glad to hear that you found fastKDE!

Unfortunately, this is unavoidable due to one of the two 'curses of dimensionality' associated with this method (see Section 5 of O'Brien et al. (2016) for a discussion). The memory requirement grows exponentially with the number of variables: a 16-variable KDE would need on the order of 100^16 bytes (1e32 bytes) of memory, which is more than would be available even if every RAM chip in existence could be used.
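To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes about 100 grid points per variable and 1 byte per grid point, which matches the order-of-magnitude figure above; the helper `grid_bytes` is hypothetical, and the grid sizes and element widths fastKDE actually uses may differ.

```python
def grid_bytes(num_dims, points_per_dim=100, bytes_per_point=1):
    """Rough memory estimate for a dense grid (illustrative assumption only)."""
    return bytes_per_point * points_per_dim ** num_dims

# Print the estimate for a few dimensionalities, up to the 16-variable case.
for d in (1, 2, 3, 4, 8, 16):
    print(f"{d:2d} variables: ~{grid_bytes(d):.1e} bytes")
```

Under these assumptions, each additional variable multiplies the estimate by 100.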

O’Brien, T. A., K. Kashinath, N. R. Cavanaugh, W. D. Collins, and J. P. O’Brien, 2016: A fast and objective multidimensional kernel density estimation method: FastKDE. Comput. Stat. Data Anal., 101, 148–160, https://doi.org/10.1016/j.csda.2016.02.014.

You may need to consider using parametric methods to encode relationships among the variables. Unfortunately, fastKDE won't work in this case.
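As one possible illustration of a parametric approach, the sketch below fits a single multivariate Gaussian (mean vector plus covariance matrix) with NumPy and draws N samples of shape (N, 16). The Gaussian assumption, the stand-in random data, and the helper names `fit_gaussian` and `resample` are all hypothetical choices for this example; whether a Gaussian suits your data is something you would need to check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real data frame: 400 rows, 16 columns (hypothetical data).
data = rng.normal(size=(400, 16))

def fit_gaussian(x):
    """Estimate the mean vector and covariance matrix; rows are samples."""
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)  # columns are treated as variables
    return mean, cov

def resample(mean, cov, n, rng):
    """Draw n samples from the fitted multivariate Gaussian."""
    return rng.multivariate_normal(mean, cov, size=n)

mean, cov = fit_gaussian(data)
samples = resample(mean, cov, 1000, rng)
print(samples.shape)  # (1000, 16)
```

The memory cost here is dominated by the 16 x 16 covariance matrix, so it grows quadratically with the number of variables rather than exponentially.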

Best of luck!