danilkolikov / fsfc

Feature Selection for Clustering
MIT License

Trying to extract relevant features after clustering #5

Open ShankaranarayananBR opened 3 years ago

ShankaranarayananBR commented 3 years ago

Hi, I am using this library as part of my thesis project to extract relevant features for my multitask learning model and to prevent negative gradient flow.

I have followed the steps described on the GitHub page. Attaching the code below:

```python
from fsfc.generic import NormalizedCut
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

X = dt.to_numpy()
pipeline = Pipeline([
    ('select', NormalizedCut(3)),
    ('cluster', KMeans())
])
pipeline.fit_predict(X)
```

Attaching the error below

```
MemoryError                               Traceback (most recent call last)
<ipython-input> in <module>
      3     ('cluster', KMeans())
      4 ])
----> 5 pipeline.fit_predict(X)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\metaestimators.py in <lambda>(*args, **kwargs)
    118
    119         # lambda, but not partial, allows help() to work with update_wrapper
--> 120         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    121         # update the docstring of the returned function
    122         update_wrapper(out, self.fn)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in fit_predict(self, X, y, **fit_params)
    447         """
    448         fit_params_steps = self._check_fit_params(**fit_params)
--> 449         Xt = self._fit(X, y, **fit_params_steps)
    450
    451         fit_params_last_step = fit_params_steps[self.steps[-1][0]]

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params_steps)
    305                 message_clsname='Pipeline',
    306                 message=self._log_message(step_idx),
--> 307                 **fit_params_steps[name])
    308             # Replace the transformer of the step with the fitted
    309             # transformer. This is necessary when loading the transformer

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\joblib\memory.py in __call__(self, *args, **kwargs)
    350
    351     def __call__(self, *args, **kwargs):
--> 352         return self.func(*args, **kwargs)
    353
    354     def call_and_shelve(self, *args, **kwargs):

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
    752     with _print_elapsed_time(message_clsname, message):
    753         if hasattr(transformer, 'fit_transform'):
--> 754             res = transformer.fit_transform(X, y, **fit_params)
    755         else:
    756             res = transformer.fit(X, y, **fit_params).transform(X)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    697         if y is None:
    698             # fit method of arity 1 (unsupervised transformation)
--> 699             return self.fit(X, **fit_params).transform(X)
    700         else:
    701             # fit method of arity 2 (supervised transformation)

~\Desktop\Master Thesis\DataSet\fsfc\base.py in fit(self, x, *rest)
     70
     71     def fit(self, x, *rest):
---> 72         self.scores = self._calc_scores(x)
     73         return self
     74

~\Desktop\Master Thesis\DataSet\fsfc\generic\SPEC.py in _calc_scores(self, x)
     42
     43     def _calc_scores(self, x):
---> 44         similarity = rbf_kernel(x)
     45         adjacency = similarity
     46         degree_vector = np.sum(adjacency, 1)

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\pairwise.py in rbf_kernel(X, Y, gamma)
   1103         gamma = 1.0 / X.shape[1]
   1104
-> 1105     K = euclidean_distances(X, Y, squared=True)
   1106     K *= -gamma
   1107     np.exp(K, K)    # exponentiate K in-place

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64
     65             # extra_args > 0

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    311     else:
    312         # if dtype is already float64, no need to chunk and upcast
--> 313         distances = - 2 * safe_sparse_dot(X, Y.T, dense_output=True)
    314         distances += XX
    315         distances += YY

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64
     65             # extra_args > 0

c:\users\s.bangaloreramalinga\appdata\local\programs\python\python37\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
    150         ret = np.dot(a, b)
    151     else:
--> 152         ret = a @ b
    153
    154     if (sparse.issparse(a) and sparse.issparse(b)

MemoryError: Unable to allocate 1.44 TiB for an array with shape (444234, 444234) and data type float64
```
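For context: the traceback shows that the selector's `_calc_scores` calls `rbf_kernel(x)` on the full dataset, which materializes a dense n × n similarity matrix; with n = 444,234 rows that is exactly the 1.44 TiB allocation in the error. The sketch below checks the arithmetic and outlines one possible workaround (fitting the selector on a random subsample is my own assumption, not a documented fsfc feature):

```python
import numpy as np

# Sanity-check the allocation in the traceback: a dense n x n float64
# similarity matrix grows quadratically with the number of rows.
n = 444_234                               # rows in X, from the error message
bytes_needed = n * n * 8                  # float64 = 8 bytes per entry
print(f"{bytes_needed / 2**40:.2f} TiB")  # prints "1.44 TiB"

# Possible workaround (an assumption on my part, not part of the fsfc API):
# score features on a random subsample so the kernel matrix stays tractable,
# then transform the full dataset with the fitted selector.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 12))         # stand-in for the real dataset
idx = rng.choice(len(X), size=2_000, replace=False)
X_sample = X[idx]                         # 2000 x 2000 kernel fits in memory
# selector = NormalizedCut(3)
# selector.fit(X_sample)                  # feature scores from the subsample
# X_reduced = selector.transform(X)       # then reduce all 444k rows
```

This trades exactness for feasibility: the feature scores come from the subsample only, so whether they are representative depends on the data.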