aleixalcacer / archetypes

Scikit-learn compatible package for archetypal analysis.
https://archetypes.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

aa.fit(array): Input contains NaN, infinity or a value too large for dtype('float64') #9

Closed: josefheidler closed this issue 2 years ago

josefheidler commented 2 years ago

Hello, I'm trying to do some analysis, but I get this error on aa.fit(array). If I check my numpy array, every value is float64, and there are zero NaN values as well. So I'm not sure where the problem is. Can someone help me out?

Thanks!

array: scaled_df.csv

ValueError                                Traceback (most recent call last)
/home/josef/projects/clustering/archetypes.ipynb Cell 7' in <cell line: 9>()
      9 for k in reps:
     10     aa = arch.AA(n_archetypes=k, **aa_kwargs)
---> 11     aa.fit(scaled)
     13     rss.append({"n_archetypes": k, "rss": aa.rss_})

File ~/projects/clustering/venv/lib/python3.8/site-packages/archetypes/algorithms/archetypes.py:147, in AA.fit(self, X, y, **fit_params)
    127 def fit(self, X, y=None, **fit_params):
    128     """
    129     Compute Archetype Analysis.
   (...)
    145         Fitted estimator.
    146     """
--> 147     X = self._validate_data(X, dtype=[np.float64, np.float32])
    148     self._check_parameters()
    149     self._check_data(X)

File ~/projects/clustering/venv/lib/python3.8/site-packages/sklearn/base.py:566, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    564     raise ValueError("Validation should be done on X, y or both.")
    565 elif not no_val_X and no_val_y:
--> 566     X = check_array(X, **check_params)
    567     out = X
    568 elif no_val_X and not no_val_y:

File ~/projects/clustering/venv/lib/python3.8/site-packages/sklearn/utils/validation.py:800, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    794         raise ValueError(
    795             "Found array with dim %d. %s expected <= 2."
    796             % (array.ndim, estimator_name)
    797         )
    799     if force_all_finite:
--> 800         _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    802 if ensure_min_samples > 0:
    803     n_samples = _num_samples(array)

File ~/projects/clustering/venv/lib/python3.8/site-packages/sklearn/utils/validation.py:114, in _assert_all_finite(X, allow_nan, msg_dtype)
    107     if (
    108         allow_nan
    109         and np.isinf(X).any()
    110         or not allow_nan
    111         and not np.isfinite(X).all()
    112     ):
    113         type_err = "infinity" if allow_nan else "NaN, infinity"
--> 114         raise ValueError(
    115             msg_err.format(
    116                 type_err, msg_dtype if msg_dtype is not None else X.dtype
    117             )
    118         )
    119 # for object dtype data, we only check for NaNs (GH-13254)
    120 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
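
For readers hitting the same traceback: the check that raises here is sklearn's finiteness validation in _assert_all_finite, which rejects NaN and infinite values even when the dtype is already float64. A minimal sketch of running that same test by hand, assuming scaled is built from the attached scaled_df.csv roughly as below (the variable name is taken from the traceback):

import numpy as np
import pandas as pd

# Assumption: the notebook builds `scaled` from the attached CSV roughly like this.
scaled = pd.read_csv("scaled_df.csv").to_numpy(dtype=np.float64)

# Mirror the finiteness check from the traceback: it fails for NaN and +/-inf
# alike, regardless of the array's dtype being float64.
print(np.isfinite(scaled).all())  # False whenever any NaN or inf is present
print(np.isnan(scaled).sum())     # number of NaN entries
print(np.isinf(scaled).sum())     # number of infinite entries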
aleixalcacer commented 2 years ago

Hello,

I've run this code using the dataset you attached:

>>> import pandas as pd
>>> df = pd.read_csv("scaled_df.csv")
>>> df.isnull().values.sum()
4

Looking at that output, it seems your array contains 4 NaN values. Can you attach your code?

If you want more information about missing data, you can read this: https://pandas.pydata.org/docs/user_guide/missing_data.html
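
For example, here is a minimal sketch of locating the affected rows and handling them before fitting, assuming the same scaled_df.csv; dropping and imputing are just two of the options described in that guide, the name clean is only illustrative, and n_archetypes=3 is a placeholder for the values the notebook loops over:

import archetypes as arch
import pandas as pd

df = pd.read_csv("scaled_df.csv")
print(df[df.isnull().any(axis=1)])  # inspect the rows that contain a NaN

clean = df.dropna()                 # option 1: drop the incomplete rows
# clean = df.fillna(df.mean())      # option 2: impute, e.g. with column means

aa = arch.AA(n_archetypes=3)        # illustrative value; the notebook loops over k
aa.fit(clean.to_numpy())            # validation now passes: all values are finite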

josefheidler commented 2 years ago

Nevermind, thank you! I found it!

Sorry!