KrishnaswamyLab / PHATE

PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) is a tool for visualizing high-dimensional data.
http://phate.readthedocs.io

s_gd2 typeerror #95

Open scottgigante opened 4 years ago

scottgigante commented 4 years ago
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9418f70a3d50> in <module>
      1 import phate
----> 2 Y = phate.PHATE(knn_dist='precomputed').fit_transform(A)

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in fit_transform(self, X, **kwargs)
    939         with _logger.task("PHATE"):
    940             self.fit(X)
--> 941             embedding = self.transform(**kwargs)
    942         return embedding
    943 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in transform(self, X, t_max, plot_optimal_t, ax)
    908                         n_jobs=self.n_jobs,
    909                         seed=self.random_state,
--> 910                         verbose=max(self.verbose - 1, 0),
    911                     )
    912             if isinstance(self.graph, graphtools.graphs.LandmarkGraph):

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in embed_MDS(X, ndim, how, distance_metric, solver, n_jobs, seed, verbose)
    228         try:
    229             # use sgd2 if it is available
--> 230             Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
    231             if np.any(~np.isfinite(Y)):
    232                 _logger.warning("Using SMACOF because SGD returned NaN")

</mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/lib/python3.7/site-packages/decorator.py:decorator-gen-157> in sgd(D, n_components, random_state, init)

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/scprep/utils.py in _with_pkg(fun, pkg, min_version, *args, **kwargs)
     81         check_version(pkg, min_version=min_version)
     82         __imported_pkgs.add((pkg, min_version))
---> 83     return fun(*args, **kwargs)
     84 
     85 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in sgd(D, n_components, random_state, init)
     82     D = squareform(D)
     83     # Metric MDS from s_gd2
---> 84     Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
     85     return Y
     86 

/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/s_gd2/s_gd2.py in mds_direct(n, d, w, etas, num_dimensions, random_seed, init)
     82 
     83     # do mds
---> 84     cpp.mds_direct(X, d, w, etas, random_seed)
     85     return X
     86 

TypeError: Array of type 'double' required.  A 'unknown type' was given

trberg commented 3 years ago

Is there a resolution to this error? I keep running into this problem. I've been using pandas dataframes and I've tried changing data types with the same result.

Thanks!

scottgigante commented 3 years ago

Could you post the data and code you're using that produces the error? I'm having a hard time reproducing it.

In the meantime, you can avoid the error by using mds_solver='smacof'.
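A minimal sketch of that workaround, assuming `mds_solver` is passed to the `phate.PHATE` constructor as in the other calls in this thread. The names `PHATE_KWARGS` and `embed_with_smacof` are illustrative only and are not part of PHATE.

```python
# Workaround sketch: select the SMACOF solver so that the s_gd2-backed SGD
# path (where the TypeError above is raised) should not be taken.
PHATE_KWARGS = {"mds_solver": "smacof"}


def embed_with_smacof(data):
    """Run PHATE on `data` with the SMACOF MDS solver (hypothetical helper)."""
    import phate  # imported lazily; requires phate to be installed

    phate_op = phate.PHATE(**PHATE_KWARGS)
    return phate_op.fit_transform(data)
```

Calling `embed_with_smacof(data)` in place of `phate_op.fit_transform(data)` would be one way to try it. SMACOF may be slower than SGD on large inputs, so this is a stopgap rather than a fix.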

trberg commented 3 years ago

I can't post all the data, but I've included a small printout of the data below.

data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)

data_phate = phate_op.fit_transform(data)

Here is the error this code outputs.

           002  003  004  005  006  007  008  009  010  ...  44786754  44786774  44786872  44787062  44816559   45771331  46234829  46235085  46235338
sample_id                                               ...                                                                                           
1    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  82.975610       0.0       0.0       0.0
2    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  91.886364       0.0       0.0       0.0
3    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  85.580645       0.0       0.0       0.0
4    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  89.466667       0.0       0.0       0.0
5    0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0   0.000000       0.0       0.0       0.0
...        ...  ...  ...  ...  ...  ...  ...  ...  ...  ...       ...       ...       ...       ...       ...        ...       ...       ...       ...
96   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.828571       0.0       0.0       0.0
97   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.408163       0.0       0.0       0.0
98   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  97.040816       0.0       0.0       0.0
99   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  94.113924       0.0       0.0       0.0
100  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...       0.0       0.0       0.0       0.0       0.0  88.694444       0.0       0.0       0.0

[100 rows x 3335 columns]
Calculating PHATE...
  Running PHATE on 100 observations and 3335 variables.
  Calculating graph and diffusion operator...
/data/users/trberg/anaconda3/lib/python3.7/site-packages/graphtools/graphs.py:121: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
  UserWarning,
    Calculating KNN search...
    Calculated KNN search in 0.11 seconds.
    Calculating affinities...
  Calculated graph and diffusion operator in 0.20 seconds.
  Calculating optimal t...
    Automatically selected t = 9
  Calculated optimal t in 0.04 seconds.
  Calculating diffusion potential...
  Calculating metric MDS...
  Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.26 seconds.
Traceback (most recent call last):
  File "feature_reduction.py", line 74, in <module>
    data_phate = phate_op.fit_transform(data)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 941, in fit_transform
    embedding = self.transform(**kwargs)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 910, in transform
    verbose=max(self.verbose - 1, 0),
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 230, in embed_MDS
    Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
  File "</data/users/trberg/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-146>", line 2, in sgd
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/scprep/utils.py", line 83, in _with_pkg
    return fun(*args, **kwargs)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 84, in sgd
    Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
  File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/s_gd2/s_gd2.py", line 84, in mds_direct
    cpp.mds_direct(X, d, w, etas, random_seed)
TypeError: Array of type 'double' required.  A 'unknown type' was given

scottgigante commented 3 years ago

Could you please run the following:

data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data.to_pickle("data.pickle.gz")

and then drag data.pickle.gz into your reply? That should be small enough to post.

trberg commented 3 years ago

The issue isn't the size of the data, it's sensitive biomedical data that I don't have permission to upload in full.

trberg commented 3 years ago

But what you're seeing in my comment above is pretty much what it looks like.

scottgigante commented 3 years ago

Unfortunately, if I'm unable to view the data, it's going to be difficult to diagnose. I tried to replicate data like yours and it runs fine.

>>> import numpy as np
>>> import pandas as pd
>>> import phate
>>> data = pd.DataFrame(np.random.normal(0, 1, (100, 3335)))
>>> data.index.name = "sample_id"
>>> data = data.astype(np.float64)
>>> phate_op = phate.PHATE()
>>> data_phate = phate_op.fit_transform(data)
Calculating PHATE...
  Running PHATE on 100 observations and 3335 variables.
  Calculating graph and diffusion operator...
/home/scottgigante/.local/lib/python3.8/site-packages/graphtools/graphs.py:118: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
  warnings.warn(
    Calculating KNN search...
    Calculated KNN search in 0.08 seconds.
    Calculating affinities...
    Calculated affinities in 0.01 seconds.
  Calculated graph and diffusion operator in 0.10 seconds.
  Calculating optimal t...
    Automatically selected t = 3
  Calculated optimal t in 0.02 seconds.
  Calculating diffusion potential...
  Calculating metric MDS...
  Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.14 seconds.

Some diagnostics that might help:

import phate
import s_gd2
print(phate.__version__)
print(s_gd2.__version__)

print(np.all([d == np.dtype('float64') for d in data.dtypes]))
print(data.sum(axis=0).tolist())
print(data.sum(axis=1).tolist())
print(np.all(np.isfinite(data)))
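
The checks above inspect the DataFrame itself. It may also be worth looking at what NumPy sees once the data is converted to an array. The sketch below rests on an assumption not confirmed in this thread: that the compiled s_gd2 routine wants a plain, C-contiguous float64 array. `describe_array` is a hypothetical helper, not part of PHATE or s_gd2.

```python
import numpy as np


def describe_array(data):
    """Summarize how `data` looks after conversion to a NumPy array."""
    arr = np.asarray(data)
    numeric = arr.dtype.kind in "fiu"  # float, signed int, unsigned int
    return {
        "input_type": type(data).__name__,
        "dtype": str(arr.dtype),
        "shape": arr.shape,
        "c_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
        # np.isfinite is not defined for object arrays, so skip it there
        "all_finite": bool(np.all(np.isfinite(arr))) if numeric else None,
    }


print(describe_array(np.zeros((3, 2))))
```

An `object` dtype or a `False` for `c_contiguous` in this summary would be a lead to follow up on, not a diagnosis.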

trberg commented 3 years ago

So here are some results from this code.

print(phate.__version__)         1.0.4
print(s_gd2.__version__)         1.7

print(np.all([d == np.dtype('float64') for d in data.dtypes]))      True
print(np.all(np.isfinite(data)))                                    True
print (data.values.min(), data.values.max())                        0.0     10000000.0

scottgigante commented 3 years ago

First thing I would do is upgrade both of those packages and try again. If you're still having trouble, you could send me just the PHATE kernel which wouldn't contain any identifying information from your original data:

import pickle
import gzip
with gzip.open('kernel.pickle.gz', 'wb') as f:
    pickle.dump(phate_op.graph.kernel, f)

trberg commented 3 years ago

The update didn't fix the issue, and when I ran the gzip-and-pickle code I got this error.

Traceback (most recent call last):
  File "feature_reduction.py", line 94, in <module>
    get_phate_transform(data)
  File "feature_reduction.py", line 62, in get_phate_transform
    pickle.dump(phate_op.graph.kernel, f)
AttributeError: 'NoneType' object has no attribute 'kernel'

scottgigante commented 3 years ago

Oops, sorry -- you'll need to run phate_op.fit(data) first.
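
Put together, the two steps might look like the sketch below. `save_kernel` is a hypothetical helper name; it relies only on the `fit` method and the `graph.kernel` attribute already used in this thread.

```python
import gzip
import pickle


def save_kernel(phate_op, data, path="kernel.pickle.gz"):
    """Fit `phate_op` on `data`, then pickle its kernel to a gzip file."""
    # fit() builds the graph; before it runs, phate_op.graph is None,
    # which is what produced the AttributeError above.
    phate_op.fit(data)
    with gzip.open(path, "wb") as f:
        pickle.dump(phate_op.graph.kernel, f)
    return path
```

With `phate_op` and `data` defined as in the earlier snippet, `save_kernel(phate_op, data)` would write `kernel.pickle.gz` to the current directory.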

trberg commented 3 years ago

Here is the kernel: kernel.pickle.gz

scottgigante commented 3 years ago

I've tested this on Python 3.6 on Windows Subsystem for Linux, Python 3.7 (Anaconda) on Windows, and Python 3.8 on Arch Linux. All work fine.

>>> import phate
>>> import pickle
>>> import gzip
>>> with gzip.open("kernel.pickle.gz") as f:
...     K = pickle.load(f)
>>> phate_op = phate.PHATE(knn_dist='precomputed_affinity')
>>> phate_op.fit_transform(K)

Can you check the versions of the following packages? (You'll need to run this in PowerShell and double the slashes if you're on Windows.)

python -VV
pip freeze | grep "^\(cycler\|decorator\|Deprecated\|future\|graphtools\|joblib\|kiwisolver\|matplotlib\|numpy\|packaging\|pandas\|phate\|Pillow\|PyGSP\|pyparsing\|python\-dateutil\|pytz\|s\-gd2\|scikit\-learn\|scipy\|scprep\|six\|tasklogger\|threadpoolctl\|wrapt\)=="
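
If the shell quoting is awkward, the same information can probably be gathered from Python. This is a sketch that assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library; `PACKAGES` and `collect_versions` are illustrative names, and the package list is copied from the grep pattern above.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names taken from the grep pattern above.
PACKAGES = (
    "cycler", "decorator", "Deprecated", "future", "graphtools", "joblib",
    "kiwisolver", "matplotlib", "numpy", "packaging", "pandas", "phate",
    "Pillow", "PyGSP", "pyparsing", "python-dateutil", "pytz", "s-gd2",
    "scikit-learn", "scipy", "scprep", "six", "tasklogger", "threadpoolctl",
    "wrapt",
)


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

This does not report the interpreter build, so `python -VV` is still needed alongside it.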

My versions, for reference:

On Arch virtualenv:

```
Python 3.8.5 (default, Sep 5 2020, 10:50:12) [GCC 10.2.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.11
future==0.18.2
graphtools==1.5.2
joblib==1.0.0
kiwisolver==1.3.1
matplotlib==3.3.4
numpy==1.20.0
packaging==20.9
pandas==1.2.1
phate==1.0.6
Pillow==8.1.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
s-gd2==1.8
scikit-learn==0.24.1
scipy==1.6.0
scprep==1.0.12
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
```

On Arch:

```
Python 3.8.5 (default, Sep 5 2020, 10:50:12) [GCC 10.2.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.1
numpy==1.19.4
packaging==20.4
pandas==1.1.2
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.7
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
```

On WSL:

```
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.0
numpy==1.19.4
packaging==20.4
pandas==1.0.5
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.8
scikit-learn==0.23.1
scipy==1.5.2
scprep==1.0.10
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
```

On Windows:

```
Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.1
joblib==0.14.1
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.1
packaging==20.3
pandas==1.0.3
phate==1.0.4
Pillow==7.0.0
PyGSP==0.5.1
pyparsing==2.4.6
python-dateutil==2.8.1
pytz==2019.3
s-gd2==1.7
scikit-learn==0.22.2.post1
scipy==1.4.1
scprep==1.0.4
six==1.14.0
tasklogger==1.0.0
wrapt==1.12.1
```