scottgigante opened this issue 4 years ago
Is there a resolution to this error? I keep running into this problem. I've been using pandas dataframes and I've tried changing data types with the same result.
Thanks!
Could you post the data and code you're using that produces the error? I'm having a hard time reproducing it.
In the meantime, you can avoid the error by using mds_solver='smacof'.
I can't post all the data, but I've included a small print out of the data below.
data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data_phate = phate_op.fit_transform(data)
Here is what the data looks like, followed by the output and error this code produces.
002 003 004 005 006 007 008 009 010 ... 44786754 44786774 44786872 44787062 44816559 45771331 46234829 46235085 46235338
sample_id ...
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 82.975610 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 91.886364 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 85.580645 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 89.466667 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.828571 0.0 0.0 0.0
97 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.408163 0.0 0.0 0.0
98 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.040816 0.0 0.0 0.0
99 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 94.113924 0.0 0.0 0.0
100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 88.694444 0.0 0.0 0.0
[100 rows x 3335 columns]
Calculating PHATE...
Running PHATE on 100 observations and 3335 variables.
Calculating graph and diffusion operator...
/data/users/trberg/anaconda3/lib/python3.7/site-packages/graphtools/graphs.py:121: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
UserWarning,
Calculating KNN search...
Calculated KNN search in 0.11 seconds.
Calculating affinities...
Calculated graph and diffusion operator in 0.20 seconds.
Calculating optimal t...
Automatically selected t = 9
Calculated optimal t in 0.04 seconds.
Calculating diffusion potential...
Calculating metric MDS...
Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.26 seconds.
Traceback (most recent call last):
File "feature_reduction.py", line 74, in <module>
data_phate = phate_op.fit_transform(data)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 941, in fit_transform
embedding = self.transform(**kwargs)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 910, in transform
verbose=max(self.verbose - 1, 0),
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 230, in embed_MDS
Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
File "</data/users/trberg/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-146>", line 2, in sgd
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/scprep/utils.py", line 83, in _with_pkg
return fun(*args, **kwargs)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 84, in sgd
Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/s_gd2/s_gd2.py", line 84, in mds_direct
cpp.mds_direct(X, d, w, etas, random_seed)
TypeError: Array of type 'double' required. A 'unknown type' was given
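This "Array of type 'double' required" message appears to come from s_gd2's SWIG-wrapped C++ extension, which only accepts plain, C-contiguous float64 ndarrays. A hypothetical pre-flight check (not part of phate or s_gd2) that rules out dtype and memory-layout problems before the call:

```python
import numpy as np

def as_sgd2_input(D):
    # Hypothetical helper: coerce to a plain, C-contiguous float64 ndarray,
    # the only input the SWIG typemap accepts without this TypeError.
    D = np.ascontiguousarray(np.asarray(D, dtype=np.float64))
    if not np.all(np.isfinite(D)):
        raise ValueError("distance matrix contains NaN or inf")
    return D

D = as_sgd2_input([[0.0, 1.0], [1.0, 0.0]])
print(D.dtype, D.flags["C_CONTIGUOUS"])  # float64 True
```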
Could you please run the following:
data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data.to_pickle("data.pickle.gz")
and then drag data.pickle.gz into your reply? That should be small enough to post.
The issue isn't the size of the data, it's sensitive biomedical data that I don't have permission to upload in full.
But what you're seeing in my comment above is pretty much what it looks like.
Unfortunately, if I'm unable to view the data, it's going to be difficult to diagnose. I tried to replicate data like yours and it ran fine.
>>> import numpy as np
>>> import pandas as pd
>>> import phate
>>> data = pd.DataFrame(np.random.normal(0, 1, (100, 3335)))
>>> data.index.name = "sample_id"
>>> data = data.astype(np.float64)
>>> phate_op = phate.PHATE()
>>> data_phate = phate_op.fit_transform(data)
Calculating PHATE...
Running PHATE on 100 observations and 3335 variables.
Calculating graph and diffusion operator...
/home/scottgigante/.local/lib/python3.8/site-packages/graphtools/graphs.py:118: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
warnings.warn(
Calculating KNN search...
Calculated KNN search in 0.08 seconds.
Calculating affinities...
Calculated affinities in 0.01 seconds.
Calculated graph and diffusion operator in 0.10 seconds.
Calculating optimal t...
Automatically selected t = 3
Calculated optimal t in 0.02 seconds.
Calculating diffusion potential...
Calculating metric MDS...
Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.14 seconds.
Some diagnostics that might help:
import numpy as np
import phate
import s_gd2

print(phate.__version__)
print(s_gd2.__version__)
# `data` here is the DataFrame loaded in your script
print(np.all([d == np.dtype('float64') for d in data.dtypes]))
print(data.sum(axis=0).tolist())
print(data.sum(axis=1).tolist())
print(np.all(np.isfinite(data)))
So here are some results from this code.
print(phate.__version__) 1.0.4
print(s_gd2.__version__) 1.7
print(np.all([d == np.dtype('float64') for d in data.dtypes])) True
print(np.all(np.isfinite(data))) True
print (data.values.min(), data.values.max()) 0.0 10000000.0
The first thing I would do is upgrade both of those packages and try again. If you're still having trouble, you could send me just the PHATE kernel, which wouldn't contain any identifying information from your original data:
import pickle
import gzip
with gzip.open('kernel.pickle.gz', 'wb') as f:
pickle.dump(phate_op.graph.kernel, f)
The update didn't fix the issue, and when I ran the pickling code, I got this error.
Traceback (most recent call last):
File "feature_reduction.py", line 94, in <module>
get_phate_transform(data)
File "feature_reduction.py", line 62, in get_phate_transform
pickle.dump(phate_op.graph.kernel, f)
AttributeError: 'NoneType' object has no attribute 'kernel'
Oops, sorry -- you'll need to run phate_op.fit(data) first.
Here is the kernel. kernel.pickle.gz
I've tested this on Python 3.6 on Windows Subsystem for Linux, Python 3.7 (Anaconda) on Windows, and Python 3.8 on Arch Linux. All work fine.
>>> import phate
>>> import pickle
>>> import gzip
>>> with gzip.open("kernel.pickle.gz") as f:
... K = pickle.load(f)
>>> phate_op = phate.PHATE(knn_dist='precomputed_affinity')
>>> phate_op.fit_transform(K)
Can you check the versions of the following packages? (You'll need to run this in PowerShell and double the slashes if on Windows.)
python -VV
pip freeze | grep "^\(cycler\|decorator\|Deprecated\|future\|graphtools\|joblib\|kiwisolver\|matplotlib\|numpy\|packaging\|pandas\|phate\|Pillow\|PyGSP\|pyparsing\|python\-dateutil\|pytz\|s\-gd2\|scikit\-learn\|scipy\|scprep\|six\|tasklogger\|threadpoolctl\|wrapt\)=="
My versions, for reference: