Open Wulin-Tan opened 2 years ago
@Wulin-Tan, it looks like to_array() was removed from Pandas (and subsequently from cuDF) in a recent version. I have a fix for this coming in soon, but for now you should be able to replace all occurrences of to_array() in rapids_scanpy_funcs.py with to_numpy().
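As a rough illustration of the suggested replacement, the sketch below uses a plain pandas Series so it runs without a GPU. The helper name series_to_host_array and the gene names are hypothetical and do not come from rapids_scanpy_funcs.py; the to_array() fallback assumes an older cuDF release that still provides it.

```python
import pandas as pd


def series_to_host_array(series):
    """Hypothetical helper: return a host NumPy array from a Series.

    Prefers to_numpy(), the replacement suggested above. Falls back to
    to_array() only when to_numpy() is missing (assumed: older cuDF).
    """
    if hasattr(series, "to_numpy"):
        return series.to_numpy()
    return series.to_array()


# Illustrative gene names only; not taken from the demo dataset.
genes = pd.Series(["SFTPC", "AGER", "KRT5"])
gene_array = series_to_host_array(genes)
print(type(gene_array).__name__, gene_array.shape)
```

In the script itself, the simpler route is the one described above: change each to_array() call to to_numpy() directly.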
This PR here should fix the problem: https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-examples/pull/98.
@cjnolet Hi, thank you for the update. It really helps. But when I tried the new rapids_scanpy_funcs.py and moved on, I got stuck in the "rapids_scanpy_funcs.rank_genes_groups". The dataset is the demo one: krasnow_hlca_10x.sparse.h5ad. The error is as follows:
RuntimeError Traceback (most recent call last)
File
File ~/autodl-tmp/rapids_scanpy_funcs.py:430, in rank_genes_groups(X, labels, var_names, groups, reference, n_genes, **kwds)
    427 y = labels.loc[grouping]
    429 clf = LogisticRegression(**kwds)
--> 430 clf.fit(X.get(), grouping.astype('float32').to_numpy())
    431 scores_all = cp.array(clf.coef_).T
    433 for igroup, group in enumerate(groups_order):
File ~/miniconda3/envs/rapids-22.06/lib/python3.8/site-packages/cuml/internals/api_decorators.py:409, in BaseReturnAnyDecorator.call.
File cuml/linear_model/logistic_regression.pyx:276, in cuml.linear_model.logistic_regression.LogisticRegression.fit()
File ~/miniconda3/envs/rapids-22.06/lib/python3.8/site-packages/cupy/_manipulation/add_remove.py:179, in unique(ar, return_index, return_inverse, return_counts, axis)
    177 aux = ar[perm]
    178 else:
--> 179 ar.sort()
    180 aux = ar
    181 mask = cupy.empty(aux.shape, dtype=cupy.bool_)
File cupy/_core/core.pyx:729, in cupy._core.core.ndarray.sort()
File cupy/_core/core.pyx:747, in cupy._core.core.ndarray.sort()
File cupy/_core/_routines_sorting.pyx:43, in cupy._core._routines_sorting._ndarray_sort()
File cupy/cuda/thrust.pyx:75, in cupy.cuda.thrust.sort()
RuntimeError: radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument
Can you take a look? Thank you.
@Wulin-Tan the changes in my PR also update rapids to 22.08. Did you also update your environment?
@cjnolet Hi, I followed your suggestion and updated both the rapids-22.08 env and rapids_scanpy_funcs.py. Now it works perfectly well. Thank you very much.
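For anyone hitting the same mismatch between the script and the environment, a minimal sketch like the following reports which versions are installed without importing the GPU libraries. The function name installed_versions is hypothetical, and the distribution names "cudf", "cuml", and "cupy" are an assumption; they can differ depending on how the packages were installed.

```python
from importlib import metadata


def installed_versions(packages=("cudf", "cuml", "cupy")):
    """Hypothetical helper: map each distribution name to its installed
    version string, or None if no distribution by that name is found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not found'}")
```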
Hi, when I followed the Jupyter notebook and tried to use rapids_scanpy_funcs.highly_variable_genes, I came across this error:
AttributeError Traceback (most recent call last)
File:1, in
File ~/autodl-tmp/rapids_scanpy_funcs.py:753, in highly_variable_genes(sparse_gpu_array, genes, n_top_genes)
    751 mean = sparse_gpu_array.sum(axis=0).flatten() / n_cells
    752 mean_sq = sparse_gpu_array.multiply(sparse_gpu_array).sum(axis=0).flatten() / n_cells
--> 753 variable_genes = _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    755 return variable_genes
File ~/autodl-tmp/rapids_scanpy_funcs.py:702, in _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    700 df = pd.DataFrame()
    701 # Note - can be replaced with cudf once 'cut' is added in 21.08
--> 702 df['genes'] = genes.to_array()
    703 df['means'] = mean.tolist()
    704 df['dispersions'] = dispersion.tolist()
AttributeError: 'Series' object has no attribute 'to_array'
All the commands before this one ran fine. Any suggestions? Thank you.