NVIDIA-Genomics-Research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS
Apache License 2.0

error in rapids_scanpy_funcs.highly_variable_genes #97

Open Wulin-Tan opened 2 years ago

Wulin-Tan commented 2 years ago

Hi, when I followed the Jupyter notebook and tried to use rapids_scanpy_funcs.highly_variable_genes, I came across this error:


AttributeError Traceback (most recent call last)
File :1, in

File ~/autodl-tmp/rapids_scanpy_funcs.py:753, in highly_variable_genes(sparse_gpu_array, genes, n_top_genes)
    751 mean = sparse_gpu_array.sum(axis=0).flatten() / n_cells
    752 mean_sq = sparse_gpu_array.multiply(sparse_gpu_array).sum(axis=0).flatten() / n_cells
--> 753 variable_genes = _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    755 return variable_genes

File ~/autodl-tmp/rapids_scanpy_funcs.py:702, in _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    700 df = pd.DataFrame()
    701 # Note - can be replaced with cudf once 'cut' is added in 21.08
--> 702 df['genes'] = genes.to_array()
    703 df['means'] = mean.tolist()
    704 df['dispersions'] = dispersion.tolist()

AttributeError: 'Series' object has no attribute 'to_array'

All the commands before this one ran fine. Any suggestions? Thank you.

cjnolet commented 2 years ago

@Wulin-Tan, it looks like to_array() was removed from Pandas (and subsequently from cuDF) in a recent version. I have a fix for this coming in soon but for now you should be able to replace all occurrences of to_array() in rapids_scanpy_funcs.py with to_numpy().
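As a rough illustration of that workaround, here is a minimal sketch that does the text replacement with the Python standard library. The helper names `replace_to_array` and `patch_file` are hypothetical and not part of the repository, and the pattern only matches the zero-argument call `.to_array()` seen in the traceback, so any other usage would still need a manual look.

```python
import re
from pathlib import Path

# Matches the exact zero-argument call `.to_array()`, as in the traceback.
_TO_ARRAY_CALL = re.compile(r"\.to_array\(\)")


def replace_to_array(source: str) -> str:
    """Return `source` with every `.to_array()` call rewritten to `.to_numpy()`."""
    return _TO_ARRAY_CALL.sub(".to_numpy()", source)


def patch_file(path: str) -> int:
    """Rewrite the file at `path` in place; return the number of replacements."""
    file_path = Path(path)
    original = file_path.read_text()
    patched, count = _TO_ARRAY_CALL.subn(".to_numpy()", original)
    if count:
        file_path.write_text(patched)
    return count


# Demonstration on the failing line from the traceback.
print(replace_to_array("df['genes'] = genes.to_array()"))
```

Calling `patch_file` on a local copy of rapids_scanpy_funcs.py (the path depends on where the file was saved) would apply the same substitution to the whole file. Keeping a backup first is sensible, since the file is overwritten.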

cjnolet commented 2 years ago

This PR here should fix the problem: https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-examples/pull/98.

Wulin-Tan commented 2 years ago

@cjnolet Hi, thank you for the update. It really helps. But when I tried the new rapids_scanpy_funcs.py and moved on, I got stuck at rapids_scanpy_funcs.rank_genes_groups. The dataset is the demo one: krasnow_hlca_10x.sparse.h5ad. The error is as follows:


RuntimeError Traceback (most recent call last)
File :1, in

File ~/autodl-tmp/rapids_scanpy_funcs.py:430, in rank_genes_groups(X, labels, var_names, groups, reference, n_genes, **kwds)
    427 y = labels.loc[grouping]
    429 clf = LogisticRegression(**kwds)
--> 430 clf.fit(X.get(), grouping.astype('float32').to_numpy())
    431 scores_all = cp.array(clf.coef_).T
    433 for igroup, group in enumerate(groups_order):

File ~/miniconda3/envs/rapids-22.06/lib/python3.8/site-packages/cuml/internals/api_decorators.py:409, in BaseReturnAnyDecorator.__call__.<locals>.inner_with_setters(*args, **kwargs)
    402 self_val, input_val, target_val = \
    403     self.get_arg_values(*args, **kwargs)
    405 self.do_setters(self_val=self_val,
    406                 input_val=input_val,
    407                 target_val=target_val)
--> 409 return func(*args, **kwargs)

File cuml/linear_model/logistic_regression.pyx:276, in cuml.linear_model.logistic_regression.LogisticRegression.fit()

File ~/miniconda3/envs/rapids-22.06/lib/python3.8/site-packages/cupy/_manipulation/add_remove.py:179, in unique(ar, return_index, return_inverse, return_counts, axis)
    177 aux = ar[perm]
    178 else:
--> 179 ar.sort()
    180 aux = ar
    181 mask = cupy.empty(aux.shape, dtype=cupy.bool_)

File cupy/_core/core.pyx:729, in cupy._core.core.ndarray.sort()

File cupy/_core/core.pyx:747, in cupy._core.core.ndarray.sort()

File cupy/_core/_routines_sorting.pyx:43, in cupy._core._routines_sorting._ndarray_sort()

File cupy/cuda/thrust.pyx:75, in cupy.cuda.thrust.sort()

RuntimeError: radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument


Can you take a look? Thank you.

cjnolet commented 2 years ago

@Wulin-Tan the changes in my PR also update RAPIDS to 22.08. Did you also update your environment?
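One way to answer that question from inside the notebook is to compare the installed package versions against 22.08. The sketch below is only an assumption-laden illustration: the helper names are hypothetical, and it assumes the packages are registered under the distribution names `cudf` and `cuml`, which may differ depending on how RAPIDS was installed.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Extract (year, month) from a RAPIDS-style version such as '22.08.00'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int] = (22, 8)) -> bool:
    """True if `version` is at least the `minimum` (year, month) release."""
    return parse_release(version) >= minimum


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names below are assumptions; adjust them to your install.
for name in ("cudf", "cuml"):
    found = installed_version(name)
    if found is None:
        print(f"{name}: not found in this environment")
        continue
    try:
        status = "ok" if meets_minimum(found) else "older than 22.08"
    except ValueError:
        status = "version format not recognized"
    print(f"{name}: {found} ({status})")
```

If either package reports a release older than 22.08, the environment was probably not updated along with rapids_scanpy_funcs.py.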

Wulin-Tan commented 2 years ago

@cjnolet Hi, I followed your suggestion and updated both the rapids-22.08 env and rapids_scanpy_funcs.py. Now it works perfectly well. Thank you very much.