aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

Pandas "non-numeric data" error during enrichment analysis #95

Closed RoganGrant closed 4 years ago

RoganGrant commented 5 years ago

I have been working through the pySCENIC tutorial locally with an in-house dataset. All seems well, thus far. When I get to the AUCell / Phase III step, however, I get the following error trace:

ValueError                                Traceback (most recent call last)
<ipython-input-41-1057b4293bfc> in <module>
----> 1 auc_mtx = aucell(ex_matrix, regulons, num_workers=8)

~/anaconda3/envs/SCENIC/lib/python3.7/site-packages/pyscenic/aucell.py in aucell(exp_mtx, signatures, auc_threshold, noweights, normalize, num_workers)
    154     :return: A dataframe with the AUCs (n_cells x n_modules).
    155     """
--> 156     return aucell4r(create_rankings(exp_mtx), signatures, auc_threshold, noweights, normalize, num_workers)
    157 

~/anaconda3/envs/SCENIC/lib/python3.7/site-packages/pyscenic/aucell.py in create_rankings(ex_mtx)
     45     #    df.sample(frac=1.0, replace=False).rank(ascending=False, method='first', na_option='bottom').sort_index() - 1
     46     #
---> 47     return ex_mtx.sample(frac=1.0, replace=False, axis=1).rank(axis=1, ascending=False, method='first', na_option='bottom').astype(DTYPE) - 1
     48 
     49 

~/anaconda3/envs/SCENIC/lib/python3.7/site-packages/pandas/core/generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   8686         if numeric_only is None:
   8687             try:
-> 8688                 return ranker(self)
   8689             except TypeError:
   8690                 numeric_only = True

~/anaconda3/envs/SCENIC/lib/python3.7/site-packages/pandas/core/generic.py in ranker(data)
   8677                 ascending=ascending,
   8678                 na_option=na_option,
-> 8679                 pct=pct,
   8680             )
   8681             ranks = self._constructor(ranks, **data._construct_axes_dict())

~/anaconda3/envs/SCENIC/lib/python3.7/site-packages/pandas/core/algorithms.py in rank(values, axis, method, na_option, ascending, pct)
    926             ascending=ascending,
    927             na_option=na_option,
--> 928             pct=pct,
    929         )
    930     else:

pandas/_libs/algos_rank_helper.pxi in pandas._libs.algos.rank_2d_object()

ValueError: first not supported for non-numeric data

The only explanation I can think of is that my row indices are cell names (e.g. SC35___CGCGTTTGTCCTCTTG); however, resetting the index and removing the resulting "cell" column does not solve the issue. Perhaps I'm missing something very simple, but I can't see why index sorting would still fail.

TIA, Rogan

RoganGrant commented 5 years ago

Temporary solution: auc_mtx = aucell(ex_matrix.astype(float), regulons)
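For anyone hitting the same trace, here is a minimal sketch of why the cast helps. It assumes the values are numeric but the columns carry dtype=object. The toy matrix, the gene and cell names, the helper non_numeric_columns, and the int32 cast are all hypothetical stand-ins; only the rank() arguments are taken from the traceback above.

```python
import pandas as pd

# Hypothetical toy expression matrix (cells x genes). The values are numbers,
# but the columns are stored with dtype=object, which pandas does not treat
# as numeric.
ex_matrix = pd.DataFrame(
    [[1, 0, 3], [0, 2, 5]],
    index=["cell_1", "cell_2"],
    columns=["GeneA", "GeneB", "GeneC"],
    dtype=object,
)


def non_numeric_columns(df):
    """Return the names of columns whose dtype is not numeric (hypothetical helper)."""
    return [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]


bad_before = non_numeric_columns(ex_matrix)   # every column is object dtype here

# The workaround from this comment: cast the whole matrix to float.
ex_numeric = ex_matrix.astype(float)
bad_after = non_numeric_columns(ex_numeric)   # no object columns remain

# Same rank() arguments as in the traceback; int32 is an assumed stand-in
# for pySCENIC's DTYPE constant.
rankings = ex_numeric.rank(
    axis=1, ascending=False, method="first", na_option="bottom"
).astype("int32") - 1
```

Inspecting ex_matrix.dtypes before calling aucell should show whether any columns are object rather than a float or int type.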

cflerin commented 4 years ago

Hi,

It seems like your expression matrix is non-numeric. Is this possible? Since your temporary solution was to convert to float, this is all I can think of right now.
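One way to check this, as a rough sketch rather than anything built into pySCENIC: coerce each column with pd.to_numeric and look for entries that fail to parse. The helper name find_unparseable and the toy data are hypothetical.

```python
import pandas as pd


def find_unparseable(df):
    """Mark entries that are present but cannot be parsed as numbers (hypothetical helper)."""
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    coerced = df.apply(pd.to_numeric, errors="coerce")
    # An entry is a problem if it became NaN but was not missing originally.
    return coerced.isna() & df.notna()


# Hypothetical toy matrix with one entry that is not a number.
df = pd.DataFrame(
    {"GeneA": ["1", "2"], "GeneB": ["3.5", "n/a"]},
    index=["cell_1", "cell_2"],
)

mask = find_unparseable(df)
n_bad = int(mask.to_numpy().sum())
```

If n_bad is zero, the values themselves are numeric and only the dtype is off, in which case casting with astype(float) should be safe.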

RoganGrant commented 4 years ago

The matrix is definitely fully numeric, but it's not treated as such. Looking more closely, it seems that this may actually be an issue with feather. I will go ahead and close.