Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License
260 stars 40 forks

predictions = celltypist.annotate "ValueError:" #44

Closed. DRSEI closed this issue 1 year ago

DRSEI commented 1 year ago

Hi Teichlab,

This is a really excellent tool and I love using it. I was able to run the tutorial, but when I replace the tutorial data with my own dataset I start to get errors.

adata_2000.X.expm1().sum(axis = 1)

matrix([[1.39864217e+141],
        [4.99632738e+074],
        [1.12685234e+037],
        ...,
        [4.65627696e+256],
        [1.14145687e+070],
        [3.34092341e+191]])

adata_2000_raw = adata_2000.copy()
sc.pp.normalize_total(adata_2000_raw)
sc.pp.log1p(adata_2000_raw)
adata_2000.raw = adata_2000_raw

# Not run; predict cell identities using this loaded model.
predictions = celltypist.annotate(adata_2000, model = model, majority_voting = True)
# Alternatively, just specify the model name (recommended as this ensures the model is intact every time it is loaded).
#predictions = celltypist.annotate(adata_2000, model = 'Immune_All_High.pkl', majority_voting = True)

👀 Invalid expression matrix in `.X`, expect log1p normalized expression to 10000 counts per cell; will try the `.raw` attribute
⚠️ Warning: invalid expression matrix, expect all genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
🔬 Input data has 122530 cells and 24910 genes
🔗 Matching reference genes in the model
🧬 5900 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [35], line 2
      1 # Not run; predict cell identities using this loaded model.
----> 2 predictions = celltypist.annotate(adata_2000, model = model, majority_voting = True)

File ~/opt/miniconda3/envs/SCVI/lib/python3.8/site-packages/celltypist/annotate.py:89, in annotate(filename, model, transpose_input, gene_file, cell_file, mode, p_thres, majority_voting, over_clustering, min_prop)
     87 #over clustering
     88 if over_clustering is None:
---> 89     over_clustering = clf.over_cluster()
     90     predictions.adata = clf.adata
     91 elif isinstance(over_clustering, str):

File ~/opt/miniconda3/envs/SCVI/lib/python3.8/site-packages/celltypist/classifier.py:418, in Classifier.over_cluster(self, resolution)
    416     logger.info("👀 Can not detect a neighborhood graph, will construct one before the over-clustering")
    417     adata = self.adata.copy()
--> 418     self.adata.obsm['X_pca'], self.adata.obsp['connectivities'], self.adata.obsp['distances'], self.adata.uns['neighbors'] = Classifier._construct_neighbor_graph(adata)
    419 else:
    420     logger.info("👀 Detected a neighborhood graph in the input object, will run over-clustering on the basis of it")

File ~/opt/miniconda3/envs/SCVI/lib/python3.8/site-packages/celltypist/classifier.py:393, in Classifier._construct_neighbor_graph(adata)
    391 if 'highly_variable' not in adata.var:
    392     sc.pp.filter_genes(adata, min_cells=5)
--> 393     sc.pp.highly_variable_genes(adata, n_top_genes = min([2500, adata.n_vars]))
    394 adata = adata[:, adata.var.highly_variable]
...
    265     )
    266 elif mn == mx:  # adjust end points before binning
    267     mn -= 0.001 * abs(mn) if mn != 0 else 0.001

ValueError: cannot specify integer `bins` when input data contains infinity

Thank you

ChuanXu1 commented 1 year ago

@DRSEI, is your adata.X a raw count matrix? If yes, you can run sc.pp.normalize_total(adata, target_sum=1e4) and sc.pp.log1p(adata) before prediction. Let me know if this works.
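
For readers who want to see what that preprocessing is meant to produce, here is a minimal sketch in plain NumPy rather than scanpy. It scales each cell to 10000 total counts, applies log1p, and then repeats the expm1 row-sum check from the original post. The function names normalize_log1p and looks_log1p_normalized and the toy matrix are made up for illustration; they are not part of celltypist or scanpy, and the check is an assumption about what a valid input looks like, not celltypist's own validation code.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("at least one cell has zero total counts")
    return np.log1p(counts / totals * target_sum)


def looks_log1p_normalized(X, target_sum=1e4, rtol=1e-3):
    """Return True if X is finite and expm1(X) sums to ~target_sum per cell."""
    X = np.asarray(X, dtype=float)
    if not np.all(np.isfinite(X)):
        return False
    # Large inputs overflow in expm1; treat that as a failed check, not an error.
    with np.errstate(over="ignore"):
        per_cell = np.expm1(X).sum(axis=1)
    return bool(np.allclose(per_cell, target_sum, rtol=rtol))


# Toy raw count matrix: 3 cells x 4 genes (made-up values).
raw_counts = np.array([[0, 3, 1, 6], [10, 0, 0, 5], [2, 2, 2, 2]])
X = normalize_log1p(raw_counts)

print(looks_log1p_normalized(X))           # normalized matrix
print(looks_log1p_normalized(raw_counts))  # raw counts, not normalized
```

If the per-cell sums are far from 10000, as in the output shown in the original post, the matrix is probably not the log1p-normalized data the warning message asks for.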

DRSEI commented 1 year ago

Hi @ChuanXu1 ,

I did run this on adata.X, but it seems the result was not stored. I have now managed to run the script. Are you planning to add more mouse prediction datasets?