Closed malonzm1 closed 10 months ago
@malonzm1, can you show the result of adata.X.data.max()?
New version (1.6.1) should have fixed this.
Thanks!
It still says the following: WARNING:celltypist.logger:⚠️ Warning: invalid expression matrix, expect all genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
@malonzm1, can you show the shape of the data (adata.shape), and the results of adata.X.expm1().sum(axis=1).min() and adata.X.expm1().sum(axis=1).max()?
adata.shape: (3535249, 19494)
adata.X.expm1().sum(axis=1).min(): 9999.994
adata.X.expm1().sum(axis=1).max(): 10000.007
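The checks requested above can be bundled into one helper. This is a minimal sketch, not CellTypist's own validation code: the function name looks_log1p_normalized and the tolerance tol are hypothetical, and a dense NumPy array stands in for adata.X (for a SciPy sparse matrix, adata.X.expm1().sum(axis=1) as used in this thread does the same job).

```python
import numpy as np

def looks_log1p_normalized(x, target_sum=1e4, tol=1.0):
    """Return True if every row of x, after expm1, sums to about target_sum.

    Hypothetical helper for illustration; x is a dense 2-D array standing
    in for adata.X.
    """
    x = np.asarray(x, dtype=np.float64)
    # Undo the log1p transform and total each cell (row).
    row_sums = np.expm1(x).sum(axis=1)
    # log1p-normalized values are non-negative and cannot exceed log1p(target_sum).
    in_range = x.min() >= 0 and x.max() <= np.log1p(target_sum) + 1e-6
    return bool(in_range and np.all(np.abs(row_sums - target_sum) <= tol))

# Synthetic example: 5 cells x 50 genes of raw counts.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(5, 50)).astype(np.float64)
normalized = counts / counts.sum(axis=1, keepdims=True) * 1e4
logged = np.log1p(normalized)

ok_logged = looks_log1p_normalized(logged)  # log1p of data normalized to 10000
ok_counts = looks_log1p_normalized(counts)  # raw counts do not pass
```

The values reported above (minimum 9999.994, maximum 10000.007) would pass this kind of check, which suggests the full 19494-gene matrix was normalized as expected.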
@malonzm1, that's weird. Did you slice the data (genes) before prediction? Could you post the full code that reproduces the warning message above?
The warning message is: WARNING:celltypist.logger:⚠️ Warning: invalid expression matrix, expect all genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
The code is:
import scanpy as sc
import pandas as pd
import scvi
from glob import glob
import os
import celltypist
from celltypist import models
infolder = '/scratch/cs/pan-autoimmune/data/scvi/10x'
os.chdir(infolder)
adata = sc.read_h5ad(filename='%s/10x.h5ad'%infolder)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars = ['mt'], percent_top=None, log1p=False, inplace=True)
adata = adata[adata.obs.pct_counts_mt < 15]
sc.pp.filter_genes(adata, min_counts=3)
sc.pp.filter_genes(adata, min_cells = 3)
sc.pp.filter_cells(adata, min_genes = 200)
sc.pp.filter_cells(adata, min_counts = 200)
sc.pp.normalize_total(adata, target_sum=1e4)
adata.layers["counts"] = adata.X.copy()
sc.pp.log1p(adata)
adata.raw = adata
sc.pp.highly_variable_genes(
    adata,
    n_top_genes=1200,
    subset=True,
    layer="counts",
    flavor="seurat_v3",
    batch_key="gse",
)
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    categorical_covariate_keys=["gse"],
    continuous_covariate_keys=['pct_counts_mt', 'total_counts']
    #continuous_covariate_keys=["percent_mito", "percent_ribo"],
)
models.download_models(force_update = True)
predictions = celltypist.annotate(adata, model = 'Immune_All_High.pkl', majority_voting = True)
adata = predictions.to_adata()
@malonzm1, you specified subset=True in sc.pp.highly_variable_genes, which means only a subset of genes (here 1200) can be found in adata.X. That's why the warning is raised: CellTypist expects all genes (to maximise the overlap between the model and the query data) rather than only a few genes.
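One way to see the effect of subsetting is a small NumPy sketch on synthetic data. The array sizes and the random choice of kept genes are assumptions made for illustration, and the snippet does not call CellTypist or scanpy. Once gene columns are dropped, the per-cell totals recovered with expm1 fall well below 10000, so the matrix no longer looks like log1p-normalized expression over all genes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_kept = 100, 2000, 120  # illustrative sizes only

# Raw counts -> normalize each cell to 10000 -> log1p.
counts = rng.poisson(2.0, size=(n_cells, n_genes)).astype(np.float64)
logged = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

# With all genes present, expm1 row sums recover the target of 10000.
full_totals = np.expm1(logged).sum(axis=1)

# Keep only a subset of genes, as subset=True does for highly variable genes.
kept = rng.choice(n_genes, size=n_kept, replace=False)
subset_totals = np.expm1(logged[:, kept]).sum(axis=1)

print("all genes:", full_totals.min(), full_totals.max())
print("subset:   ", subset_totals.min(), subset_totals.max())
```

Passing the full-gene, log1p-normalized matrix to celltypist.annotate, and subsetting to highly variable genes only on a separate copy used for scVI, would avoid this mismatch.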
Btw, I think you need to put adata.layers["counts"] = adata.X.copy() before sc.pp.normalize_total(adata, target_sum=1e4).
Will close this issue. Please re-open it if you have further questions.
Hi,
I tried using celltypist with the following code:
But it returned the following error:
raise ValueError(
ValueError: Invalid expression matrix in both .X and .raw.X, expect log1p normalized expression to 10000 counts per cell
I tried with a smaller dataset and it worked, but not with the bigger dataset. Please advise.
Thanks and good day.