omarOMF opened this issue 2 months ago
Hey Omar,
If you check adata.uns['leiden_05_markers'], I suspect one or more of the clusters has no marker genes, which I think is causing this issue.
Perhaps try re-calling the markers, using:
cc.tl.get_markers(adata, 'leiden_05', min_markers=1)
Hopefully this option exists in your version; otherwise I may need to update the PyPI package!
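Whether a keyword like `min_markers` is available depends on the installed release. One way to check from the same session is the standard library's `inspect.signature` and `importlib.metadata.version`. This is a minimal sketch: `has_parameter` and `installed_version` are hypothetical helpers, and the distribution name `"cytocipher"` is an assumption based on the install path that appears later in the thread.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version


def has_parameter(func, name: str) -> bool:
    """Return True if `func` declares a parameter called `name` (hypothetical helper)."""
    return name in inspect.signature(func).parameters


def installed_version(dist: str = "cytocipher") -> str:
    """Return the installed version of `dist`; the default name is an assumption."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return "not installed"


# Usage, in a session where cytocipher is imported as cc:
#   has_parameter(cc.tl.get_markers, "min_markers")
#   installed_version()
```

Note that `has_parameter` only reports explicitly declared parameters; a function that accepts `**kwargs` would still take an undeclared keyword.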
Thank you for your response.
I tried cc.tl.get_markers(adata, 'leiden_05', min_de=1) and am still running into the same error.
I couldn't find min_markers in the function, unless I am using a different version. This is what I get:
cc.tl.get_markers?
Signature:
cc.tl.get_markers(
    data: anndata._core.anndata.AnnData,
    groupby: str,
    var_groups: str = None,
    logfc_cutoff: float = 0,
    padj_cutoff: float = 0.05,
    t_cutoff: float = 3,
    n_top: int = 5,
    rerun_de: bool = True,
    gene_order=None,
    pts: bool = False,
    min_de: int = 0,
    verbose: bool = True,
)
Docstring:
Gets marker genes per cluster.

Parameters
----------
data: sc.AnnData
    Single cell RNA-seq anndata, QC'd a preprocessed to log-cpm in
    data.X.
groupby: str
    Specifies the clusters to perform one-versus-rest Welch's t-test
    comparison of genes for.
    Must specify defined column in data.obs[groupby].
    Must be categorical type.
var_groups: str
    Specifies a column in data.var of type boolean, with True indicating
    the candidate genes to use when determining marker genes per cluster.
    Useful to, for example, remove ribosomal and mitochondrial genes.
    None indicates use all genes in data.var_names as candidates.
logfc_cutoff: float
    Minimum logfc for a gene to be a considered a marker gene for a
    given cluster.
marker_padj_cutoff: float
    Adjusted p-value (Benjamini-Hochberg correction) below which a gene
    can be considered a marker gene.
t_cutoff: float
    The minimum t-value a gene must have to be considered a marker gene
    (Welch's t-statistic with one-versus-rest comparison).
n_top: int
    The maximimum no. of marker genes per cluster.
rerun_de: bool
    Whether to rerun the DE analysis, or using existing results in
    data.uns['rank_genes_groups']. Useful if have ran get_markers()
    with the same 'groupby' as input, but want to adjust the other
    parameters to determine marker genes.
gene_order: str
    By default, gets n_top qualifying genes ranked by t-value.
    Specifying logfc here will rank by log-FC, instead.
pts: bool
    Whether to calculate percentage cells expressing gene within/without
    of each cluster. Only relevant if rerun_de=True.
min_de: int
    Minimum number of genes to use as markers, if not criteria met.
verbose: bool
    Print statements during computation (True) or silent run (False).

Returns
--------
data.uns[f'{groupby}_markers']
    Dictionary with cluster names as keys, and list of marker
    genes as values.
File: ~/miniconda3/envs/scv/lib/python3.9/site-packages/cytocipher/score_and_merge/cluster_score.py
Type: function
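Reading only the docstring above, the cutoffs, `n_top`, and `min_de` appear to interact roughly as follows for a single cluster. This is a hedged sketch, not cytocipher's implementation: `select_markers` is a hypothetical name, ranking by t-value follows the `gene_order` default described in the docstring, and the strictness of each comparison and the exact fallback behaviour are assumptions. It illustrates how a cluster can end up with zero markers when `min_de=0` and no gene clears every cutoff.

```python
from typing import Dict, List


def select_markers(
    stats: Dict[str, Dict[str, float]],
    logfc_cutoff: float = 0.0,
    padj_cutoff: float = 0.05,
    t_cutoff: float = 3.0,
    n_top: int = 5,
    min_de: int = 0,
) -> List[str]:
    """Pick marker genes for ONE cluster (hypothetical sketch).

    `stats` maps gene name -> {'t': ..., 'logfc': ..., 'padj': ...},
    i.e. one-versus-rest DE statistics for this cluster.
    """
    # Rank every candidate gene by t-statistic, highest first.
    ranked = sorted(stats, key=lambda g: stats[g]["t"], reverse=True)
    # Keep genes that pass all cutoffs (assumed comparisons), at most n_top.
    passing = [
        g
        for g in ranked
        if stats[g]["t"] >= t_cutoff
        and stats[g]["logfc"] >= logfc_cutoff
        and stats[g]["padj"] < padj_cutoff
    ][:n_top]
    # Assumed fallback: too few genes passed, so take the top min_de by t-value.
    if len(passing) < min_de:
        passing = ranked[:min_de]
    return passing
```

With `min_de=0` the fallback never triggers, so a cluster whose genes all fail the cutoffs gets an empty marker list; with `min_de=1` it would get at least its top-ranked gene.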
Also, I can see that markers were selected for every cluster:
caps.uns['leiden_05_markers']
{'0': array(['MT2A', 'TMSB10', 'CLDN5', 'NEAT1', 'IFITM3', 'MT1E'], dtype=object),
 '1': array(['AC083867.2', 'AC138647.2', 'AL359380.1', 'KCND3-AS1',
        'AC012555.2', 'GNG14'], dtype=object),
 '2': array(['GPCPD1', 'SPOCK3', 'SLCO1A2', 'SLC39A10', 'ABCG2', 'CA4'],
       dtype=object),
 '3': array(['OMD', 'TMEM45B', 'ATP10A', 'SLC39A10', 'ABCB1', 'THSD4'],
       dtype=object),
 '4': array(['ANO2', 'GALNT18', 'VWF', 'SCARB1', 'TNS1', 'HIF1A-AS3'],
       dtype=object),
 '5': array(['FAM43A', 'AC005083.1', 'AL928596.1', 'AC091078.2', 'BX664730.1',
        'AC083867.2'], dtype=object)}
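To test the "cluster with no markers" hypothesis programmatically rather than by eye, a small check can list clusters whose marker array is empty. A minimal sketch, assuming (per the docstring) that `uns[f'{groupby}_markers']` is a dict mapping cluster name to a list or NumPy array of genes; `empty_marker_clusters` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def empty_marker_clusters(markers: Dict[str, Sequence[str]]) -> List[str]:
    """Return the names of clusters that have no marker genes (hypothetical helper)."""
    # len() works for Python lists and NumPy object arrays alike.
    return [cluster for cluster, genes in markers.items() if len(genes) == 0]


# Usage with the object from this thread (assumed to be named `caps`):
#   empty_marker_clusters(caps.uns['leiden_05_markers'])  # [] means no empty cluster
#   len(caps.uns['leiden_05_markers'])                    # 0 would mean an empty dict
```

An empty result alone does not rule out an entirely empty dictionary, hence the separate `len` check in the usage lines.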
Hmm, the cluster markers dict looks correct. Looking at the source code, I can't see how it would throw that error, given that there are genes in that dictionary.
It looks like the AnnData object is called 'caps' in the example above? The only way I can see it throwing that error is if the marker dictionary was empty when running cc.tl.code_enrich(adata, 'leiden_05').
Are you running:
cc.tl.code_enrich(caps, 'leiden_05')
If you still get the same error, could you try running this snippet and let me know what it looks like?
import numpy as np

cluster_genes_dict = caps.uns['leiden_05_markers']
# Collect the genes from every cluster into one flat list.
all_genes = []
for cluster in cluster_genes_dict:
    all_genes.extend(cluster_genes_dict[cluster])
# Get a fixed-width string dtype wide enough for the longest name.
str_dtype = f"<U{max(len(gene_name) for gene_name in all_genes)}"
all_genes = np.unique(all_genes).astype(str_dtype)
str_dtype_clust = f"<U{max(len(clust) for clust in cluster_genes_dict)}"
print(all_genes)
In particular, could you send the result of all_genes?
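The dtype lines in the snippet above call Python's built-in `max()` on the gene and cluster names, and `max()` raises `ValueError` when given an empty iterable with no `default`. So if the marker dictionary (or every array in it) were empty, code of that shape would fail right there. The thread does not confirm this is the exact error Omar saw; the sketch below, using a hypothetical helper `gene_str_dtype`, just makes the empty case explicit instead of letting `max()` fail.

```python
from typing import Dict, Sequence


def gene_str_dtype(cluster_genes_dict: Dict[str, Sequence[str]]) -> str:
    """Fixed-width unicode dtype wide enough for every marker gene name.

    Hypothetical helper mirroring the snippet above, with an explicit
    check for the empty case.
    """
    # Iteration works the same for lists and NumPy object arrays.
    all_genes = [gene for genes in cluster_genes_dict.values() for gene in genes]
    if not all_genes:
        # Without this guard, max() below would raise a less informative ValueError.
        raise ValueError("no marker genes found in any cluster")
    return f"<U{max(len(gene) for gene in all_genes)}"


print(gene_str_dtype({'0': ['MT2A', 'CLDN5'], '1': []}))  # <U5
```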
Hi, thank you for developing this helpful package! I am currently trying to optimize the number of clusters for a cell population using your package, but I've encountered an error that I'm having trouble interpreting. Here's the sequence of functions I ran:
However, I encounter the following error when I attempt to merge clusters. Could you please help me understand what might be going wrong? Here's the error message I received:
cc.tl.merge_clusters(adata, 'leiden_05')