Open maximelepetit opened 3 months ago
Hi. The STOmics team is still developing and improving cellbin analysis, and cellbin results depend on the quality of the data and the image, so the previous pipeline provided only one standard result. In our latest release, SAW v8, the cellCluster program already supports a resolution parameter as input. 1) The parameters you asked about are listed below. We also recommend referring to the parameters given in the Stereopy documentation.
filter_cells: min_gene=1 other defaults
highly_variable_genes: min_mean=0.0125, max_mean=3, min_disp=0.5, n_top_genes=3000
pca: n_pcs=20
neighbors: n_pcs=30
2) We're sorry, but the code is shipped only as a .pyc file because our source code is confidential; we hope you understand. As for cell_cluster, it follows the cellbin tutorial in the Stereopy documentation up to the clustering step, and writes the result files with the Stereopy functions st.io.update_gef and st.io.stereo_to_anndata.
In addition, because of differences in Stereopy version and random seed settings, the clustering results may differ very slightly.
Hi, thanks for the reply.
Regarding your answer, I have some questions:
In filter_cells: min_gene=1 other defaults, what do you mean by "other defaults"? Do you mean the API defaults:
StPipeline.filter_cells(min_counts=None, max_counts=None, min_genes=None, max_genes=None, pct_counts_mt=None, cell_list=None, filter_raw=True, excluded=False, inplace=True, **kwargs)
or min_counts=200, max_genes=2500, pct_counts_mt=5, as described in the cellbin tutorial?
In pca: n_pcs=20 neighbors: n_pcs=30, shouldn't it rather be pca: n_pcs=20 neighbors: n_pcs=20? If you run PCA with 20 PCs, only 20 PCs are available for neighbors.
Maxime
Hi, thanks for the correction. To answer your questions: 1) "Defaults" means the default values set in the Stereopy API. 2) We're sorry, this is a bug in one of the previous versions. However, because of how numpy arrays behave, selecting 30 PCs for neighbors gives the same result as selecting 20 PCs.
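The numpy behaviour mentioned above can be illustrated with a small sketch. It assumes that the neighbors step selects components with a plain slice such as `X[:, :n_pcs]`; that is an assumption about the implementation, not something confirmed in this thread. The array `pca_embedding` is a random stand-in for a real PCA result.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a PCA result: 100 cells x 20 principal components.
pca_embedding = rng.normal(size=(100, 20))

# A slice that runs past the last column is clipped instead of raising an
# error, so asking for 30 components returns the same 20 columns.
first_30 = pca_embedding[:, :30]
first_20 = pca_embedding[:, :20]

print(first_30.shape)                       # (100, 20)
print(np.array_equal(first_30, first_20))   # True
```

If the selection were done by explicit index (for example `pca_embedding[:, 29]`), numpy would raise an `IndexError` instead, so the equivalence only holds for slicing.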
Thanks! One last question regarding gene filtering: do you filter genes based on the number of cells or on counts? If so, which parameter values are used?
Do you mean the function StPipeline.filter_genes? No, we don't run this step in the SAW pipeline.
Thanks. Following your suggestions, I can't get the same UMAP.
Here is the UMAP from the SAW report:
Here is the code I used, followed by the UMAP that I obtained:
import stereo as st

# Read the cellbin GEF produced by SAW
data_path = './041.cellcut/A02989D6.adjusted.cellbin.gef'
data = st.io.read_gef(file_path=data_path, bin_type='cell_bins')
# QC: keep cells with at least one detected gene, other arguments left at defaults
data.tl.filter_cells(min_genes=1, inplace=True)
# Save the raw counts before normalization
data.tl.raw_checkpoint()
data.tl.normalize_total()
data.tl.log1p()
data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, n_top_genes=3000, res_key='highly_variable_genes')
data.tl.scale(max_value=10)
data.tl.pca(use_highly_genes=True, res_key='pca', n_pcs=20)
# n_pcs=30 as reported for SAW, although the PCA step above only computes 20 PCs
data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors')
data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap')
Did I miss something? My Stereopy version is 1.3.1.
I think the version of SAW you are using is 7.1? If so, note that SAW v7.1.2 ships Stereopy 0.14.0b1 (SAW v7.0 ships 0.12.1). Stereopy updates since then include changes to the UMAP functions, such as the addition of thread and seed settings in st.tl.umap; the default is now single-threaded, sacrificing computational efficiency to ensure reproducible results.
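Since the explanation above comes down to a version mismatch, a quick way to check which Stereopy version a given environment has installed may help. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and `"stereopy"` is assumed to be the distribution name under which the package was installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution is registered under the name "stereopy".
print("stereopy:", installed_version("stereopy"))
```

Running the same check inside the SAW container and in the local environment would show whether the two Stereopy versions differ.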
Yes, I used SAW version 7.1. I'll update it later! Thanks.
Hello,
1) I'd like a little more clarity on the QC steps and parameters used to generate the cellbin UMAP in the SAW report.
I know Stereopy is used, but I would like to know the QC parameter values (min/max counts, max percent.mt, min/max features), the number of principal components used for the neighborhood graph and the UMAP embedding, and the resolution used for clustering.
In the report, a resolution of 1 is used for squarebin=200, but for the cellCluster part the resolution parameter is not passed to the function. Why not? Illustration here:
cellCluster:
2) Another remark/question: inside the Singularity image, I only have access to the compiled file cell_cluster.pyc. How can I access the cell_cluster.py file?
Thanks in advance !
Bests,
Maxime