STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

Can batches integration be performed in cellbin? #143

Closed mcraftt closed 1 year ago

mcraftt commented 1 year ago

When I perform integration on "cellbin" data, the UMAP result shows three separate groups, one per sample, as if no integration had been executed.

Here's the code:

# %%
import stereo as st
import warnings

warnings.filterwarnings('ignore')

# %%
data_path_1 = './1.gef'
st.io.read_gef_info(data_path_1)
data1 = st.io.read_gef(file_path=data_path_1, bin_type="cell_bins")

data_path_2 = './2.gef'
st.io.read_gef_info(data_path_2)
data2 = st.io.read_gef(file_path=data_path_2, bin_type="cell_bins")

data_path_3 = './3.gef'
st.io.read_gef_info(data_path_3)
data3 = st.io.read_gef(file_path=data_path_3, bin_type="cell_bins")

# %%
data1.tl.cal_qc()
data2.tl.cal_qc()
data3.tl.cal_qc()
data1.plt.genes_count()
data2.plt.genes_count()
data3.plt.genes_count()

# %%
data1.tl.filter_cells(max_n_genes_by_counts=450, pct_counts_mt=10, inplace=True)
data2.tl.filter_cells(max_n_genes_by_counts=450, pct_counts_mt=7, inplace=True)
data3.tl.filter_cells(max_n_genes_by_counts=550, pct_counts_mt=7, inplace=True)

# %%
data = st.utils.data_helper.merge(data1, data2, data3)
data.shape
data.tl.normalize_total()
data.tl.log1p()

# %%
data.tl.pca(use_highly_genes=False, n_pcs=50, res_key='pca')
data.tl.batches_integrate(pca_res_key='pca', res_key='pca_integrated')

# %%
data.tl.neighbors(pca_res_key='pca_integrated', n_pcs=50, res_key='neighbors_integrated')
data.tl.umap(pca_res_key='pca_integrated', neighbors_res_key='neighbors_integrated', res_key='umap_integrated')
data.plt.batches_umap(res_key='umap_integrated')

[Image] The resulting UMAP image has been uploaded.
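Judging "no integration happened" from a UMAP plot alone is subjective, so a rough numeric check can help. The sketch below is not part of Stereopy and is not what BatchQC computes; it is a minimal heuristic that assumes the embedding coordinates and batch labels have already been extracted into plain Python lists. The function name `batch_mixing_score` and the choice of `k` are illustrative assumptions.

```python
import math
from collections import Counter


def batch_mixing_score(points, batches, k=3):
    """Mean normalised entropy of batch labels among each point's k nearest neighbours.

    Returns a value in [0, 1]: 0 means every neighbourhood contains a single
    batch, 1 means every neighbourhood contains all batches in equal shares.
    """
    n_batches = len(set(batches))
    if n_batches < 2:
        raise ValueError("need at least two batches")
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")

    max_entropy = math.log(n_batches)
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force neighbour search; fine for a subsample, slow for all cells.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        counts = Counter(batches[j] for _, j in dists[:k])
        entropy = -sum((c / k) * math.log(c / k) for c in counts.values())
        total += entropy / max_entropy
    return total / len(points)
```

Running this on a subsample of the embedding before and after `batches_integrate` gives two numbers to compare; a score that barely moves would be consistent with the plot described above, though it does not say whether integration was needed in the first place.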

ChiragNepal commented 1 year ago

I integrated using the tissue.gem files, and it works:

data_path_M = 'B01809B1.tissue.gem.gz'
data_path_F = '/B01809B2.tissue.gem.gz'
data1 = st.io.read_gem(data_path_M)
data2 = st.io.read_gem(data_path_F)

For the rest of the code, you can follow your own. It should work.

BTW, have you found a way to save the merged data object as a dataframe with cluster info?

tanliwei-coder commented 1 year ago

@mcraftt I recommend upgrading stereopy to 0.13.0b1 and running BatchQC to evaluate whether you need to run batches_integrate. The conclusion is in the summary at the end of the BatchQC report.

mcraftt commented 1 year ago

> @mcraftt I recommend upgrading stereopy to 0.13.0b1 and running BatchQC to evaluate whether you need to run batches_integrate. The conclusion is in the summary at the end of the BatchQC report.

I will. But the difference between integrating with cellbin and with squarebin is very obvious. Here's the UMAP result when using squarebin to integrate. I wonder whether this method works well on cellbin. [Image: 1689350792146]

mcraftt commented 1 year ago

> I integrated using the tissue.gem files, and it works:
>
> data_path_M = 'B01809B1.tissue.gem.gz'
> data_path_F = '/B01809B2.tissue.gem.gz'
> data1 = st.io.read_gem(data_path_M)
> data2 = st.io.read_gem(data_path_F)
>
> For the rest of the code, you can follow your own. It should work.
>
> BTW, have you found a way to save the merged data object as a dataframe with cluster info?

The merged data is an annotated data object; you can convert it to an h5ad file that includes the "leiden" info (if this is what you mean).
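If a plain table is wanted rather than an h5ad file, one option is to write one row per cell with its batch and cluster label. The sketch below uses only the standard library and assumes the cell names, batch labels, and "leiden" labels have already been pulled out of the merged object into equal-length lists; how to extract them depends on the Stereopy version and is not shown. The function name `write_cluster_table` and the column names are illustrative.

```python
import csv
import io


def write_cluster_table(out, cell_names, batches, clusters):
    """Write one CSV row per cell: cell name, batch label, cluster label.

    `out` is any writable text file object. Returns the number of cells written.
    """
    if not (len(cell_names) == len(batches) == len(clusters)):
        raise ValueError("all inputs must have one entry per cell")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["cell", "batch", "leiden"])
    writer.writerows(zip(cell_names, batches, clusters))
    return len(cell_names)


# Example with made-up values, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_cluster_table(buffer, ["c1", "c2", "c3"], ["0", "0", "1"], ["1", "2", "1"])
print(buffer.getvalue())
```

Passing an opened file (`open("clusters.csv", "w", newline="")`) instead of the buffer produces a table that loads directly into a dataframe.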

tanliwei-coder commented 1 year ago

> > @mcraftt I recommend upgrading stereopy to 0.13.0b1 and running BatchQC to evaluate whether you need to run batches_integrate. The conclusion is in the summary at the end of the BatchQC report.
>
> I will. But the difference between integrating with cellbin and with squarebin is very obvious. Here's the UMAP result when using squarebin to integrate. I wonder whether this method works well on cellbin. [Image: 1689350792146]

@mcraftt Generally, the result of batches_integrate depends on the data itself, not on the data format (bin or cell_bin), because there is no difference between the two on the algorithmic side. I personally think an unexpected result is possible, perhaps because the data are not from the same batch, or the data do not need to be integrated at all, or for other reasons I don't know. The batches_integrate function is based on the Harmony algorithm; you can learn more from this paper, and this is its git repo.
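To make "the result depends on the data, not the format" concrete, here is a deliberately simplified stand-in. It is not Harmony and not what batches_integrate does: Harmony iteratively soft-clusters cells in PCA space and applies cluster-specific corrections, while this toy only subtracts each batch's own centroid. It operates purely on coordinates and batch labels, so whether the rows came from square bins or cell bins never enters the computation. The function name `center_per_batch` is hypothetical.

```python
def center_per_batch(points, batches):
    """Toy batch correction: subtract each batch's centroid from its own points.

    Illustrative only. A real method such as Harmony corrects per cluster and
    iterates; this shifts whole batches so that their centroids coincide.
    """
    dims = len(points[0])
    sums, counts = {}, {}
    for p, b in zip(points, batches):
        acc = sums.setdefault(b, [0.0] * dims)
        for d in range(dims):
            acc[d] += p[d]
        counts[b] = counts.get(b, 0) + 1
    centroids = {b: [s / counts[b] for s in sums[b]] for b in sums}
    return [
        tuple(x - c for x, c in zip(p, centroids[b]))
        for p, b in zip(points, batches)
    ]
```

If two batches differ only by a global offset, this is enough to overlay them; if they contain different cell populations, no shift will make them overlap, which is one way an integration step can legitimately leave batches separated.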