standardized data after batch effects removal by Harmony

Starlitnightly / omicverse

A Python library for multi-omics, including bulk, single-cell, and spatial RNA-seq analysis.

https://starlitnightly.github.io/omicverse/

GNU General Public License v3.0


standardized data after batch effects removal by Harmony #22

Closed by faker1c 5 months ago

faker1c commented 1 year ago

How can I obtain the standardized data after batch-effect removal by Harmony, for differential analysis between two groups of cells?

Starlitnightly commented 1 year ago

Hi,

Unlike in bulk transcriptomics, removing batch effects in scRNA-seq does not change the data values; it is done to get better clustering.

faker1c commented 1 year ago

Hi, I still have some questions about using it. For data from different articles, is it possible to use the dds.deg_analysis function to calculate differential expression without removing batch effects? Can we simply use the standardized adata.X? The sc.tl.rank_genes_groups function documentation does not mention how to use the use_rep parameter. When calculating marker genes, does adding use_rep='X_harmony' mean that adata.obsm["X_harmony"] is used for the calculation instead of adata.X?

Starlitnightly commented 1 year ago

> Hi, I still have some questions when using it. For data from different articles, is it possible to use the dds.deg_analysis function to calculate differential expression without removing batch effects? Can we simply use the standardized adata.X? The sc.tl.rank_genes_groups function documentation does not mention how to use the use_rep parameter. When calculating marker genes, does adding use_rep='X_harmony' mean that adata.obsm["X_harmony"] is used for the calculation instead of adata.X?

I don't think dds.deg_analysis works well for single-cell data, because single-cell data is too sparse. Moreover, X_harmony can be used as a feature vector for cells, but it is not representative of genes.
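The shape argument can be sketched with toy numbers. This is a minimal pure-Python illustration, not omicverse or scanpy code: the names `expr`, `embedding`, and `mean_diff_per_gene` and all values are hypothetical. The expression matrix has one column per gene, so a per-gene comparison between two groups is possible; a Harmony-style embedding has one column per component, so it has no gene axis to test.

```python
from statistics import mean

# Toy log-normalized expression: rows = cells, columns = genes (hypothetical values).
genes = ["GeneA", "GeneB", "GeneC"]
expr = [
    [2.0, 0.0, 1.0],  # cell 0, group "g1"
    [2.2, 0.1, 1.1],  # cell 1, group "g1"
    [0.5, 0.0, 1.0],  # cell 2, group "g2"
    [0.3, 0.2, 0.9],  # cell 3, group "g2"
]
groups = ["g1", "g1", "g2", "g2"]

# Toy corrected embedding: rows = cells, columns = components, not genes.
embedding = [
    [0.9, -0.1],
    [1.0, -0.2],
    [-0.8, 0.3],
    [-1.1, 0.2],
]


def mean_diff_per_gene(matrix, labels, gene_names, group_a, group_b):
    """Return {gene: mean(group_a) - mean(group_b)} on log-scale values."""
    result = {}
    for j, gene in enumerate(gene_names):
        a = [row[j] for row, lab in zip(matrix, labels) if lab == group_a]
        b = [row[j] for row, lab in zip(matrix, labels) if lab == group_b]
        result[gene] = mean(a) - mean(b)
    return result


# Per-gene comparison works on the expression matrix because columns are genes.
diffs = mean_diff_per_gene(expr, groups, genes, "g1", "g2")
top_gene = max(diffs, key=lambda g: abs(diffs[g]))

# The embedding has one row per cell but no column that maps to a gene.
n_components = len(embedding[0])
```

As far as the scanpy documentation shows, `sc.tl.rank_genes_groups` takes its values from `adata.X`, `adata.raw`, or a named `layer`, and does not list a `use_rep` parameter; treat that as something to verify against the installed version.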

faker1c commented 1 year ago

Thanks! Does this mean that for single-cell sequencing data from different sources, after integration and processing with sc.pp.normalize_total and sc.pp.log1p, it is possible to directly perform differential expression, GO, and KEGG analysis, without the need for batch correction like in bulk RNA-seq?
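For reference, the arithmetic behind the two preprocessing steps named in this question can be sketched in pure Python: scale each cell's counts to a common total, then apply log(1 + x). The function names `normalize_total_sketch` and `log1p_sketch`, the target sum of 10,000, and the toy counts are assumptions for illustration; this is not scanpy's implementation.

```python
import math


def normalize_total_sketch(counts, target_sum=10_000.0):
    """Scale each cell (row) so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # An all-zero cell stays all zero; this also avoids dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p_sketch(matrix):
    """Apply the natural log of (1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Two toy cells with the same gene proportions but a tenfold depth difference.
raw_counts = [
    [10, 0, 90],    # shallow cell, 100 counts in total
    [100, 0, 900],  # deeper cell, 1000 counts in total
]

log_norm = log1p_sketch(normalize_total_sketch(raw_counts))
```

The two toy cells differ tenfold in sequencing depth but end up with identical log-normalized values, which is the effect these steps have on depth differences between cells.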