SydneyBioX / scMerge

Statistical approach for removing unwanted variation from multiple single-cell datasets
https://sydneybiox.github.io/scMerge/

Highly Variable Genes and clustering #3

Closed elimereu closed 5 years ago

elimereu commented 5 years ago

Hi,

thanks for this very useful new tool. I'm testing it on my data and it seems to work according to my original labels. However, it would be helpful to retrieve the highly variable genes as computed by scMerge, and I would also like to run a clustering on the normalised data. Could you suggest a specific workflow or clustering approach that takes your output matrix directly as input?

Many thanks in advance. Best,

Elisabetta

YingxinLin commented 5 years ago

Hi Elisabetta,

thanks for testing our tool! Currently, scMerge uses the function BrenneckeGetVariableGenes() from the M3Drop package to select highly variable genes. To retrieve the highly variable genes, you may need to rerun the function scReplicate() with most of the parameter settings the same as for scMerge(), supply the batch information (which scMerge() does not require), and set return_all = TRUE. This will return the highly variable gene results.

We will consider adding the highly variable gene results as an output of scMerge() in the future.

Most current clustering methods for single-cell RNA-seq data can be applied to the output of scMerge. For example: SC3 (https://bioconductor.org/packages/release/bioc/html/SC3.html); Seurat, if you have a large dataset (https://satijalab.org/seurat/install.html); or k-means or SIMLR (https://github.com/taiyunkim/scClustBench).
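As a rough illustration of the k-means option, here is a minimal base-R sketch. It assumes the scMerge-normalised values have already been extracted as a genes-by-cells numeric matrix; the name `scmerged_out`, the simulated data, the number of principal components and the number of clusters are all illustrative assumptions, not scMerge defaults.

```r
# Simulated stand-in for a scMerge-normalised matrix (genes x cells):
# two groups of 20 cells with clearly different mean expression.
set.seed(1)
n_genes <- 50
group_a <- matrix(rnorm(n_genes * 20, mean = 0), nrow = n_genes)
group_b <- matrix(rnorm(n_genes * 20, mean = 3), nrow = n_genes)
scmerged_out <- cbind(group_a, group_b)
rownames(scmerged_out) <- paste0("gene", seq_len(n_genes))
colnames(scmerged_out) <- paste0("cell", seq_len(ncol(scmerged_out)))

# PCA on cells: prcomp() expects observations in rows, so transpose.
pca <- prcomp(t(scmerged_out), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:5]  # keep the first 5 PCs (arbitrary choice)

# k-means on the leading PCs; centers and nstart are illustrative.
km <- kmeans(pcs, centers = 2, nstart = 10)
clusters <- km$cluster  # one cluster label per cell

table(clusters)
```

On real data the number of clusters is usually unknown, so it would have to be chosen separately, for example by comparing against the original labels.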

I hope this helps you!

Best wishes, Yingxin

elimereu commented 5 years ago

Hi Yingxin, thanks a lot for your reply. I tried applying both Seurat and SC3, but I wasn't getting the same result as in the t-SNE. It might be because I did something wrong. For example, to apply Seurat after scMerge, should I use seurat.obj@data <- scmerged_out or seurat.obj@scale.data <- scmerged_out? If I set seurat.obj@data <- scmerged_out, should I then scale the data?

Thanks in advance!

Best, Elisabetta

(Sent by email in reply to Yingxin Lin's comment of Tue, 30 Oct 2018, 10:03.)

elimereu commented 5 years ago

Also, it would be helpful to know whether the normalized matrix from scMerge is log-normalized or not. If it is, is the transformation log10, log2 or ln?

Thanks! Elisabetta

YingxinLin commented 5 years ago

Hi Elisabetta,

scMerge expects log-transformed data as input. However, a scaling step within the function can produce negative values (for a very small percentage of entries). So the output can practically be interpreted as "log-transformed" data. I would suggest you use seurat.obj@data <- scmerged_out, and then run ScaleData if you need scaled data, which is used as input to dimensional reduction techniques and is displayed in heatmaps. Please see the 7th point in https://satijalab.org/seurat/faq for more details.
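To make these two points concrete, here is a small base-R sketch on simulated data. It assumes a genes-by-cells matrix with the hypothetical name `scmerged_out`. It measures the fraction of negative entries and then centres and scales each gene. The per-gene z-scoring only approximates what a scaling step such as ScaleData is generally understood to do; it is not Seurat's actual implementation.

```r
set.seed(2)
# Simulated log-scale matrix (30 genes x 10 cells), all values positive...
scmerged_out <- matrix(rexp(30 * 10, rate = 1), nrow = 30)
# ...with 5 entries set slightly below zero to mimic the scaling side effect.
scmerged_out[sample(length(scmerged_out), 5)] <- -0.05

# Fraction of entries that are negative.
frac_negative <- mean(scmerged_out < 0)

# scale() standardises columns, so transpose to z-score each gene (row),
# then transpose back to genes x cells.
scaled <- t(scale(t(scmerged_out), center = TRUE, scale = TRUE))

frac_negative
```

If the negative fraction on real output is as small as described, treating the matrix as log-scale data should be a reasonable working assumption.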

Best wishes, Yingxin