ConesaLab / MultiBaC

Single and multiomic batch effect correction

Clarifying questions on published materials #2

Open maxnest opened 1 year ago

maxnest commented 1 year ago

Hello! I am very interested in the approach developed by your team, but after reading the available materials, several questions remain:

1) The results of correction of yeast data were given as an example, but do I understand correctly that there are no restrictions on working with data from multicellular organisms?
2) Is MultiBaC applicable for correcting the batch effect in one data set (i.e., in one matrix)?
3) Given the presented example, in the case of analysis of RNA-seq data, tables with normalized expression values should be submitted to the input (yeast data were after TMM normalization). Is it possible to use raw data (to further carry out analysis with edgeR or DESeq2)? What about Transcripts-Per-Million (TPM) values? Is there any additional data preparation required in each of these cases?

Thanks!

AnaConesa commented 1 year ago

Hi, thanks for your interest in MultiBaC. Please see my responses below.

  1. The results of correction of yeast data were given as an example, but do I understand correctly that there are no restrictions on working with data from multicellular organisms?

Correct. You can use MultiBaC with any organism.

  2. Is MultiBaC applicable for correcting the batch effect in one data set (i.e., in one matrix)?

MultiBaC has been designed to correct the batch effect across omics types to favor multiomics data integration. You are expected to integrate several omics matrices, not just one. The simplest scenario is to have three different omics types distributed across two batches, with one of the omics types shared between the two batches.
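To make the simplest scenario described above concrete, here is a minimal Python sketch of such a design. It is only an illustration of the layout, not MultiBaC code (MultiBaC is an R package); the batch names, the omic names `omic_X` and `omic_Y`, and the helper `shared_omics` are all hypothetical.

```python
def shared_omics(batches):
    """Return the omic types that are present in every batch.

    `batches` maps a batch name to the set of omic types measured in it.
    """
    omic_sets = [set(omics) for omics in batches.values()]
    return set.intersection(*omic_sets) if omic_sets else set()


# Hypothetical design: two batches, three omic types in total,
# with RNA-seq measured in both batches.
design = {
    "batch_A": {"RNA-seq", "omic_X"},
    "batch_B": {"RNA-seq", "omic_Y"},
}

all_omics = set().union(*design.values())
shared = shared_omics(design)

print(sorted(all_omics))
print(sorted(shared))
```

In this layout the shared omic is the link between the two batches; a design with no omic common to both batches would not match the scenario described in the answer.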

  3. Given the presented example, in the case of analysis of RNA-seq data, tables with normalized expression values should be submitted to the input (yeast data were after TMM normalization). Is it possible to use raw data (to further carry out analysis with edgeR or DESeq2)? What about Transcripts-Per-Million (TPM) values? Is there any additional data preparation required in each of these cases?

Normalized data are required because the batch correction process returns corrected values that are no longer counts. Using TPM values is fine. If you see large distribution differences among samples within the same batch, then TMM is also recommended; if this is not a problem, TPM alone is fine.
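For readers starting from raw counts, the TPM option mentioned above can be sketched as follows. This is a minimal Python illustration of the standard TPM formula, not part of MultiBaC; the function name `counts_to_tpm` and the toy numbers are assumptions made for the example, and gene lengths are assumed to be given in base pairs.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert a genes x samples count table to TPM.

    counts     : list of rows, one row per gene, one column per sample
    lengths_bp : gene lengths in base pairs, one per gene
    """
    n_samples = len(counts[0])
    # Reads per kilobase: divide each count by the gene length in kb.
    rpk = [
        [c / (length / 1000.0) for c in row]
        for row, length in zip(counts, lengths_bp)
    ]
    # Per-sample scaling factor: the sum of RPK values in that sample.
    col_sums = [sum(row[j] for row in rpk) for j in range(n_samples)]
    if any(s == 0 for s in col_sums):
        raise ValueError("a sample has zero total counts; TPM is undefined")
    # Scale so that each sample sums to one million.
    return [
        [row[j] / col_sums[j] * 1e6 for j in range(n_samples)]
        for row in rpk
    ]


# Toy example: two genes, two samples.
tpm = counts_to_tpm([[10, 20], [30, 60]], [1000, 3000])
print(tpm)
```

Each sample column of the result sums to one million, so the values are no longer counts. Whether an additional TMM step is needed depends on the within-batch distribution differences mentioned in the answer.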

Hope this helps

Ana

maxnest commented 1 year ago

@AnaConesa, thank you for your detailed and quick response! Do I understand correctly that if, for example, biological replicates are collected at different times in one study, we cannot use ARSyNbac (the module I was referring to in my previous post) to correct for the batch effect on a single data set? I ask because I have already tried different approaches, first of all ComBat-seq, which removes some of the artificial differences but behaves extremely strangely on big data.

AnaConesa commented 1 year ago

Hi

In this case you should use the ARSyN function of the MultiBaC package. You can run it with or without the information about the batches. It also works better when you have a multifactorial design.

maxnest commented 1 year ago

@AnaConesa, thank you! One more question: the ARSyN method was initially intended for the analysis of microarrays. Has the method been adapted in the MultiBaC package to analyze RNA-seq data?

AnaConesa commented 1 year ago

Yes, provided that you use normalized data, not count data.

maxnest commented 1 year ago

Thank you!