BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Missing design matrix in TCGAbatch_Correction for unpublished data #538

Open mona-n opened 1 year ago

mona-n commented 1 year ago

Hello,

I am using the TCGAbatch_Correction function from TCGAbiolinks to batch-correct TARGET data, with the function's "unpublished" mode (UnpublishedData = TRUE). I have noticed that in this mode the batch correction, which is done with ComBat, does not pass a model matrix as the mod argument of sva::ComBat. The code that performs the correction currently looks like this:

if (UnpublishedData == TRUE) {
   batch.factor <- as.factor(AnnotationDF$Batch)
   batch_corr <- sva::ComBat(dat = tabDF, batch = batch.factor,
                             par.prior = TRUE, prior.plots = TRUE)
 }

However, I believe it should include a design matrix like this:

if (UnpublishedData == TRUE) {
   batch.factor <- as.factor(AnnotationDF$Batch)
   batch_corr <- sva::ComBat(dat = tabDF, batch = batch.factor, mod = design.matrix,
                             par.prior = TRUE, prior.plots = TRUE)
 }

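where the design matrix encodes, for example, the condition of the samples, so that ComBat treats it as a covariate of interest rather than adjusting it away together with the batch effect. It can be created like so: design.matrix <- model.matrix(~group)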
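To make the proposal concrete, here is a minimal sketch on simulated data. The toy tabDF, the Condition column of AnnotationDF, and the full-rank check are assumptions added for illustration and are not part of TCGAbiolinks; only the sva::ComBat call mirrors the snippet above. The ComBat step is guarded so that the sketch still runs if sva is not installed.

```r
# Toy expression matrix: 100 genes x 12 samples (hypothetical stand-in for tabDF).
set.seed(1)
tabDF <- matrix(rnorm(100 * 12, mean = 8), nrow = 100,
                dimnames = list(paste0("gene", 1:100), paste0("sample", 1:12)))

# Hypothetical annotation table; the Condition column is an assumption.
# Conditions are balanced within each batch so the two are not confounded.
AnnotationDF <- data.frame(
  Samples   = colnames(tabDF),
  Batch     = rep(c("b1", "b2", "b3"), each = 4),
  Condition = rep(c("tumor", "normal"), times = 6)
)

batch.factor <- as.factor(AnnotationDF$Batch)
group <- as.factor(AnnotationDF$Condition)

# Design matrix for the variable to preserve; batch must not be part of it.
design.matrix <- model.matrix(~group)

# One row per sample, in the same order as the columns of tabDF.
stopifnot(nrow(design.matrix) == ncol(tabDF))

# Batch and condition must not be confounded: the combined design
# (condition columns plus batch indicators) has to be full rank.
full.design <- cbind(design.matrix,
                     model.matrix(~batch.factor)[, -1, drop = FALSE])
stopifnot(qr(full.design)$rank == ncol(full.design))

# Run ComBat with the design matrix only if sva is available.
if (requireNamespace("sva", quietly = TRUE)) {
  batch_corr <- sva::ComBat(dat = tabDF, batch = batch.factor,
                            mod = design.matrix,
                            par.prior = TRUE, prior.plots = FALSE)
}
```

If the function were changed along these lines, the design matrix (or the grouping variable it is built from) would presumably have to be supplied by the user, for instance through an extra column in AnnotationDF.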

Let me know what you think.

Thank you,
Mona