MNStokholm / GEDI

An R package to enable transcriptomic data integration.

There is still batch effect remaining post batch correction #8

Open byqmed opened 6 months ago

byqmed commented 6 months ago

Hello, I followed all your steps for the batch correction with the following code:

dataFolders <- c("RNAseq", "Affymetrix", "Agilent")
sources <- c("RNAseq", "affymetrix", "agilent")
PATH_TO_DATA_FOLDERS <- "C:/Users/RStudio/GEDI"
datasets <- ReadGE(dataFolders, sources, path=PATH_TO_DATA_FOLDERS)
hsapiens_attr <- BM_attributes(species="hsapiens")
attr <- c("ensembl_gene_id", "affy_hg_u133_plus_2", NA)
dat <- GEDI(datasets, attributes=attr, BioMart=TRUE, species="hsapiens", path=PATH_TO_DATA_FOLDERS)
pheno <- read.csv("pheno.csv", header=TRUE, row.names=1)
summary(as.factor(pheno$batch))
summary(as.factor(pheno$status))
cData <- BatchCorrection(dat, pheno$batch, pheno$status, visualize=TRUE)
res <- VerifyGEDI(X=cData, y=pheno$status, batch=pheno$batch, model="logistic")

However, after BatchCorrection, my PCA and RLE plots still look like this: [image: Rplot]

Please help, thank you in advance!

MNStokholm commented 5 months ago

Please excuse my late response. It looks to me as though B3 contains two batches. I suggest you investigate whether there is any biological or experimental reason why B3 is split in two. This split likely confused the batch correction algorithm.
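One possible way to check for such a split is sketched below. It is only an illustration on synthetic data, using base R (prcomp and kmeans) rather than any GEDI function. The names expr, batch, hidden and split_batch are hypothetical, and the sketch assumes genes in rows and samples in columns. It runs a PCA on the B3 samples alone, clusters them into two groups, and builds a refined batch vector in which B3 is split into B3_1 and B3_2.

```r
# Sketch: check whether one batch (here "B3") splits into two sub-groups.
# All data are synthetic; `expr`, `batch`, `hidden` and `split_batch` are
# hypothetical names and are not part of GEDI.
set.seed(1)

n_genes <- 200
batch <- rep(c("B1", "B2", "B3"), each = 10)

# Assumed orientation: genes in rows, samples in columns.
expr <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
colnames(expr) <- paste0("S", seq_along(batch))

# Simulate a hidden split by shifting the last 5 samples of B3.
hidden <- 26:30
expr[, hidden] <- expr[, hidden] + 3

split_batch <- function(expr, batch, target, k = 2) {
  idx <- which(batch == target)
  # PCA on the samples of the target batch only (samples as rows).
  pca <- prcomp(t(expr[, idx]), center = TRUE, scale. = FALSE)
  # Cluster the samples on their first two principal components.
  km <- kmeans(pca$x[, 1:2], centers = k, nstart = 25)
  # Relabel the target batch with its cluster number, e.g. "B3_1", "B3_2".
  refined <- batch
  refined[idx] <- paste0(target, "_", km$cluster)
  list(refined = refined, pca = pca, cluster = km$cluster)
}

res_split <- split_batch(expr, batch, target = "B3")
print(table(res_split$refined))
```

If the two clusters line up with something recorded in the sample annotation (processing date, platform version, tissue source), the refined labels could be tried in place of pheno$batch in the BatchCorrection call shown above. Whether that is appropriate depends on the cause of the split: if it reflects biology rather than a technical effect, correcting for it would remove real signal.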

Let me know if this didn't help.