bhattacharya-a-bt / NanoNormIter

Pre-processing of NanoString normalization data, as described in "An approach for normalization and quality control for NanoString RNA expression data" (Bhattacharya and Hamilton et al, 2020)

How to obtain normalized NanoString expression set for GSVA #3

Open dzjfeel opened 1 year ago

dzjfeel commented 1 year ago

Hi!

I have a NanoString expression set with batch effects, and I want to run GSVA on it. I followed your code to remove the unwanted variation and tried to obtain a normalized dataset for GSVA:

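library(RUVSeq)
# Build a SeqExpressionSet from the raw counts plus sample (pData) and gene (fData) annotations
set <- newSeqExpressionSet(as.matrix(raw), phenoData = pData, featureData = fData)
# Housekeeping genes serve as the negative controls for RUVg
cIdx <- rownames(set)[fData(set)$Class == "Housekeeping"]
# Upper-quartile normalization between samples
set <- betweenLaneNormalization(set, which = "upper")
# Estimate k = 1 factor of unwanted variation from the control genes
set <- RUVg(set, cIdx, k = 1)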

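library(DESeq2)
library(limma)
# Start from the original counts; colData carries the RUVg factor(s) stored in pData(set)
dds <- DESeqDataSetFromMatrix(counts(set), colData = pData(set), design = ~1)
rowData(dds) <- fData
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds)
dds <- estimateDispersions(dds, fitType = "mean")
# Variance-stabilizing transformation using the dispersions estimated above
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
# Select the columns whose names contain "W" (the RUVg factors, e.g. W_1)
covars <- as.matrix(colData(dds)[,grep("W",colnames(colData(dds))),drop = FALSE])
# Regress the unwanted-variation factors out of the transformed values
mat <- removeBatchEffect(mat, covariates = covars)
# Keep the corrected matrix in a copy of the vsd object
vsd.after <- vsd
assay(vsd.after) <- mat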

I used the data in vsd.after to redraw the RLE and PCA plots and check whether any batch effects remained. The plots showed that the batch effects had been removed.
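For reference, the RLE and PCA check described above can be sketched in base R. This is only a minimal illustration: the matrix `mat` is simulated here as a stand-in for `assay(vsd.after)`, and the `batch` labels are hypothetical, so neither reflects the real data.

```r
set.seed(42)

# Simulated stand-in for assay(vsd.after): 50 genes x 12 samples, log-like scale
mat <- matrix(rnorm(50 * 12, mean = 8, sd = 1), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))
batch <- factor(rep(c("A", "B"), each = 6))  # hypothetical batch labels

# RLE: subtract each gene's median across samples from that gene's values
rle <- sweep(mat, 1, apply(mat, 1, median), "-")

# One box per sample; boxes centred near zero with similar spread suggest
# little remaining unwanted variation
boxplot(rle, col = as.integer(batch) + 1, las = 2,
        ylab = "Relative log expression", main = "RLE by sample")
abline(h = 0, lty = 2)

# PCA on samples (rows of t(mat)); colour by batch to look for clustering
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)
plot(pca$x[, 1:2], col = as.integer(batch) + 1, pch = 19,
     main = "PCA of samples")
```

With real data, `mat` would be replaced by `assay(vsd.after)` and `batch` by the relevant column of the sample annotation.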

According to the GSVA package documentation, the input to gsva() should be a normalized gene expression dataset. I think vsd.after holds the data after normalization and batch-effect removal. However, I remember the DESeq2 documentation saying that such data are only suitable for visualization and should not be used for downstream analysis.
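To make the GSVA input requirement concrete, here is a minimal sketch of a GSVA call on a continuous, log-scale matrix. It assumes a recent GSVA release that provides the `gsvaParam()` interface (older releases instead take the matrix and gene sets directly in `gsva()`). The matrix `expr` and the gene sets in `geneSets` are simulated and hypothetical, and the sketch does not settle which of the objects above is the appropriate input.

```r
library(GSVA)

set.seed(7)

# Simulated stand-in for a normalized, log-scale expression matrix
# (genes in rows, samples in columns)
expr <- matrix(rnorm(100 * 8, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))

# Hypothetical gene sets: a named list of gene identifiers matching rownames(expr)
geneSets <- list(setA = paste0("gene", 1:15),
                 setB = paste0("gene", 40:60))

# kcdf = "Gaussian" is the kernel GSVA documents for continuous values;
# "Poisson" is intended for integer counts
param <- gsvaParam(expr, geneSets, kcdf = "Gaussian")
es <- gsva(param)

# Result: one row per gene set, one column per sample
dim(es)
```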

I also found another matrix in set that differs from my raw expression set and has not been processed by DESeq2 or limma. It can be extracted with set@assayData[["normalizedCounts"]]. I am not sure what it is. Does it contain the counts with batch effects removed? Could it be used for GSVA directly?
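As a small illustration of where that matrix comes from, the sketch below runs the same RUVSeq steps on simulated counts and compares the direct slot access with the `normCounts()` accessor from EDASeq (attached together with RUVSeq). The count matrix, the `batch` column, and the housekeeping gene names are all hypothetical.

```r
library(RUVSeq)  # also attaches EDASeq, which defines the SeqExpressionSet accessors

set.seed(1)

# Simulated stand-in for a NanoString count matrix: 20 genes x 10 samples
raw <- matrix(rnbinom(200, mu = 100, size = 5), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:10)))
pheno <- data.frame(batch = rep(c("A", "B"), each = 5),
                    row.names = colnames(raw))
hk <- paste0("gene", 1:5)  # hypothetical housekeeping (negative control) genes

set <- newSeqExpressionSet(raw, phenoData = pheno)
set <- betweenLaneNormalization(set, which = "upper")
set <- RUVg(set, hk, k = 1)

# The accessor and the direct slot access should return the same matrix
same_norm <- identical(normCounts(set), set@assayData[["normalizedCounts"]])

# counts() still returns the values that were passed in
same_raw <- all(counts(set) == raw)

# RUVg stores the estimated factor of unwanted variation in the sample annotation
head(pData(set))
```

In this sketch the `normalizedCounts` slot is filled first by `betweenLaneNormalization()` and then overwritten by the output of `RUVg()`, while `counts()` is left untouched.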

Could you tell me which one (vsd.after or set) can be used for GSVA? If neither is appropriate, how can I get a normalized dataset for GSVA?

Thank you!