Pre-processing of NanoString normalization data, as described in "An approach for normalization and quality control for NanoString RNA expression data" (Bhattacharya and Hamilton et al, 2020)
2
stars
3
forks
source link
How to obtain normalized NanoString expression set for GSVA #3
I have a NanoString expression set with batch effects and I want to do GSVA. I followed your codes to remove those unwanted variations and tried to obtained the normalized dataset for GSVA:
library(RUVSeq)
set <- newSeqExpressionSet(as.matrix(raw), phenoData = pData, featureData = fData)
cIdx <- rownames(set)[fData(set)$Class == "Housekeeping"]
set <- betweenLaneNormalization(set, which = "upper")
set <- RUVg(set, cIdx, k = 1)
library(DESeq2)
library(limma)
dds <- DESeqDataSetFromMatrix(counts(set), colData = pData(set), design = ~1)
rowData(dds) <- fData
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds)
dds <- estimateDispersions(dds, fitType = "mean")
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)
covars <- as.matrix(colData(dds)[,grep("W",colnames(colData(dds))),drop = FALSE])
mat <- removeBatchEffect(mat, covariates = covars)
vsd.after <- vsd
assay(vsd.after) <- mat
I used dataset in vsd.after to draw RLE and PCA plot again to check if there were still batch effects. It showed the batch effects were removed.
According to the protocol of GSVA package, the input of gsva() should be a normalized gene expression dataset. I think the dataset in vsd.after is the one after being removed batch effects and normalization. But I remembered the protocol of DESeq2 said that it is only good for visualization and could not be used for downstream analysis.
I also found there is a dataset in set, which is different from my raw expression set and not being normalized by DESeq2 and limma package. It could be extracted by the code set@assayData[["normalizedCounts"]]. I am not sure what it is. Is it the counts without batch effects? Could it be used for GSVA directly?
Could you tell me which one (vsd.after or set) could be used for GSVA? If they are not the proper data, how can I get the normalized dataset for GSVA?
Hi!
I have a NanoString expression set with batch effects and I want to do GSVA. I followed your codes to remove those unwanted variations and tried to obtained the normalized dataset for GSVA:
I used dataset in
vsd.after
to draw RLE and PCA plot again to check if there were still batch effects. It showed the batch effects were removed.According to the protocol of
GSVA
package, the input ofgsva()
should be a normalized gene expression dataset. I think the dataset invsd.after
is the one after being removed batch effects and normalization. But I remembered the protocol ofDESeq2
said that it is only good for visualization and could not be used for downstream analysis.I also found there is a dataset in
set
, which is different from my raw expression set and not being normalized byDESeq2
andlimma
package. It could be extracted by the codeset@assayData[["normalizedCounts"]]
. I am not sure what it is. Is it the counts without batch effects? Could it be used for GSVA directly?Could you tell me which one (
vsd.after
orset
) could be used for GSVA? If they are not the proper data, how can I get the normalized dataset for GSVA?Thank you!