Open gianfilippo opened 2 years ago
Hi Maddy,
I take the reverseDecon output, "rdecon" in my case, and, for each sample, I generate cell type specific gene expression levels. See blow for an example:
for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]
I am not sure this is correct. Anyway, the resulting cell type specific gene expression matrix, normAstro is a gene X sample matrix, with the same gene names and sample IDs as the original matrix.
I then add it to the NanoStringGeoMxSet and git is a 'name', as follow: assayDataElement(object = myData, elt = "log_q_norm_Astro") = log2(normCellType[["Astro"]]+1.0)
Then I try to test for differential expression as described in my previous post.
Unless there is a problem with the way I add the assayDataElement to the NanoStringGeoMxSet, I cannot see the matrix format to be a problem. May be the way I estimate cell type specific expression levels by gene is wrong? what do you think ?
Thanks Gianfilippo
Hi Gianfilippo,
I was reading your issue incorrectly and noticed my mistake right after I sent my previous comment. I am looking into your issue, now that I understand it correctly, and I will get back to you shortly.
Maddy
Hi Gianfilippo,
So you are creating the cell type specific gene expression levels in the expected format so that is not the issue. The issue seems to stem from the significant number of zeros that are created from generating this matrix. Our biostatistician suggests
"limiting this analysis to genes that are mainly coming from the cell type of interest. E.g. if a gene's "astrocyte-specific" expression is >50% of the total background-subtracted expression, then it could be worth analyzing in this way. This kind of rule does remove most genes from consideration."
But if you don't want to reduce the gene list that much, you at least need to remove the genes that have zeros across all of the samples. Having the same value across the row is causing the standard deviation error.
Maddy
How naive of me!! I did not think about re-filtering the matrix. Thank you (and your biostatistician) for noticing it. I will let you know how it goes Best Gianfilippo
ok, I made some changes but I am still having problems. I get the following error
Error in if (maxmingrad > ccl$tol) { : missing value where TRUE/FALSE needed
Can you please help ?
Below is what I did. In the first few lines, I create a copy of the original data, then I filter out from the
myData is the original data object.
myData_test = myData
assayDataElement(object=myData_test, elt="q_norm_Astro")) <- normCellType[["Astro"]]
keep = which(rowSums(normCellType[["Astro"]]>=0.0001)>8)
myData_test = myData_test[keep,]
assayDataElement(object=myData_test, elt="log_q_norm_Astro") <- assayDataApply(myData_test, 2, FUN=log, base=2, elt="q_norm_Astro")
ind <- pData(myData_test)$segment == "AstroMarker"
mixedOutmc <- mixedModelDE(myData_test[, ind], elt = "log_q_norm_Astro", modelFormula = ~ testClass + (1 | slide), groupVar = "testClass", nCores = 20, multiCore = F)
r_test <- do.call(rbind, mixedOutmc["lsmeans", ])
tests <- rownames(r_test)
r_test <- as.data.frame(r_test)
r_test$Contrast <- tests
r_test$Gene <- unlist(lapply(colnames(mixedOutmc), rep, nrow(mixedOutmc["lsmeans", ][[1]])))
r_test$Subset <- segment
r_test$FDR <- p.adjust(r_test$`Pr(>|t|)`, method = "fdr")
r_test <- r_test[, c("Gene", "Subset", "Contrast", "Estimate", "Pr(>|t|)", "FDR")]
Hi, I think I found the issue. The way I was getting the cell type specific gene expression estimate is wrong, since I was adding the intercept value to the gene expression estimate, i.e. for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]
In reality this intercept, as I understand it, is part of the model for overall gene expression. So, when I removed it, and estimated cell type gene expression as for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] the code works. The reason for the crash, as you mentioned also, is constant expression across samples, which happens anywhere the cell-sample coeff is zero, making the only non-zero entry the intercept coefficient. Perhaps, a question is how good are these estimates for differential expression analysis, since, I believe, you have not assessed that in the paper. i can see that the difference from actual measurement is non-negligible. Also, what is the meaning of the intercept coefficient ? background noise ? Thanks Gianfilippo
Hi Gianfilippo, Glad you found the fix! Answers to your questions:
Hi,
could you please clarify how to run Differential Expression on the cell type specific data generated by reverseDecon ?
Below is what I did so far:
I followed the GeoMxWorkflows tutorial to QC, filter, process the data from DCC files
Next, I used assayData(myData)[["q_norm"]] as final matrix for analysis and tried to follow the SpatialDecon tutorial: norm = assayData(myData)[["q_norm"]]
infer background using derive_GeoMx_background
run the spatialdecon function as follow: spatialdecon(norm = norm, bg = bg, X = mouseAllenBrain, align_genes = TRUE) where mouseAllenBrain = download_profile_matrix(species = "Mouse", age_group = "Adult", matrixname = "Brain_AllenBrainAtlas")
Then I used reverseDecon to get cell type specific gene expression levels rdecon = reverseDecon(norm = norm, beta = res$beta)
from here I derived the cell type specific gene expression levels (not sure about this). for instance, for Actrocytes, for (i in 1:ncol(norm)) normAstro[,i] = rdecon$coefs[,"Astro"] * res$beta["Astro",i] + rdecon$coefs[,1]
the normAstro matrix has some entries equal (or very close) to zero, so, log2(normAstro) not an option. I use log2(normAstro+1) and assign it to the GeoMXset object assayDataElement(object = myData, elt = "log_q_norm_Astro") = log2(normCellType[["Astro"]]+1.0)
At this point I do differential expression testing as follow (as in the tutorial) for each segment. NOTE: the experimental design is such that each slide has both "Ref" and "Pert" in testClass, but I may have two slides each individual, so I used a LMM without random slope
The code crashes at this point with the following error "log_q_norm_Astro" Error in checkForRemoteErrors(val) : 20 nodes produced errors; first error: invalid class “corMatrix” object: 'sd' slot has non-finite entries
Please let me know if you need more details
Thanks Gianfilippo