drisso / zinbwave

Clone of the Bioconductor repository for the zinbwave package, see https://bioconductor.org/packages/zinbwave
43 stars 10 forks source link

model and K values in zinbwave + Deseq2 #64

Open AyanoNomura opened 4 years ago

AyanoNomura commented 4 years ago

Hello,

Thank you very much for your package. I have questions about model and K value settings in zinbwave function when combining with DESeq2.

I’m working on SMART-seqHT dataset of a specific cell type, human sample, to compare gene expressions under two conditions(positive vs negative). The samples are from 3 patients. My interest is in the DEGs between positive and negative cells, and the normalized values for heatmap visualization. With Seurat package, I failed to find enough DEGs for further functional analysis, so I tried the zinbwave and DESeq2, in which I found few hundred DEGs with the Wald test.

These are the tutorials I followed: https://github.com/mikelove/zinbwave-deseq2/blob/master/zinbwave-deseq2.knit.md https://bioconductor.org/packages/release/bioc/vignettes/zinbwave/inst/doc/intro.html http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

My questions are:

  1. About the model:

My current code is:

zinb <- zinbwave(zinb, K=0, BPPARAM=SerialParam(), epsilon=1e12, normalizedValues=TRUE, observationalWeights = TRUE)
dds <- DESeqDataSet(zinb, design= ~ condition + patient)

But should I include the model in the zinbwave function too?:

zinb <- zinbwave(zinb, K=0, X=“~ condition + patient”, BPPARAM=SerialParam(), epsilon=1e12, normalizedValues=TRUE, observationalWeights = TRUE)
dds <- DESeqDataSet(zinb, design= ~ condition + patient)

The second codes gives me about 40 less DEGs than the first codes. According to issue #36( https://github.com/drisso/zinbwave/issues/36) , should I use the second codes ? And where does this different number of DEGs come from?

  1. Is the K value setting (“K=0“) appropriate, for both DEGs and normalized values?

Here is my current code:

sce <- SingleCellExperiment(assays = list(counts = as.matrix(count_matrix)), colData = sample)

filter <- rowSums(assay(zinb)>5)>5
zinb <- sce[filter,]

zinb <- zinbwave(zinb, K=0, BPPARAM=SerialParam(), epsilon=1e12, normalizedValues=TRUE, observationalWeights = TRUE)

write.table(assay(zinb,"normalizedValues"), file="normalizedValues.txt",sep="\t")

dds <- DESeqDataSet(zinb, design= ~ condition + patient)
dds <- estimateSizeFactors(dds, type="poscounts")

scr <- scran::calculateSumFactors(dds)
sizeFactors(dds) <- scr

dds <- DESeq(dds, sfType="poscounts", useT=TRUE, minmu=1e-6, minRep=Inf) 
res <- lfcShrink(dds, contrast=c(“condition”, "positive, "negative”), type = "normal")

I apologize for my poor understanding and duplicated question. Thank you very much in advance.

drisso commented 4 years ago

Your code looks good to me. As for whether including the model in zinbwave, we usually recommend to use the same design matrix in both zinbwave and DESeq2. In your case it would be interesting to see what are those genes and whether they are more likely to be true or false positives (e.g., are they enriched for any relevant GO term?).

Finally, the value of K should be greater than 0 only if you want to use zinbwave for dimensionality reduction.

PS. In the future, this type of questions is more suited for the Bioconductor support forum (support.bioconductor.org) than for github issues.

Hope this helps. Davide

yeroslaviz commented 4 years ago

Just a comment - In the second line of your code - should it be zinb or truly sce? Otherwise your zinb object is not defined yet

AyanoNomura commented 4 years ago

Hi Davide,

Thank you so much for your reply. And sorry for the wrong location of putting the question.

As you pointed out, the enriched GO terms in the first codes and the second codes were almost the same.

A few of the 40 genes that were DEGs in the first code(design only in Deseq2) had NA for adjusted & unadjusted p values in the second Deseq2 result (design in both zinb and Deseq2), I don't understand why, for they were neither 0 total counts nor there were no obvious outliers. But most of them were, considering the purpose of the experiment, irrelevant, so they may have been false positives.

I greatly appreciate your help.

Ayano

AyanoNomura commented 4 years ago

Hi yeroslaviz, Thank you for your comment. Yes, as you pointed out, the second line should be zinb, not sce. Thank you!

hermidalc commented 2 months ago

Your code looks good to me. As for whether including the model in zinbwave, we usually recommend to use the same design matrix in both zinbwave and DESeq2. In your case it would be interesting to see what are those genes and whether they are more likely to be true or false positives (e.g., are they enriched for any relevant GO term?).

Finally, the value of K should be greater than 0 only if you want to use zinbwave for dimensionality reduction.

PS. In the future, this type of questions is more suited for the Bioconductor support forum (support.bioconductor.org) than for github issues.

Hope this helps. Davide

Hi - I would include that in the vignette. For new users coming to the zinbwave vignette it's not immediately clear that one should probably include the same design formula in X before calling zinbwave to get the observation weights before using the same design with the zinbwave weights in the DE analysis, because in the vignette it's not doing that. It can make for significantly different weights. Thank you Davide!