drisso / zinbwave

Clone of the Bioconductor repository for the zinbwave package, see https://bioconductor.org/packages/zinbwave
43 stars 10 forks source link

zinbwave+Deseq2 Differential expression 10x #36

Closed bio-la closed 4 years ago

bio-la commented 6 years ago

@koenvandenberge @drisso Sorry to quote a closed issue #33 but I'm trying to understand what would be the most appropriate choice when computing DE genes. I have a 10x dataset where I used the zinbwave model to compute the W matrix. I have 7 patients, 2 biological conditions(treated vs controls), and within the treated condition 2 different drugs. (i know, it's a hell of a mess). the aim was to correct for the known variability in the experimental design, and to account as well for any other source of variation, to cluster the cells based on their top genes ignoring the fact that the patients are heterogeneous and belong to 2 different groups. i tested several zinbwave params and ended up with the following design: X=~patient.id+drug.treatment, k=10. then used the W matrix to do clustering. (as suggested in the vignette)

now I need to perform differential expression WITHIN cluster (so it's n differential expression testings, n= number of clusters), contrasting on the "condition" level (treat vs untreat). this is a slightly different problem from the first task and I am wondering what is the appropriate choice for the weights estimation using zinbwave. given that for each cluster I have cells from 2 conditions and all the patients are represented (not homogeneously but that's the inherent variability of the experiment I guess) should I estimate the weights with zinbwave(zinb, K=0, X=~patient.id+condition, epsilon=1e12) and use the same design with Deseq for the diffexp test? And should I really use K=0?

thank you!

koenvandenberge commented 6 years ago

Dear @bio-la

I am assuming that drug.treatment is a factor variable that contains both the control treatment as well as the two different drugs. I am also assuming that you have used K=10 in the zinbwave model for the dimension reduction, then clustered to identify cell types based on that dimension reduction, hence giving you cell type labels for every cell. Since now you have the cell type labels, I would recommend to refit the zinbwave model with the design ~patient.id+drug.treatment*cell.type, which allows you to estimate a different response on the treatment for every cell type that you have identified. If instead you think that you can assume the treatment effect to be identical for all cell types, you can also use ~patient.id+drug.treatment+cell.type, but I would think that this is probably not the case.

I would then use the same design in the DESeq2 analysis, and set up contrasts for your hypotheses of interest.

Hope this helps, and let me know if anything is unclear.

bio-la commented 5 years ago

Hi @koenvandenberge and thanks for your reply. Your interpretation of the first part is correct. Thanks for the suggestion on the interaction term in the design.

given that I have 4 factors in my design ( patient.id, drug.treatment, condition, cell.type), and I want to contrast on condition, shouldn't I use ~patient.id+drug.treatment+ condition*cell.type, to contrast on each level ofconditon*cell.type ? also, the problem is that my design matrix is not full rank if i use this design(or the one you suggested in the first place), i can estimate the weights in zinbwave, but I can't use it in Deseq. the only combination i can use is condition*cell.type(or treatment*cell.type) (cause i have cells for both levels of the condition, but cells can belong only to one patient or another, and some of the cells exist only in one level of the drug.treatment) am I doing this wrong? thanks!

patient drug.treatment condition
pt1 drug1 treated
pt2 drug1 treated
pt3 drug2 treated
pt4 drug2 treated
pt5 untreated untreated
pt6 untreated untreated
pt7 untreated untreated
koenvandenberge commented 5 years ago

Dear @bio-la

Apologies for my late reply.

It is not surprising that your design is not full rank. Indeed, if one knows the drug.treatment, one also immediately knows the condition. Hence, the condition effect seems obsolete. Also note that, looking at the table, it is difficult to separate the patient effect from the drug effect. If you want to perform DE for every cell type in a single model, you would actually need a mixed model with design ~ cell.type * drug.treatment + (1|patient) to account for the nesting (correlation) of cell types within patients and the between-patient variability aswell as the within-patient variability. To circumvent this, you might instead fit a separate model for every cell type, since then you can safely use only the within-patient variability, and you can simply use the design ~ drug.treatment + patient, and test for the treatment effect. However, the downside of this model is that you will always be using only a subset of the data to do the inference.

Please see the DESeq2 vignette for more info on design matrices that are not full rank.

Best, Koen