XPRESSyourself / XPRESSpipe

An alignment and analysis pipeline for Ribosome Profiling and RNA-seq data
https://xpresspipe.readthedocs.io/en/latest/
GNU General Public License v3.0
13 stars 4 forks source link

Potential error in https://xpresspipe.readthedocs.io/en/latest/content/analysis.html #45

Closed jdreyf closed 3 years ago

jdreyf commented 3 years ago
  1. You describe a "base" variable, and I presume this is the variable whose coefficient is tested, but I don't see you state which coefficient you actually test, which would be especially helpful for ribosome profiling interaction model.
  2. You state that the "base" parameter needs to be first alphabetically, but in Example 2 – Analyze RNA-seq data that was prepared in different batches, "Batch" precedes "Condition", whereas I would think "Condition" is the base variable. Also, in Example 2 Batch appears totally confounded with Condition.

Thanks for writing such a nice pipeline.

j-berg commented 3 years ago

Hi @jdreyf , thanks for pointing this out. I will put this on my to-do list and should have it addressed by the weekend. Thanks for your interest in the toolkit!

jdreyf commented 3 years ago

I think it would be helpful on this docs/analysis page if you show the first few rows of the output and provide interpretation, as I've also noticed that from your RNA diffxpress call, the log2FoldChange output column wouldn't clarify to the user if the contrast tested was a_WT-b_EXP or the opposite.

j-berg commented 3 years ago

My edits are now live on https://xpresspipe.readthedocs.io/en/latest/content/analysis.html. Could you take a look and let me know if this addresses your comments or if there is anything remaining that you think should be addressed? Thanks!

jdreyf commented 3 years ago

For transparency I think it would be helpful to explicate what coefficients from your model (~1+Type+Condition+Type:Condition) are tested. This would help to understand how your wrapper works, which is necessary when the design must be changed.

For example, I am trying to use xpresspipe to analyze a situation like Example 1, where each mouse is in group A or B and has both IP & input. I'm not sure if Example 1 has paired samples, but here I do, which I presume would be a common design. The ribosome profiling analysis, by extrapolation from the examples, would be xpresspipe diffxpress -i counts_data.tsv --sample sample_info.txt --design Type+Condition+Type:Condition+Mouse, but then I think that not all coefficients are estimable and that this analysis would involve comparisons both within and between mice. Instead of DESeq2, I use the R packages limma and edgeR, and according to the limma manual, when "we need to make comparisons both within and between subjects, it is necessary to treat Patient as a random effect. This can be done in limma using the duplicateCorrelation function" (https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf; Section 9.7). I'm not sure how well xpresspipe or DESeq2 can handle this case, but I think it would be helpful show what should be done in this case, and discuss limitations if present.

Thanks for the quick and helpful changes.

j-berg commented 3 years ago

I don't know if I am completely addressing your question, but the DESeq documentation states the following:

DESeq2 will work with any kind of design specified using the R formula. We encourage users to consider 
exploratory data analysis such as principal components analysis rather than performing statistical testing 
of all pairs of many groups of samples. Statistical testing is one of many ways of describing differences 
between samples.

As a speed concern with fitting very large models, note that each additional level of a factor in the design 
formula adds another parameter to the GLM which is fit by DESeq2. Users might consider first removing 
genes with very few reads, as this will speed up the fitting procedure.

So essentially, any general R design formula can be used for the --design option in XPRESSpipe.

Additionally, the documentation states: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each 
individual, we have conditions A and B (for example, this could be control and treated).

This design can be analyzed by DESeq2 but requires a bit of refactoring in order to fit the model terms. Here
we will use a trick described in the edgeR user guide, from the section Comparisons Both Between and Within 
Subjects. If we try to analyze with a formula such as, ~ ind + grp*cnd, we will obtain an error, because the 
effect for group is a linear combination of the individuals.

However, the following steps allow for an analysis of group-specific condition effects, while controlling for 
differences in individual. For object construction, you can use a simple design, such as ~ ind + cnd, as long 
as you remember to replace it before running DESeq. Then add a column ind.n which distinguishes the 
individuals nested within a group. Here, we add this column to coldata, but in practice you would add this 
column to dds.

...

I think that this particular approach available within DESeq2 is maybe what you are referencing related to handling patients as random effects. If so, I don't think that the XPRESSpipe wrapper can directly handle this. I will make a note of that, and reference the exact locations in the DESeq2 docs that mention these points in the XPRESSpipe docs.

jdreyf commented 3 years ago

I now understand that Question 1 was misguided. The current docs Analysis section states that "Your base (denominator) parameter in a given factor column in the sample_info file must be first alphabetically" and I had misunderstood the term "parameter" which I now understand to be a factor level in R, and the base parameter to be the first factor level.

I now also understand the answer to Question 2, which is that xpresspipe/DESeq2 tests the null hypothesis that the coefficient of the last term in any design formula is zero against the alternative hypothesis that this coefficient is not zero.

Thanks for your tremendous work on the pipeline and your helpful support, inc. our personal communication :100:

j-berg commented 3 years ago

Perfect, thanks! Let me know if you run into any other questions or issues!

jdreyf commented 3 years ago

I looked back at the Analysis doc ( https://xpresspipe.readthedocs.io/en/latest/content/analysis.html) and I recommend you remove "However, changes to the order will cause negligible differences in output. For example, if we scramble the order the factors are listed in the design formula, we obtain essentially the same output:" It's true that the scramble you tried will make no difference, but if Type:Condition isn't last, you will get different results. I think it'd be better to point out that what matters in the design is what term is last, because DESeq2 is testing the coefficient of the last term only. Perhaps my language in the resolution of https://github.com/XPRESSyourself/XPRESSpipe/issues/45 could help.

On Sun, Mar 21, 2021 at 5:04 PM Jordan Berg @.***> wrote:

Perfect, thanks! Let me know if you run into any other questions or issues!

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/XPRESSyourself/XPRESSpipe/issues/45#issuecomment-803659669, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALNWDDBNNKADH3UQUWMJVDTEZNOXANCNFSM4YFCNG6Q .

j-berg commented 3 years ago

Hi @jdreyf,

Thanks for the feedback! I have updated the documentation to reflect your recommendation. Please let me know how this looks when you get a chance.

Best, Jordan

jdreyf commented 3 years ago

Looks good to me, thank you

On Thu, Jul 29, 2021 at 2:56 PM Jordan Berg @.***> wrote:

Hi @jdreyf https://github.com/jdreyf,

Thanks for the feedback! I have updated the documentation to reflect your recommendation. Please let me know how this looks when you get a chance.

Best, Jordan

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/XPRESSyourself/XPRESSpipe/issues/45#issuecomment-889381959, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALNWDDBZKMTG75QP2YEEBLT2GP45ANCNFSM4YFCNG6Q .

j-berg commented 3 years ago

Great, thanks again!