TaliaferroLab / LABRAT

Lightweight Alignment Based Resolution of Alternative Three Prime Ends
MIT License
8 stars 4 forks source link

Multiple covariates documentation #4

Open ojziff opened 3 years ago

ojziff commented 3 years ago

Dear @taliaferrojm,

The documentation for LABRAT is excellent. One thing that is slightly unclear is instructions on significance testing with multiple covariates. I would like to test the influence of two variables (treatment and genotype) together on APA deltapsi. Whilst i can see how to remove the effect of one of these variable by adding it as a column covariate1 to sampconds, I am trying to include a second a variable into the conditions to test (and not remove its effect).

With differential gene expression using DESeq2 i do this by specifying the design = ~ Genotype + Treatment + Genotype:Treatment and results contrast = "genotypeMUTANT.treatmentTREATED". Is it possible to do a similar comparison here?

Many thanks! Oliver

taliaferrojm commented 3 years ago

Hi Oliver,

Thanks for the kind words. You are correct about how to include a covariate.

Unfortunately, LABRAT does not currently handle interaction terms like what you have specified with DESeq2. I think this should be possible to include, though. The relevant function is getdpsis_covariate. Design formulae in there are very R-like and made using statsmodels.formula.api. This accepts interactions terms (see https://www.statsmodels.org/devel/example_formulas.html#multiplicative-interactions).

What would be a little trickier is getting results from that for specific comparisons (i.e. what you do with the contrast argument in DESeq2).

Alternatively, could you combine combinations of factors into a single factor, and then test for changes across the combined factor? This is what is suggested in the DESeq2 manual (https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions). In this case, you would have to run the calculatepsi step once for each pairwise comparison you wanted to do, giving it a different sampconds input each time.

ojziff commented 3 years ago

Many thanks @taliaferrojm!

Combining the two factors into a single factor is a nice idea, but as you note I have to run calculatepsi twice, which still leaves me with two delta Psi and P values per APA event (Treated - Untreated & Mutant - Wildtype).

Gaining inspiration from IRFinder, one solution could be to actually run DESeq2 GLM on the Salmon quant output instead of the getdpsis_covariate, although we would still need to add in the Psi calculation steps and filters from calculatepsi?

Best wishes, Oliver

taliaferrojm commented 3 years ago

That's a very interesting idea! One thing that gives me a bit of a pause is that the GLM in LABRAT uses psi values, while obviously the salmon output is transcript quantifications. I don't know how psi values are usually distributed, but I'm pretty sure they can't be modeled using a negative binomial. I'm not sure if this matters or not with regards to leveraging DESeq2 or not, because I'm not sure what assumptions it's making by the time it gets to the GLM.

The LABRAT.psis.pval output file contains the psi values for every gene in every condition, so it could be easiest (at least for writing/testing purposes) to just start with that and then write whatever is needed to use those values with the GLM you want. That way there is no worrying about filtering psi values or anything.