bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License
32 stars 1 forks source link

Small change in proportion but still significant? #24

Closed jamespblloyd closed 7 years ago

jamespblloyd commented 7 years ago

Why would a transcript show up as significantly changed in proportion (in transcripts table; column 5 == TRUE) but has a Dprop of 0.027609940336013, which is below the 0.05 threshold I set? The pval_corr == 6.09252211790905e-20.

The volcano plot from this experiment does not show any significant spots (that I can see) between -0.05 and 0.05 as expected but in the transcript table, I find that a majority (22123) of significantly changed transcripts (38423) are between -0.05 and 0.05. Why is this the case? They generally have very small p-values and corrected p-values.

for_github

In the Transcript table, is significance purely calculated by the transcript (not gene) stats test? And is this also true for the volcano plot?

fruce-ki commented 7 years ago

Can you please post the contents of $Parameters from your output object?

fruce-ki commented 7 years ago

Depending on the RATs version, column 5 can be different things, as I've added, removed and re-ordered columns over time.

Is column 5 by any chance the one labelled gene_DTU? If that is the case, then this is the result of the gene-level test (i.e. cross-referenced from the DTU column of the $Genes table). All transcripts of that gene should then have the same value in gene_DTU. Even if a particular transcript is not significant ($Transcripts$DTU), the gene as a whole may show significant change possibly due to its other isoforms ($Transcripts$gene_DTU or $Genes$DTU).

fruce-ki commented 7 years ago

The volcano plot created with up to v0.4.4 is based on the transcripts table. Each dot is a transcript.

In the next release (0.4.6), there will be a gene-level volcano. Also the red dots will always be on top of the blue ones, so that they cannot be hidden in the clutter.

A dprop_thresh of 0.05 is extremely low. That means that an isoform going from 50% abundance to 55% abundance can (and probably will) be called as significant. The False Discovery Rate of such low threshold is ridiculous. From a biological persepective, 0.05 I think is basically noise, just normal fluctuation. I wouldn't advise using anything lower than 0.15 for dprop_thresh. Remember that 'dprop' is additive, not multiplicative, it is not a fold-change.

jamespblloyd commented 7 years ago

I tried running $Parameters but this is the output that I get.

mydtu$Parameters NULL But I am running: packageVersion("rats") [1] ‘0.4.4’ And column 5 here equals "sig" and columns 21 and 22 are pval and pval_corr, respectively. I assume that these all match the transcript level test?

This is the header and line of interest which I get out of the transcript table for this transcript of interest: wk 'NR==1 || /TCONS_00109475/' RATs_TranscriptTable_170513.tab target_id parent_id elig_xp elig sig elig_fx quant_reprod DTU gene_DTU meanA meanB stdevA stdevB sumA sumB totalA totalB propA propB Dprop pval pval_corr quant_p_mean quant_p_stdev quant_p_min quant_p_max quant_na_freq quant_dtu_freq TCONS_00109475 XLOC_042426 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 97.55086802833 1253.43878418776 16.491547747489 36.380990660891 292.65260408499 3760.31635256329 3817.19251010157 9695.52726944651 0.0766669753517888 0.387840315236198 0.31117333988441 0 0 2.03121602991379e-258 0 0 2.02555096232517e-256 0 1

So I am not sure why this transcript is sig given it is below the threshold but also why it does not show up on the volcano plot.

I agree that 0.05 change has a LOT of noise. I am also running juncBASE on this, which das default change in PSI of 5% (uses Fisher's exact test on reads for inclusion vs exclusion). So keeping it the same might be useful. Also, I do see some biologically meaningful changes with the sig events even with such a low dprop threshold, but I will be increasing it to ensure it is not all just noise.

Thank thanks for the help!

fruce-ki commented 7 years ago

Ok. Sig labels the statistical significance (p_thresh, pval_corr). Pvalues lower than the threshold are significant. That is separate from the effect size (dprop_thresh, dprop, elig_fx). Effect size should be bigger than the threshold.

What you want to be looking at is the DTU column, which combines all the criteria. Sig and elig_fx and quant_reprod are listed to help you do a post-mortem on the transcripts/genes that are DTU FALSE.

I'm on my phone right now, so I cannot make sense of your quoted output. I'll have to look at it on a computer screen. But I see all the flags are TRUE so DTU would be true too. Dprop looks like it is 0.31, much bigger than 0.05. Pval_corr seems to be 0, so below threshold. Reproducibility is 1, well above the qrep_thresh of 0.95. So this all looks perfectly sensible to me.

It seems you should study the output vignette a bit more. If you find ambiguities or parts that are unclear, please let me know. I have a biased view in that I already know the tool, so things obvious to me may not be obvious to an independent user.

Yourdtu$Parameters should not be NULL. Are you sure you typed it right?

jamespblloyd commented 7 years ago

Very sorry, I copied the wrong transcript from that gene, this one does have a nice and large isoform change. Below is the sig with a small change Dprop:

awk 'NR==1 || /TCONS_00109466/' RATs_TranscriptTable_170513.tab target_id parent_id elig_xp elig sig elig_fx quant_reprod DTU gene_DTU meanA meanB stdevA stdevB sumA sumB totalA totalB propA propB Dprop pval pval_corr quant_p_mean quant_p_stdev quant_p_min quant_p_max quant_na_freq quant_dtu_freq TCONS_00109466 XLOC_042426 TRUE TRUE TRUE FALSE TRUE FALSE TRUE 12.7892188974442 121.715117701947 11.227993377883 16.5276056468492 38.3676566923326 365.145353105842 3817.19251010157 9695.52726944651 0.0100512763217466 0.0376612166577596 0.027609940336013 6.36740446137407e-21 6.09252211790905e-20 1.09561989709805e-06 1.08866056038223e-05 3.99519581630529e-50 0.000108871216959932 0 0

But I see that DTU is FALSE, so for the result of the transcript test, that is represented in the Volcano plot, and passes the Dprop threshold, I should look at the column DTU! Great, thanks for the clarification. I will re-do my analysis using this rather than the "sig" column.

jamesl@atomic:/data/jamesl/NMD/AS_NMD_SFs/SRSF126_KD_CHX/RATs_analysis/Sleuth_obj_170513$ awk 'NR==1 || /TCONS_00109466/' RATs_TranscriptTable_170513.tab | cut -f 8 DTU FALSE

This is really great! Thank you for this help!

PS I re-did $Parameters (I did something stupid first time) and this is what it looks like:

mydtu$Parameters $description [1] "Experiment using the sleuth object as input."

$time [1] "Mon May 15 11:36:25 2017"

$rats_version [1] ‘0.4.4’

$R_version $R_version$platform [1] "x86_64-w64-mingw32"

$R_version$version.string [1] "R version 3.3.1 (2016-06-21)"

$var_name [1] "condition"

$cond_A [1] "A"

$cond_B [1] "B"

$data_type [1] "sleuth"

$num_replic_A [1] 3

$num_replic_B [1] 3

$num_genes [1] 0

$num_transc [1] 197399

$tests [1] "both"

$p_thresh [1] 0.05

$abund_thresh [1] 10

$dprop_thresh [1] 0.05

$quant_reprod_thresh [1] 0.95

$quant_boot [1] TRUE

$quant_bootnum [1] 100

$rep_reprod_thresh [1] NA

$rep_boot [1] FALSE

$rep_bootnum [1] NA

$rep_reprod_as_crit [1] NA

fruce-ki commented 7 years ago

This does not look right:

$num_genes [1] 0

The number of transcripts looks human, so that number should be above 58k genes... Why are there no genes? How are isoform proportions calculated if transcripts are not assigned to genes? And if there are genes, why are they not counted?

jamespblloyd commented 7 years ago

Good catch, I had not noticed this. It is human. Genes are included, I have no issue with the transcript to gene matching that I am aware of. For example, making this plot for the gene the transcript IDs from above come from can be made. So I am not sure why this is reporting no genes as genes have clearly been matched to transcripts.

for_github_2

fruce-ki commented 7 years ago

Ok thanks. I'll have a look at the source code. Maybe I broke something and it's been flying under the radar.

fruce-ki commented 7 years ago

I think I found the culprit in the code: a hard-coded field-name when assigning this value. So I suspect your genes are not in a field called "parent_id" and I suspect that you did specify a field name override. So everything else worked fine. In fact the same issue exists with $Parameters$num_transc, where the default name is again hard coded.

fruce-ki commented 7 years ago

Is there anything else I can do for you on this issue?

jamespblloyd commented 7 years ago

I think that this is it. Massive thanks for all the help and glad I was able to uncover a bug.

fruce-ki commented 7 years ago

Okie dokes! Thanks for using our tool!