0 Reported DTU genes

Klim314 commented 7 years ago

I've been trying to get RATs working on a dataset of mine, but I keep getting zero positives.

Considering the FPR of 0.05 on a null dataset, getting 0 positives, not even false ones, seems unlikely.

fruce-ki commented 7 years ago

Hello!

Without knowing which version you are using and what the parameter values are, I cannot help you much. The default behaviour has changed a bit through the versions. Getting 0 positives is not uncommon when the replicate reproducibility is enabled with a strict threshold. Can you post the contents of your $Parameters?

Also, is your data raw counts? TPMs? Scaled up TPMs?

The easiest way to self-diagnose your results is to look at the output tables, and specifically at the logical flags that mark each cumulative decision step. From there you should be able to tell which decision step is the bottleneck. Then we can begin to make sense of what’s going on.
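As an illustration of that kind of self-diagnosis, here is a minimal sketch, written in Python rather than R, that tallies how many genes survive each cumulative step. It assumes the gene-level output table has been exported as a list of rows with one logical flag per decision step. The flag names (`sig`, `elig`, `quant_reprod`, `rep_reprod`) and the toy rows are assumptions for illustration only and may not match the column names in your version of RATs.

```python
# Toy stand-in for a gene-level results table; flag names are hypothetical.
rows = [
    {"gene": "g1", "sig": True,  "elig": True,  "quant_reprod": True,  "rep_reprod": False},
    {"gene": "g2", "sig": True,  "elig": False, "quant_reprod": None,  "rep_reprod": None},
    {"gene": "g3", "sig": True,  "elig": True,  "quant_reprod": False, "rep_reprod": False},
    {"gene": "g4", "sig": False, "elig": True,  "quant_reprod": None,  "rep_reprod": None},
]


def cumulative_counts(rows, steps):
    """Count the rows that pass every step up to and including each one."""
    counts = []
    surviving = rows
    for step in steps:
        # `is True` treats missing values (None) as not passing the step.
        surviving = [r for r in surviving if r[step] is True]
        counts.append((step, len(surviving)))
    return counts


steps = ["sig", "elig", "quant_reprod", "rep_reprod"]
for step, n in cumulative_counts(rows, steps):
    print(f"{step}: {n}")
```

The step at which the count collapses is the bottleneck to investigate first.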

I hope this helps.

Kimon

Klim314 commented 7 years ago

Hi, thanks for the help on this.

Params output is in the code-field below. Data type is Kallisto-quantified TPMs which have then been scaled by the average sequencing depth (25m reads -> scaling=25).

My own analysis showed the following:

Significant (q < 0.1): 33965
Sig + Eligible: 1884
Sig + Elig + quant_reproducible: 155
Sig + Elig + rep_reproducible: 0

The bottleneck seems to be at the rep_reproducible step, but I am unsure how to proceed.

$description
[1] NA

$time
[1] "Mon Oct  9 17:13:04 2017"

$rats_version
[1] ‘0.6.0’

$R_version
$R_version$platform
[1] "x86_64-pc-linux-gnu"

$R_version$version.string
[1] "R version 3.4.1 (2017-06-30)"

$var_name
[1] "condition"

$cond_A
[1] "Condition-A"

$cond_B
[1] "Condition-B"

$data_type
[1] "bootstrapped abundance estimates"

$num_replic_A
[1] 3

$num_replic_B
[1] 3

$num_genes
[1] 0

$num_transc
[1] 131195

$tests
[1] "both"

$p_thresh
[1] 0.05

$abund_thresh
[1] 5

$dprop_thresh
[1] 0.2

$abund_scaling
[1] 25

$quant_reprod_thresh
[1] 0

$quant_boot
[1] TRUE

$quant_bootnum
[1] 1000

$rep_reprod_thresh
[1] 0.85

$rep_boot
[1] TRUE

$rep_bootnum
[1] 9

fruce-ki commented 7 years ago

Yep. With rep_boot TRUE, I have yet to encounter a 3-replicate dataset that is reproducible at least 8 out of 9 times, which is what the 0.85 threshold requires. So you can try lowering the threshold to 7/9 or 6/9 or even 5/9 and see what happens. You don’t need to rerun RATs; you should be able to do that directly from the output tables with basic dataframe or datatable operations.

The FDRs in the preprint were calculated with rep_boot FALSE. What I usually do nowadays is enable it but set the threshold to 0. That way the replicate reproducibility info is available as an extra but is not used to determine DTU.
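To make the arithmetic concrete: with 9 replicate bootstrap iterations the reproducibility frequency can only take the values k/9, so a threshold of 0.85 is first met at 8/9. The sketch below, in Python rather than R, works out that minimum and then re-applies a lower threshold to already-computed frequencies. The gene ids and frequencies are made up, and the helper names are hypothetical, not part of RATs.

```python
from fractions import Fraction


def min_passes(bootnum, thresh):
    """Smallest number of iterations k such that k / bootnum >= thresh."""
    target = Fraction(str(thresh))  # exact decimal, avoids float rounding
    for k in range(bootnum + 1):
        if Fraction(k, bootnum) >= target:
            return k
    return None  # threshold cannot be reached


def rethreshold(freqs, thresh):
    """Ids whose frequency meets the threshold; missing values never pass."""
    return [g for g, f in freqs.items() if f is not None and f >= thresh]


# Hypothetical replicate-reproducibility frequencies from an existing run.
toy_freqs = {
    "g1": Fraction(7, 9),
    "g2": Fraction(5, 9),
    "g3": None,
    "g4": Fraction(8, 9),
}

print(min_passes(9, 0.85))                     # iterations needed at 0.85
print(rethreshold(toy_freqs, Fraction(7, 9)))  # genes passing at 7/9
```

No rerun is involved here: only the stored frequencies are compared against the new cutoff.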

fruce-ki commented 7 years ago

In terms of how to proceed, here are some ideas:

1. Plot histograms of the two reproducibility metrics (quant_dtu_freq and rep_dtu_freq) to get the bigger picture. That should give you an idea of what your options are in terms of lowering the reproducibility thresholds.
2. Work with the genes that are reproducible with regard to the kallisto quantifications, ignoring rep_reproducible.
3. Rank these 155 genes by their rep_dtu_freq and work from top to bottom until you reach a number of genes you are happy with or a reproducibility value that you no longer trust (somewhere between the default 0.85 and the coin-flipping 0.5).
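The ideas above can be sketched roughly as follows, in Python rather than R. The gene ids and frequencies are invented, the helper names are hypothetical, and the histogram assumes 9 replicate bootstrap iterations so that every frequency is a multiple of 1/9.

```python
from collections import Counter

# Hypothetical (gene, rep_dtu_freq) pairs for genes that already passed
# the quantification-reproducibility step.
genes = [("g1", 8 / 9), ("g2", 6 / 9), ("g3", 4 / 9), ("g4", 7 / 9), ("g5", 6 / 9)]


def text_histogram(values, bootnum=9):
    """Count how many values fall on each possible k / bootnum level."""
    counts = Counter(round(v * bootnum) for v in values)
    return [(k, counts.get(k, 0)) for k in range(bootnum + 1)]


def ranked_until(genes, floor):
    """Gene ids sorted by descending frequency, cut off below `floor`."""
    ranked = sorted(genes, key=lambda g: g[1], reverse=True)
    return [name for name, freq in ranked if freq >= floor]


for k, n in text_histogram([f for _, f in genes]):
    print(f"{k}/9 {'#' * n}")

# Walk down the ranking until frequencies drop below a floor you still trust.
print(ranked_until(genes, 0.6))
```

The floor of 0.6 is arbitrary; the histogram is what should inform where to draw the line.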

Let me know if you have any more questions about this!

fruce-ki commented 7 years ago

Also, when subsetting the tables yourself, without using RATs' summary functions, be aware of issue #36 and make sure to remove NAs.
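As a general illustration of filtering with missing values (not a description of what issue #36 covers), here is a minimal Python sketch that checks for missing entries explicitly before comparing against a threshold. The row layout and the `rep_dtu_freq` values are made up, and the helper names are hypothetical.

```python
import math


def is_missing(x):
    """True for None or a float NaN, the usual stand-ins for NA."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def keep_rows(rows, column, thresh):
    """Rows whose value in `column` is present and at least `thresh`."""
    return [r for r in rows if not is_missing(r[column]) and r[column] >= thresh]


# Toy rows; two of them have no reproducibility value at all.
toy_rows = [
    {"gene": "g1", "rep_dtu_freq": 0.78},
    {"gene": "g2", "rep_dtu_freq": None},
    {"gene": "g3", "rep_dtu_freq": float("nan")},
    {"gene": "g4", "rep_dtu_freq": 0.33},
]

print([r["gene"] for r in keep_rows(toy_rows, "rep_dtu_freq", 0.5)])
```

In R the equivalent precaution is to add an explicit `!is.na(...)` condition when subsetting a data frame, since a logical index containing NA returns NA-filled rows rather than dropping them.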

Klim314 commented 7 years ago

Thanks for the help. I'll try this once I get the chance.

fruce-ki commented 7 years ago

On an unrelated note, I just spotted that $Parameters lists the number of genes in your annotation as 0. That is very unlikely, considering you do get test results. So I'm wondering if you could have edited it, or if this is how it came out?

Klim314 commented 7 years ago

I believe this is how it came out. I'll regenerate the data and see if it turns out the same.

Klim314 commented 7 years ago

Hm. num_genes is still zero. However, the annot file is most definitely not empty. Could this be related to the Ensembl annotations?

fruce-ki commented 7 years ago

Ok, thanks. I’ll have a look. Can I have a few lines (first 50) from your annotation in case I can’t recreate the error with mine?

Klim314 commented 7 years ago

Here's the (truncated) output from annot2ids

Note: I've trimmed the . from the ensembl IDs, leaving only the gene/transcript ID behind.

annotation_dump.zip

fruce-ki commented 7 years ago

Thanks! The format of this table looks fine.

fruce-ki commented 7 years ago

Are you specifying a value for PARENT_COL in call_DTU()?

Klim314 commented 7 years ago

No, I did not. Default values were used.

bartongroup / RATS

0 Reported DTU genes #37