Unexpected high q-values

OwenDonohoe commented 6 years ago

Hi

I've been using Ballgown recently to look for differential transcript expression, and I've found it to be a great tool to use! I just have a quick question on q-values. I'm currently comparing transcript expression between two groups. I have 4 samples within each group (Reps 1-4). RNA-seq data was mapped using HiSat and assembled using StringTie prior to analysis using Ballgown. I noticed that there are clear differences in transcript expression between the two groups, and indeed many of these cases display corresponding p-values <0.05 and q-values <0.1 (q-value <0.1 is my cut-off point).

However, there are some transcripts that show clear differences in expression between the 2 groups, and indeed display p-values <0.05, but they have corresponding q-values well above the cut-off of 0.1. See the prime example below:

Group 1 FPKM 7.775204 20.20072 30.07049 16.23172

Group 2 FPKM 0 0 0 0

Ballgown Stats results fc: 10.32569195 pval: 0.039737446 qval: 0.460668548 (higher than cut-off of 0.1)

Here, despite the clear difference in expression of this transcript between the two groups, going by my cut-off (requiring all q-values to be <0.1), this change in expression will not be seen as statistically significant.

I am aware that low p-values do not necessarily mean that there will be correspondingly low q-values if there is just not any signal in the data, but I find it hard to ignore/eliminate results like the example above if there could be something significant occurring from a biological perspective. However, at the same time, I believe it's best to have a rigorous cut-off point (q-value <0.1) in order to reduce the chances of reporting false positives.
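To make the p-value/q-value relationship concrete, here is a minimal sketch in Python. It assumes the q-values are Benjamini-Hochberg adjusted p-values, which is an assumption on my part, as nothing in this thread states which method Ballgown applies. The function name `bh_qvalues` and the ten p-values are hypothetical and only for illustration.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: this mirrors a standard BH adjustment; it is not taken
    from Ballgown's source code.
    """
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical p-values: one small value among nine unremarkable ones.
pvals = [0.04, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
qvals = bh_qvalues(pvals)
print(qvals[0])  # q-value of the p = 0.04 test

# The p-value and q-value reported above for the example transcript.
# Under BH, q <= p * m / rank, so rank / m <= p / q.
max_fraction = 0.039737446 / 0.460668548
print(round(max_fraction, 3))
```

Under the same assumption, the ratio p/q for the example transcript is an upper bound on the fraction of tested transcripts whose p-values are at or below 0.0397. So the q-value reflects how many other transcripts in the whole test set have small p-values, not only how different the FPKM values of this one transcript look.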

I have attached the FPKM values and statistics output (similar to above) for 5 transcripts that also have unexpectedly high q-values given the FPKM values, and I've also given examples of 5 that have the expected low q-values given the FPKM values.

One thing I have noticed between these 2 groups in the attached example (i.e. transcripts with expected low q-values and those with higher than expected q-values) is that q-values tend to be larger where there is increased variance/outliers among replicates within a group. Would the presence of such variability skew the q-values to be higher than expected? Is it simply this that could be causing high q-values for some transcripts?
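As a rough illustration of the variance question, the sketch below computes an equal-variance two-sample t statistic on log2(FPKM + 1) values. This is a simplification chosen for illustration and is not Ballgown's actual model, so it will not reproduce the statistics reported here. The function name `pooled_t` and all FPKM values in it are hypothetical and are not taken from the attached spreadsheet.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Equal-variance two-sample t statistic on log2(x + 1) values.

    Assumption: a plain pooled t statistic stands in for the real
    test; it is only meant to show the effect of replicate spread.
    """
    a = [math.log2(x + 1) for x in group_a]
    b = [math.log2(x + 1) for x in group_b]
    na, nb = len(a), len(b)
    # Pool the two within-group sample variances.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical FPKM values: same comparison group, but the first
# group is either tightly clustered or widely spread.
tight = pooled_t([15, 16, 17, 16], [1, 2, 1, 2])
noisy = pooled_t([2, 16, 60, 8], [1, 2, 1, 2])
print(tight, noisy)
```

In this sketch the widely spread replicates give a much smaller test statistic than the tightly clustered ones, which means a larger p-value, and a larger p-value can only move the adjusted q-value up.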

I would appreciate any feedback you might have on this.

Ballgown_Results_Sample_GitHub.xlsx

FCCassidy commented 5 years ago

Hi Owen,

Did you find an answer to this post? I am currently deciding on my q-value cut-off and would value any insight based on your post above.

Your thoughts appreciated

OwenDonohoe commented 5 years ago

No answer. I actually changed to using DESeq2 instead. I found it to be very useful.

FCCassidy commented 5 years ago

I was actually worried you would say that. Analysis with Ballgown was included in the cost of our sequencing. I was hoping I would be able to use their outputs but I might have to go back and re-run analysis myself. Thanks for your reply.

alyssafrazee / ballgown

Unexpected high q-values #145