cole-trapnell-lab / cufflinks

Boost Software License 1.0
310 stars 116 forks source link

Cuffdiff bug: FPKM is way above range conf_lo to conf_hi (and is wrong) #96

Open sammyjava opened 6 years ago

sammyjava commented 6 years ago

I've run Cuffdiff v2.1.1 (4046M) on DNAnexus for an experiment a couple of times, keeping or removing other conditions, to see if this is reproducible and it is. Here's a relevant genes_fpkm_tracking line:

tracking_id class_code  nearest_ref_id  gene_id gene_short_name tss_id  locus   length  coverage
XLOC_073347  -  -   XLOC_073347 EPlOSAG00000014290,OS08G0184300 TSS78584,TSS78585   8:4947262-4952194   -   -   
OSH1-0_FPKM OSH1-0_conf_lo  OSH1-0_conf_hi  OSH1-0_status
150.955     107.801     194.131     OK
OSH1-3_FPKM OSH1-3_conf_lo  OSH1-3_conf_hi  OSH1-3_status
4587.81     99.1039     291.232     OK
OSH1-6_FPKM OSH1-6_conf_lo  OSH1-6_conf_hi  OSH1-6_status
105.106     79.2116     130.946     OK
OSH1-24_FPKM    OSH1-24_conf_lo OSH1-24_conf_hi OSH1-24_status
71.4966     53.5662     89.4439     OK

Note that OSH1-3 reports FPKM=4587.81 while conf=99.1039 to 291.232. Looking at the OSH1-3 BAMs (2 reps per condition) in a JBrowse (see below), there's no question that the conf range is correct while the FPKM value is errant (and results in highly significant DE for OSH1-3 vs OSH1-0,6,24). There is no DE at this locus.

So clearly this is a bug. Cuffdiff reports a very high FPKM value for this particular condition, and it's reproducible. Scanning over the data, it looks like it happened at about 30 genes out of 35584 (rice). Cufflinks (v2.2.1 linked against Boost version 104700) did NOT report these high FPKM values; only Cuffdiff did for the composites.

I'll note that another run with the same compile of Cuffdiff as Cufflinks mentioned above (same version, though) resulted in the conf values being high as well (fpkm=4588.62 conf_lo=4552.9 conf_hi=4624.35). But, again, there are no reads to back up this large FPKM value. You can see the reads on this locus here:

see it on a JBrowse

Any ideas where I should look to see what happened? This is rather concerting!

sammyjava commented 6 years ago

Follow-up: running the same Linux binary (cufflinks-2.2.1.Linux_x86_64) downloaded from here, on a totally different system (Carnegie Memex rather than DNAnexus/Amazon EC) this bug does NOT reproduce:

tracking_id class_code  nearest_ref_id  gene_id gene_short_name tss_id  locus   length  coverage
OS08G0184300    -       -       OS08G0184300    -       -       8:4947262-4952194       -       -
OSH1-0_FPKM OSH1-0_conf_lo  OSH1-0_conf_hi  OSH1-0_status
150.438     112.466     188.409     OK
OSH1-3_FPKM OSH1-3_conf_lo  OSH1-3_conf_hi  OSH1-3_status
128.515     95.8528     161.177     OK
OSH1-6_FPKM OSH1-6_conf_lo  OSH1-6_conf_hi  OSH1-6_status
103.632     77.2523     130.012     OK
OSH1-24_FPKM    OSH1-24_conf_lo OSH1-24_conf_hi OSH1-24_status
70.2417     51.8599     88.6235     OK

And, therefore, the corresponding DE call is negative:

test_id     gene_id     gene    locus           sample_1    sample_2    
OS08G0184300    OS08G0184300    -       8:4947262-4952194   OSH1-0      OSH1-3
status  value_1 value_2
OK  150.438 128.515
log2(fold_change)   test_stat   p_value q_value     significant
-0.227229       -0.879438   0.20745 0.717647    no

So it's rather mysterious, depending on the platform!