gagneurlab / OUTRIDER

OUTRIDER: OUTlier in RNA-seq fInDER is an R-based framework to find aberrantly expressed genes in RNA-seq data
MIT License

Validation data set not producing values concordant with manual #34

Open J-Lye opened 3 years ago

J-Lye commented 3 years ago

I've used OUTRIDER in my thesis, but I have been advised that the very slight differences between the values I obtain and the values shown in the OUTRIDER manual are a serious concern and may invalidate my results.

No matter how many times I repeat or modify my approach, the results are always the same. The difference is tiny (for example, 1.2x10^-12 for the first p-value). I get these slightly different results whether I use the simple example or download the Kremer dataset and run it with the full OUTRIDER code.

Results from the Manual

   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360 2.82E-11 1.57E-07   5.27  1.87       948     246.26
2: NBPF15  MUC1351 8.10E-10 4.51E-06   5.75  0.77      7591    7050.72
3:  MSTO1  MUC1367 4.46E-09 2.48E-05   -6.2 -0.81       761      729.7
4:  HDAC1  MUC1350 1.54E-08 8.56E-05  -5.93 -0.79      2215    2113.06
5:  DCAF6  MUC1374 6.93E-08 3.86E-04  -5.68 -0.61      2348    3084.41
6: NBPF16  MUC1351 2.61E-07 7.25E-04   4.82  0.67      4014     3834.4
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         84.16  16.66     TRUE                1              1         1
2:        4417.1  109.8     TRUE                2              1         1
3:       1238.19 151.57     TRUE                1              1         1
4:       3521.37 134.57     TRUE                1              1         1
5:          4603 197.14     TRUE                1              1         1
6:       2564.52 105.73     TRUE                2              1         2

Results from my Rscript

I am using the exact script from the manual. It would help to have confirmation that others, including the developers, also see this, and that it is due to some optimisation or something similar.

   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360 2.70E-11 1.50E-07   5.29  1.87       948     246.93
2: NBPF15  MUC1351 6.48E-10 3.60E-06   5.79  0.78      7591    7070.41
3:  MSTO1  MUC1367 4.76E-09 2.65E-05  -6.19 -0.81       761     729.59
4:  HDAC1  MUC1350 1.34E-08 7.44E-05  -5.95 -0.78      2215    2121.49
5:  DCAF6  MUC1374 6.26E-08 3.48E-04   -5.7 -0.61      2348    3084.29
6: NBPF16  MUC1351 2.19E-07 6.10E-04   4.85  0.68      4014    3844.74
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         86.15  16.61     TRUE                1              1         1
2:       4500.21 109.83     TRUE                2              1         1
3:       1216.01 150.84     TRUE                1              1         1
4:       3529.56 137.72     TRUE                1              1         1
5:       4600.94 198.54     TRUE                1              1         1
6:        2603.5 105.75     TRUE                2              1         2
c-mertes commented 1 year ago

Dear @J-Lye, thank you for reporting this difference. Under the hood, we use the CPU-optimized RcppArmadillo package (https://arma.sourceforge.net/, https://cran.r-project.org/web/packages/RcppArmadillo/index.html). Because it is compiled against the CPU functionality available locally, the OUTRIDER optimization can produce minor rounding differences across CPU architectures. If you run it twice locally on the same CPU, however, the results should replicate: the code is deterministic, but unfortunately not agnostic of the underlying hardware.
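
As a general illustration of how such rounding differences can arise (this is plain base R, not OUTRIDER's or RcppArmadillo's actual internals): floating-point addition is not associative, so the same numbers grouped differently can differ in the last bits. It is an assumption here that different CPU instruction sets lead to a different grouping of operations in the compiled code, and that an iterative optimization can then carry such small differences forward.

```r
# Floating-point addition is not associative: grouping changes the last bits.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

# Exact comparison fails even though both expressions are "the same" sum.
exactly_equal <- (a == b)

# Comparison with a numeric tolerance (base R default, about 1.5e-8) succeeds.
nearly_equal <- isTRUE(all.equal(a, b))

print(exactly_equal)
print(abs(a - b))
print(nearly_equal)
```

The practical consequence is that results from different machines are better compared with a tolerance than with exact equality.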

I hope this helps you understand the differences in your results.
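
A minimal sketch of how the two result tables posted above could be compared with a tolerance rather than digit by digit. It uses only base R and the pValue and zScore columns copied from this thread. The variable names and the choice of summary measures are illustrative assumptions, not part of OUTRIDER, and what counts as an acceptable difference is a judgment call for the analysis at hand.

```r
# Values copied from the two tables in this thread (top six outliers).
manual <- data.frame(
  geneID   = c("ATAD3C", "NBPF15", "MSTO1", "HDAC1", "DCAF6", "NBPF16"),
  sampleID = c("MUC1360", "MUC1351", "MUC1367", "MUC1350", "MUC1374", "MUC1351"),
  pValue   = c(2.82e-11, 8.10e-10, 4.46e-09, 1.54e-08, 6.93e-08, 2.61e-07),
  zScore   = c(5.27, 5.75, -6.20, -5.93, -5.68, 4.82),
  stringsAsFactors = FALSE
)
mine <- data.frame(
  geneID   = manual$geneID,
  sampleID = manual$sampleID,
  pValue   = c(2.70e-11, 6.48e-10, 4.76e-09, 1.34e-08, 6.26e-08, 2.19e-07),
  zScore   = c(5.29, 5.79, -6.19, -5.95, -5.70, 4.85),
  stringsAsFactors = FALSE
)

# Pair up rows by gene and sample so the comparison does not rely on row order.
merged <- merge(manual, mine, by = c("geneID", "sampleID"),
                suffixes = c(".manual", ".mine"))

# Relative difference in z-scores and difference in p-values on the log10 scale.
merged$zRelDiff   <- abs(merged$zScore.mine - merged$zScore.manual) /
  abs(merged$zScore.manual)
merged$log10pDiff <- abs(log10(merged$pValue.mine) - log10(merged$pValue.manual))

# Do both runs report the same gene/sample pairs, ranked in the same order?
same_pairs <- nrow(merged) == nrow(manual) && nrow(merged) == nrow(mine)
same_rank  <- identical(manual$geneID[order(manual$pValue)],
                        mine$geneID[order(mine$pValue)])

print(merged[, c("geneID", "sampleID", "zRelDiff", "log10pDiff")])
print(c(same_pairs = same_pairs, same_rank = same_rank))
```

For a thesis, reporting that the same gene and sample pairs are called in the same order, together with the size of the numeric differences, may be more informative than requiring identical digits.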