bvenn closed this issue 2 years ago.
The issue is more complex than I thought. The strategy works for monotonic p values, but if many identical p values exist, the sorting corrupts the q value smoothing: when many identical keys (p values) exist, it is not clear which index to choose.
Reproduce
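#r "nuget: Plotly.NET, 2.0.0-preview.16"
open Plotly.NET
// Original positions 0..9999; Array.Sort permutes them together with the keys.
let index = Array.init 10000 id
// 5000 increasing values, 2000 extra copies of 5000.0, then 3000 increasing values (10000 floats).
// The input is already in ascending order, so a stable sort would leave index unchanged.
let testValues =
    [|
        [|1. .. 5000.|]
        Array.init 2000 (fun _ -> 5000.)
        [|5001. .. 8000.|]
    |]
    |> Array.concat
testValues |> Array.indexed |> Chart.Point |> Chart.show
// Sorts testValues in place and applies the same permutation to index.
// Array.Sort is not stable, so the indices inside the tied block can come out scrambled.
System.Array.Sort(testValues, index)
index |> Array.indexed |> Chart.Point |> Chart.show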
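Edit: When Seq.sort or List.sort is used instead of Array.Sort, the problem seems to be solved. Seq.sort and List.sort are stable sorts, whereas System.Array.Sort is not, so only the former guarantee that identical p values keep their original relative order.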
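A minimal sketch of why the choice of sort matters for tied p values, assuming the goal is an "argsort" that maps sorted positions back to original indices. List.sortBy is documented as a stable sort, while System.Array.Sort and Array.sortBy are not, so only the former guarantees that equal p values keep their input order. The helper name stableArgsort is hypothetical and not part of FSharp.Stats.

```fsharp
// Hypothetical helper (not part of FSharp.Stats): returns the original
// indices of `values` in ascending order of value. List.sortBy is a stable
// sort, so indices of equal values stay in their original (ascending) order.
let stableArgsort (values: float[]) : int[] =
    values
    |> Array.indexed
    |> List.ofArray
    |> List.sortBy snd
    |> List.map fst
    |> Array.ofList

// Small example with a block of tied p values at indices 1, 2 and 3.
let pvals = [| 0.01; 0.05; 0.05; 0.05; 0.002 |]
let order = stableArgsort pvals
printfn "%A" order
```

Because the tied entries keep their relative order, the mapping from sorted positions back to the input is deterministic, which is exactly what the unstable Array.Sort call in the reproducer does not guarantee.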
The standard q value implementation is fixed. I decided to omit the bindBy function, since it reduces readability and causes problems when the p value collection is too large. The monotonization of the q values is now done within the respective function. The unit tests must be corrected, and Qvalues.ofPvaluesRobust still requires further checks of its validity and proper documentation.
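A minimal sketch of a standard q value computation with the monotonization folded into a single function, assuming pi0 (the proportion of true null hypotheses) has already been estimated and is passed in. It follows the common formulation q_(i) = min over j >= i of pi0 * m * p_(j) / j, capped at 1. The name qvaluesSketch is hypothetical; this is not the FSharp.Stats implementation.

```fsharp
// Hypothetical sketch, not the FSharp.Stats API.
// pi0: proportion of true null hypotheses, assumed to be estimated beforehand.
let qvaluesSketch (pi0: float) (pvalues: float[]) : float[] =
    let n = pvalues.Length
    let m = float n
    // Stable ascending sort of (originalIndex, p): tied p values keep their input order.
    let sorted =
        pvalues
        |> Array.indexed
        |> List.ofArray
        |> List.sortBy snd
        |> Array.ofList
    // FDR estimate at each sorted position; the rank is position + 1.
    let fdr = sorted |> Array.mapi (fun i (_, p) -> pi0 * m * p / float (i + 1))
    // Monotonization: running minimum from the largest p value downwards,
    // so the q value at rank i is the minimal FDR over all ranks >= i.
    let q = Array.copy fdr
    for i in n - 2 .. -1 .. 0 do
        q.[i] <- min q.[i] q.[i + 1]
    // Cap at 1 and write the q values back to the original positions.
    let result = Array.create n 0.0
    sorted |> Array.iteri (fun i (orig, _) -> result.[orig] <- min 1.0 q.[i])
    result
```

With tied p values, as in the reproducer, the tied entries receive different ranks and therefore different raw FDR estimates, but the running minimum still assigns every member of a tied block the same q value.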
The robust q value version has an additional term that corrects small p values, especially when the number of tests is low. It is described as function 9 in Storey, J.D. (2002), A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64: 479-498. https://doi.org/10.1111/1467-9868.00346
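A sketch of the robust variant, assuming it takes the form used by the R qvalue package with pfdr = TRUE, where each FDR estimate is additionally divided by 1 - (1 - p)^m. For a single test (m = 1) that term equals p, so a small p value no longer produces a small q value. The name qvaluesRobustSketch is hypothetical and this is not a description of Qvalues.ofPvaluesRobust.

```fsharp
// Hypothetical sketch of a robust q value (not the FSharp.Stats API).
// pi0: proportion of true null hypotheses, assumed to be estimated beforehand.
let qvaluesRobustSketch (pi0: float) (pvalues: float[]) : float[] =
    let n = pvalues.Length
    let m = float n
    // Stable ascending sort of (originalIndex, p).
    let sorted =
        pvalues |> Array.indexed |> List.ofArray |> List.sortBy snd |> Array.ofList
    let fdr =
        sorted
        |> Array.mapi (fun i (_, p) ->
            let rank = float (i + 1)
            // Correction term 1 - (1 - p)^m; it is small when p and m are small.
            let correction = 1.0 - (1.0 - p) ** m
            if correction > 0.0 then pi0 * m * p / (rank * correction)
            // Limit for p -> 0: p / (1 - (1 - p)^m) -> 1/m, giving pi0 / rank.
            else pi0 / rank)
    // Same monotonization as the standard version: running minimum from the top.
    let q = Array.copy fdr
    for i in n - 2 .. -1 .. 0 do
        q.[i] <- min q.[i] q.[i + 1]
    // Cap at 1 and write the q values back to the original positions.
    let result = Array.create n 0.0
    sorted |> Array.iteri (fun i (orig, _) -> result.[orig] <- min 1.0 q.[i])
    result
```

For one test with p = 0.01 and pi0 = 1, the standard estimate would be 0.01, while this sketch returns a value close to 1, which illustrates the effect of the correction term when the number of tests is low.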
Describe the bug
In FSharp.Stats.Testing.Multiple.Qvalues, local FDRs are calculated and afterwards smoothed so that the q value of p_i is the minimal FDR of all p values greater than or equal to p_i. While the local FDR calculation is correct, the smoothing does not take the minimal FDR of the p values greater than p_i but the maximal FDR of the p values lower than p_i, which makes the computation more conservative than necessary.
Solution
Modify the bindBy function accordingly.
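A small sketch contrasting the two smoothing rules on FDR values that are already ordered by ascending p value. The intended rule takes, at each position, the minimum FDR over that position and all larger p values; reading the bug report as a running maximum over that position and all smaller p values gives the second rule. Since the maximum over j <= i of FDR_j is at least FDR_i, which is at least the minimum over j >= i of FDR_j, the second rule can never produce a smaller q value. Both function names are hypothetical.

```fsharp
// Hypothetical helpers; fdr must be ordered by ascending p value.

// Intended rule: q_i = min over j >= i of fdr_j (running minimum from the right).
// Array.scanBack returns n + 1 states, the last one being the seed, so keep the first n.
let smoothMinFromAbove (fdr: float[]) : float[] =
    Array.scanBack min fdr infinity |> Array.take fdr.Length

// Rule described in the bug report: q_i = max over j <= i of fdr_j
// (running maximum from the left); drop the seed state at position 0.
let smoothMaxFromBelow (fdr: float[]) : float[] =
    Array.scan max neg_infinity fdr |> Array.skip 1

let fdr = [| 0.04; 0.06; 0.05; 0.5 |]
printfn "%A" (smoothMinFromAbove fdr)
printfn "%A" (smoothMaxFromBelow fdr)
```

On this input the intended rule lowers the second value from 0.06 to 0.05, whereas the running maximum raises the third value from 0.05 to 0.06.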