maximerischard opened this issue 4 years ago
@maximerischard @johnczito There is a PR here: https://github.com/JuliaMath/SpecialFunctions.jl/pull/242 Distributions.jl calls both StatsFuns.jl & SpecialFunctions.jl
Oh, what do you know. Thanks for the heads up. If that gets merged and released, we can close this and you can open the PR in Distributions.
Looks like JuliaMath/SpecialFunctions.jl#242 flamed out due to licensing issues: the code it was derived from was LGPL.
I can work on this next, basing it on "Fast and Accurate Calculation of Owen's T Function", which won't have a license issue (the Journal of Statistical Software publishes under a Creative Commons license).
I'm working on getting the MVN cdf (qsmivnv.jl) over the finish line, along with the Johnson distributions for Distributions.jl. Next I can turn my attention to Owen's T.
@blackeneth that would be awesome! Looking forward to it!
@azev77 and others ...
History
In the past 20 or so years, most implementations of Owen's T function have followed the algorithms given in "Fast and Accurate Calculation of Owen's T Function" by M. Patefield and D. Tandy, Journal of Statistical Software, 5(5), 1-25 (2000).
Six algorithms were given, and which one was used depended on the values of (h, a) in owent(h,a).
They developed the code for these algorithms on a DEC VAX 750. The VAX 750 came out in 1980 and had a processor clock speed of 3.125 MHz.
The reason for six algorithms was to speed up the function where possible, with T1 being faster than T2, T2 faster than T3, and so on.
Today
We have faster and better computers today.
If we skip T1 through T4, we can go straight to 48-point Gauss-Legendre quadrature for a ≤ 0.999999. For a > 0.999999, we can fall back to adaptive quadrature using QuadGK. This is sufficient for 15-digit accuracy in Float64. The Gauss-Legendre path takes about 3 microseconds; the QuadGK path takes about 20 microseconds.
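For illustration, the fixed 48-point rule doesn't have to be typed in by hand; its nodes and weights can be generated. This sketch assumes FastGaussQuadrature.jl (an extra dependency used here only to show where the constants come from; the proposed function hard-codes them so it carries no such dependency):

```julia
# Sketch only: generate the 48-point Gauss-Legendre rule instead of hard-coding it.
# FastGaussQuadrature.jl is an assumed dependency for this illustration.
using FastGaussQuadrature, LinearAlgebra

x, w = gausslegendre(48)   # nodes and weights for the interval [-1, 1]

# Owen's T: T(h,a) = (1/2π) ∫₀ᵃ exp(-h²(1+t²)/2)/(1+t²) dt.
# Map t = a*x; the integrand is even, so the [-1,1] rule covers
# [0,a] with a factor of a/2.
function owent_gl(h, a)
    f(t) = exp(-0.5 * h^2 * (1 + t^2)) / (1 + t^2)
    return (a / 2) * dot(w, f.(a .* x)) / (2π)
end

owent_gl(0.0, 1.0)   # exact value is atan(1)/(2π) = 0.125
```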
Questions: (1) Any objections to this? (2) Do we need BigFloat support?
Code
I call this function owenteasy(h,a):
# owenteasy
# Written by Andy Gough; August 2021
# Rev 1.02

using QuadGK              # adaptive quadrature fallback for a > 0.999999
using Distributions       # Normal() cdf used in the range reduction
using SpecialFunctions    # erfc
using IrrationalConstants # inv2π, invsqrt2
using LinearAlgebra       # dot
function owenteasy(h::T1, a::T1; calc_method=nothing, m=nothing) where {T1 <: Real}

    #*********************
    # shortcut evaluations
    #*********************

    if h < zero(h)
        return owenteasy(abs(h), a)
    end
    if h == zero(h)
        return atan(a)*inv2π            # T(0,a) = arctan(a)/(2π)
    end
    if a < zero(a)
        return -owenteasy(h, abs(a))
    end
    if a == zero(a)
        return zero(a)
    end
    if a == one(a)
        return 0.125*erfc(-h*invsqrt2)*erfc(h*invsqrt2)   # T(h,1) = Φ(h)(1-Φ(h))/2
    end
    if a == Inf
        return 0.25*erfc(abs(h)*invsqrt2)                 # T(h,∞) = (1-Φ(|h|))/2
    end

    # the identity below reduces the range from -Inf < h,a < +Inf to h ≥ 0, 0 ≤ a ≤ 1
    unitnorm = Normal()
    if a > one(a)
        return 0.5*cdf(unitnorm,h) + 0.5*cdf(unitnorm,a*h) - cdf(unitnorm,h)*cdf(unitnorm,a*h) - owenteasy(a*h, one(a)/a)
    end

    # calculate Owen's T with 48-point Gauss-Legendre quadrature
    if a ≤ 0.999999
        t2(h,a,x) = inv2π*(a/2)*exp(-0.5*(h^2)*(1+(a*x)^2))/(1+(a*x)^2)
x = [-0.9987710072524261, -0.9935301722663508, -0.9841245837228269, -0.9705915925462473, -0.9529877031604309, -0.9313866907065543, -0.9058791367155696
, -0.8765720202742479, -0.8435882616243935, -0.8070662040294426, -0.7671590325157404, -0.7240341309238146, -0.6778723796326639, -0.6288673967765136
, -0.5772247260839727, -0.5231609747222331, -0.4669029047509584, -0.4086864819907167, -0.34875588629216075, -0.28736248735545555, -0.22476379039468905
, -0.1612223560688917, -0.0970046992094627, -0.03238017096286937, 0.03238017096286937, 0.0970046992094627, 0.1612223560688917, 0.22476379039468905
, 0.28736248735545555, 0.34875588629216075, 0.4086864819907167, 0.4669029047509584, 0.5231609747222331, 0.5772247260839727, 0.6288673967765136
, 0.6778723796326639, 0.7240341309238146, 0.7671590325157404, 0.8070662040294426, 0.8435882616243935, 0.8765720202742479, 0.9058791367155696
, 0.9313866907065543, 0.9529877031604309, 0.9705915925462473, 0.9841245837228269, 0.9935301722663508, 0.9987710072524261]
w = [0.0031533460523059122, 0.0073275539012762885, 0.011477234579234613, 0.015579315722943824, 0.01961616045735561, 0.023570760839324363
, 0.027426509708356944, 0.031167227832798003, 0.03477722256477052, 0.038241351065830737, 0.04154508294346467, 0.0446745608566943, 0.04761665849249045
, 0.05035903555385445, 0.05289018948519363, 0.055199503699984116, 0.05727729210040322, 0.05911483969839564, 0.06070443916589387, 0.06203942315989268
, 0.06311419228625402, 0.06392423858464813, 0.06446616443594998, 0.06473769681268386, 0.06473769681268386, 0.06446616443594998, 0.06392423858464813
, 0.06311419228625402, 0.06203942315989268, 0.06070443916589387, 0.05911483969839564, 0.05727729210040322, 0.055199503699984116
, 0.05289018948519363, 0.05035903555385445, 0.04761665849249045, 0.0446745608566943, 0.04154508294346467, 0.038241351065830737, 0.03477722256477052
, 0.031167227832798003, 0.027426509708356944, 0.023570760839324363, 0.01961616045735561, 0.015579315722943824, 0.011477234579234613, 0.0073275539012762885
, 0.0031533460523059122]
        TOwen = dot(w, t2.(h,a,x))
        return TOwen
    else
        # a > 0.999999: fall back to adaptive quadrature with QuadGK
        tt(h,x) = inv2π*exp(-0.5*(h^2)*(1.0+x^2))/(1.0+x^2)
        TOwen, err = quadgk(x -> tt(h,x), 0.0, a, atol=eps())
        return TOwen
    end
end
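The a > 1 range-reduction identity used above can be sanity-checked against a direct evaluation of the defining integral. A sketch (assuming QuadGK.jl and SpecialFunctions.jl; owent_quad is an illustrative helper, not part of the function):

```julia
# Check the identity T(h,a) = Φ(h)/2 + Φ(ah)/2 - Φ(h)Φ(ah) - T(ah, 1/a)
# against brute-force quadrature of the defining integral.
using QuadGK, SpecialFunctions

Φ(x) = 0.5 * erfc(-x / sqrt(2))   # standard normal CDF
owent_quad(h, a) = quadgk(t -> exp(-0.5*h^2*(1 + t^2)) / (1 + t^2), 0.0, a)[1] / (2π)

h, a = 0.7, 2.5
lhs = owent_quad(h, a)
rhs = Φ(h)/2 + Φ(a*h)/2 - Φ(h)*Φ(a*h) - owent_quad(a*h, 1/a)
lhs ≈ rhs   # the two sides agree to machine precision
```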
You can test it with this code:
# owenteasy test values
using Random
using Test
hvec = [0.0625, 6.5, 7.0, 4.78125, 2.0, 1.0, 0.0625, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.125, 0.125, 0.125, 0.125, 0.0078125
, 0.0078125, 0.0078125, 0.0078125, 0.0078125, 0.0078125, 0.0625, 0.5, 0.9]
avec = [0.25, 0.4375, 0.96875, 0.0625, 0.5, 0.9999975, 0.999999125, 0.5, 1, 2, 3, 0.5, 1, 2, 3, 0.5, 1, 2, 3, 0.5, 1, 2, 3, 0.5, 1, 2, 3, 10, 100
, 0.999999999999999, 0.999999999999999, 0.999999999999999]
cvec = [big"0.0389119302347013668966224771378", big"2.00057730485083154100907167685e-11", big"6.399062719389853083219914429e-13"
, big"1.06329748046874638058307112826e-7", big"0.00862507798552150713113488319155", big"0.0667418089782285927715589822405"
, big"0.1246894855262192"
, big"0.04306469112078537", big"0.06674188216570097", big"0.0784681869930841", big"0.0792995047488726", big"0.06448860284750375", big"0.1066710629614485"
, big"0.1415806036539784", big"0.1510840430760184", big"0.07134663382271778", big"0.1201285306350883", big"0.1666128410939293", big"0.1847501847929859"
, big"0.07317273327500386", big"0.1237630544953746", big"0.1737438887583106", big"0.1951190307092811", big"0.07378938035365545"
, big"0.1249951430754052", big"0.1761984774738108", big"0.1987772386442824", big"0.2340886964802671", big"0.2479460829231492", big"0.1246895548850744"
, big"0.1066710629614484", big"0.0750909978020473015080760056431386286348318447478899039422181015"]
# ANSI escape sequences for aligning the output columns
mvbck = "\033[1A\033[58G"
mva = "\033[72G"
mve = "\033[90G"
warmup = owenteasy(0.0625,0.025)
for i in 1:size(hvec,1)
    if i == 1
        println("\t\tExecution Time","\033[58G","h",mva,"a",mve,"error\t\t","log10 error")
    end
    h = hvec[i]
    a = avec[i]
    c = cvec[i]
    print(i,"\t")
    @time tx = owenteasy(h,a)
    err = Float64(round(tx - c, sigdigits=4))
    logerr = Float64(round(log10(abs(tx - c)), sigdigits=3))
    println(mvbck,h,mva,a,mve,err,"\t",logerr)
end
Here are the results I get from the above testing:
Execution Time h a error log10 error
1 0.000013 seconds (4 allocations: 1.469 KiB) 0.0625 0.25 -9.243e-20 -19.0
2 0.000007 seconds (4 allocations: 1.469 KiB) 6.5 0.4375 -1.521e-27 -26.8
3 0.000003 seconds (4 allocations: 1.469 KiB) 7.0 0.96875 1.365e-28 -27.9
4 0.000003 seconds (4 allocations: 1.469 KiB) 4.78125 0.0625 -3.432e-23 -22.5
5 0.000003 seconds (4 allocations: 1.469 KiB) 2.0 0.5 -6.239e-19 -18.2
6 0.000003 seconds (4 allocations: 1.469 KiB) 1.0 0.9999975 -2.549e-17 -16.6
7 0.000040 seconds (78 allocations: 1.688 KiB) 0.0625 0.999999125 -4.573e-17 -16.3
8 0.000003 seconds (4 allocations: 1.469 KiB) 1.0 0.5 -1.39e-17 -16.9
9 0.000001 seconds (1 allocation: 16 bytes) 1.0 1.0 -1.467e-17 -16.8
10 0.000005 seconds (6 allocations: 1.500 KiB) 1.0 2.0 -1.749e-17 -16.8
11 0.000004 seconds (6 allocations: 1.500 KiB) 1.0 3.0 -4.593e-17 -16.3
12 0.000003 seconds (4 allocations: 1.469 KiB) 0.5 0.5 -1.017e-17 -17.0
13 0.000001 seconds (1 allocation: 16 bytes) 0.5 1.0 1.507e-17 -16.8
14 0.000005 seconds (6 allocations: 1.500 KiB) 0.5 2.0 -1.125e-16 -15.9
15 0.000003 seconds (6 allocations: 1.500 KiB) 0.5 3.0 -4.053e-17 -16.4
16 0.000003 seconds (4 allocations: 1.469 KiB) 0.25 0.5 -4.569e-18 -17.3
17 0.000001 seconds (1 allocation: 16 bytes) 0.25 1.0 -1.35e-18 -17.9
18 0.000003 seconds (6 allocations: 1.500 KiB) 0.25 2.0 1.313e-16 -15.9
19 0.000003 seconds (6 allocations: 1.500 KiB) 0.25 3.0 -4.461e-17 -16.4
20 0.000003 seconds (4 allocations: 1.469 KiB) 0.125 0.5 -1.717e-17 -16.8
21 0.000001 seconds (1 allocation: 16 bytes) 0.125 1.0 -2.796e-17 -16.6
22 0.000005 seconds (6 allocations: 1.500 KiB) 0.125 2.0 -6.153e-17 -16.2
23 0.000004 seconds (6 allocations: 1.500 KiB) 0.125 3.0 4.929e-18 -17.3
24 0.000003 seconds (4 allocations: 1.469 KiB) 0.0078125 0.5 3.781e-18 -17.4
25 0.000001 seconds (1 allocation: 16 bytes) 0.0078125 1.0 -1.026e-18 -18.0
26 0.000003 seconds (6 allocations: 1.500 KiB) 0.0078125 2.0 1.615e-17 -16.8
27 0.000003 seconds (6 allocations: 1.500 KiB) 0.0078125 3.0 2.795e-17 -16.6
28 0.000003 seconds (6 allocations: 1.500 KiB) 0.0078125 10.0 7.214e-17 -16.1
29 0.000003 seconds (6 allocations: 1.500 KiB) 0.0078125 100.0 -4.746e-17 -16.3
30 0.000027 seconds (78 allocations: 1.688 KiB) 0.0625 0.999999999999999 -4.486e-17 -16.3
31 0.000014 seconds (78 allocations: 1.688 KiB) 0.5 0.999999999999999 5.956e-17 -16.2
32 0.000013 seconds (78 allocations: 1.688 KiB) 0.9 0.999999999999999 -5.269e-18 -17.3
@johnczito @andreasnoack @mschauer @stevengj This would be useful for Distributions.jl, including for the skew normal/t distributions that I PRed…
Whether using generic quadrature schemes is "sufficient" depends on the use case. It is typically orders of magnitude slower than specialized expansions like continued fractions, so you will probably be losing compared to other languages.
That being said, I agree that the optimal choice of algorithm these days is probably different than that old paper.
Does it make sense to discuss this in the larger context of the package? Are our other implementations typically "as fast as reasonably can be" or is there a continuous work on adding correct but perhaps not optimal implementations and later performance improvements?
@mschauer I agree. It looks like @blackeneth's implementation is already better than the current stuff that is used. Users can refer back here to Steven's comments & think about further optimizations in the future.
True, but it's also nice to avoid "embarrassingly slow compared to R or Python", and it would be good to check whether the quadrature approach falls into this category.
After some micro-optimizations, the speed for calculating Owen's T when a ≤ 0.999999 is about 1 μs.
When 0.999999 < a < 1.0, QuadGK is called and the execution time is about 7.5 μs.
Worst-case error is about 1e-16.
With some additional type stability work and test development, we should be good to go.
Note the function will have dependencies on:
As discussed in https://github.com/JuliaStats/StatsFuns.jl/pull/114, we should not add a circular dependency on Distributions to StatsFuns. It seems it is not actually needed and you could just call normcdf etc.
I am also worried that a dependency on QuadGK will make the package slower to load. In the past this has been a reason to not merge PRs (even though I think https://github.com/JuliaStats/StatsFuns.jl/pull/106 should be fine 😛 - it only adds a lightweight dependency and IMO the timings are increased only to a small extent).
@blackeneth can you compare your function's performance with implementations in other languages?
I can replace the Distributions.jl call to Normal() with normcdf(x) = 0.5*erfc(-x*invsqrt2)
As for R & Python, I can't say I'm an expert at benchmarking those, but my initial checks have R at 4.74 μs and Python at 2.5 μs.
I will look at using "T6" for 0.999999 < a < 1.0, which could eliminate the QuadGK dependency.
I can replace the Distributions.jl call to Normal() with
normcdf(x) = 0.5*erfc(-x*invsqrt2)
That's precisely what I was referring to, since normcdf is already defined in StatsFuns (https://github.com/JuliaStats/StatsFuns.jl/blob/36e48bbbac12355906438ce1993736757a0c6568/src/distrs/norm.jl#L45).
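As a quick check of the erfc-based definition (a sketch assuming SpecialFunctions.jl), it has the symmetry expected of the standard normal CDF:

```julia
# normcdf via erfc: Φ(x) = erfc(-x/√2)/2, avoiding a Distributions dependency.
using SpecialFunctions

normcdf(x) = 0.5 * erfc(-x / sqrt(2))

normcdf(0.0)                   # 0.5 by symmetry
normcdf(1.0) + normcdf(-1.0)   # Φ(x) + Φ(-x) = 1
```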
Using method "T6" for a > 0.999999 has similar accuracy to using QuadGK:
# time h a error log10(abs(error))
30 0.000001 seconds (1 allocation: 16 bytes) 0.0625 0.999999999999999 1.362e-18 -17.9
31 0.000001 seconds (1 allocation: 16 bytes) 0.5 0.999999999999999 5.242e-18 -17.3
32 0.000001 seconds (1 allocation: 16 bytes) 0.9 0.999999999999999 8.609e-18 -17.1
33 0.000002 seconds (1 allocation: 16 bytes) 2.5 0.999999999999999 1.22e-17 -16.9
34 0.000002 seconds (1 allocation: 16 bytes) 7.33 0.999999999999999 2.688e-17 -16.6
35 0.000001 seconds (1 allocation: 16 bytes) 0.6 0.999999999999999 -1.346e-17 -16.9
36 0.000002 seconds (1 allocation: 16 bytes) 1.6 0.999999999999999 -5.124e-17 -16.3
37 0.000002 seconds (1 allocation: 16 bytes) 2.33 0.999999999999999 1.616e-18 -17.8
38 0.000007 seconds (4 allocations: 1.469 KiB) 2.33 0.99999 -2.807e-18 -17.6
The performance is quite good for 0.999999 < a < 1.0.
Some additional checks for type stability, then a few more tests, and it should be good for a PR.
The function has been ready for some time. The barrier is git: my VS Code is somehow set up wrong, so it wants to commit the wrong files. I rarely use git, so I forget how to use it after every use. But now I'm moving, so I won't have time to figure it out and release the code for quite a while.
Attached is the code for the function, owent(h,a). Also attached is the code for the tests.
If someone wants to take them and release them, please do so.
Owen's T function currently does not seem to have a native Julia implementation. Most existing implementations in Fortran or C are under the GPL, which makes this a little tricky.
This would be useful for the Skew Normal distribution in Distributions.jl.
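To make the connection concrete: for the standard skew normal (location 0, scale 1, shape α), the CDF is F(x; α) = Φ(x) - 2T(x, α). A sketch using plain quadrature for T (assuming QuadGK.jl and SpecialFunctions.jl; the names below are illustrative):

```julia
# Skew normal CDF from Owen's T: F(x; α) = Φ(x) - 2*T(x, α).
using QuadGK, SpecialFunctions

Φ(x) = 0.5 * erfc(-x / sqrt(2))
owent(h, a) = quadgk(t -> exp(-0.5*h^2*(1 + t^2)) / (1 + t^2), 0.0, a)[1] / (2π)

skewnormal_cdf(x, α) = Φ(x) - 2 * owent(x, α)

skewnormal_cdf(0.0, 0.0)   # α = 0 recovers the standard normal: 0.5
```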