marius311 opened 2 years ago
I'm not sure if this is an issue. I'd think that the reproducibility is only for a fixed element type.
It makes ForwardDiff the odd one out in not being able to consistently differentiate w.r.t. parameters of some statistical models (e.g. in Turing, where I hit this), and I do think it would be good for this package to be able to:
julia> using Random, AbstractDifferentiation, Zygote, ForwardDiff, FiniteDifferences, Distributions, DistributionsAD, LinearAlgebra
julia> test(x) = sum(rand(Xoshiro(1), MvNormal(zeros(8), x*I)))
test (generic function with 1 method)
julia> test(1.)
0.34172342435510183
julia> AD.value_and_derivative(AD.ZygoteBackend(), test, 1.)
(0.34172342435510183, (0.17086171217755092,))
julia> AD.value_and_derivative(AD.FiniteDifferencesBackend(), test, 1.)
(0.34172342435510183, (0.17086171217746912,))
julia> AD.value_and_derivative(AD.ForwardDiffBackend(), test, 1.)
(4.4673840155712305, (2.2336920077856153,))
Do you really want to differentiate rand? I guess you are aware, but generally I don't think sampling algorithms give any guarantees about differentiability unless you explicitly sample via the quantile function.
In any case, I don't think there is much that can be done in this package. Essentially you are arguing that the optimization that you link to shouldn't be there so I think the issue should be filed against the main repo if you think it needs fixing.
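For concreteness, the quantile-function route mentioned above can be sketched like this (a minimal sketch, not from the thread; `qsample` is a hypothetical name, and the location-scale form `σ * quantile(Normal(), u)` is used so only plain Float64s flow through `quantile`):

```julia
using Random, Distributions, ForwardDiff

# Hypothetical reparameterized sampler: draw the uniform variate first,
# then map it through the quantile function. The randomness is
# independent of σ, so the result is a deterministic smooth function of σ.
function qsample(rng, σ)
    u = rand(rng)                 # σ-independent randomness
    σ * quantile(Normal(), u)     # location-scale reparameterization
end

test(σ) = qsample(Xoshiro(1), σ)

# ForwardDiff now gives the "same-realization" derivative:
# d/dσ (σ * z) = z, where z = test(1.0).
ForwardDiff.derivative(test, 1.0) ≈ test(1.0)  # true
```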
Maybe consider it this way: my test(x) function above is a deterministic smooth function of x which other AD libraries differentiate just fine, except ForwardDiff.
This type of differentiation through a function which (internally) involves a rand arises at least in an inference algorithm I'm working on (MUSE), and probably other places. A paper is in prep for the thing that uses this in MUSE, but you can see typical code e.g. here (that's the Jax implementation btw, which also handles such a derivative involving a rand just fine, fwiw).
I also thought about whether this is rather a "bug" in Random, but I think Random should be allowed to do whatever it wants; it's up to AD packages to differentiate its code correctly. The following code would fix ForwardDiff:
function Random.randn!(rng::AbstractRNG, A::Array{<:ForwardDiff.Dual})
    A .= randn!(rng, ForwardDiff.value.(A))
end
but it allocates a new array, so maybe it's not ideal. I've been trying to think of a way to do it without the allocation.
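For what it's worth, the reason this works is that `ForwardDiff.value.(A)` is a plain `Array{Float64}`, so `randn!` takes the same code path as in the undifferentiated call, and the broadcast assignment converts each Float64 back into a zero-partials Dual. A quick check of the idea (a sketch; `randn_value!` is a hypothetical non-pirating name rather than an overload of `Random.randn!`):

```julia
using Random, ForwardDiff

# Non-pirating version of the proposed fix, for testing the idea.
function randn_value!(rng::AbstractRNG, A::Array{<:ForwardDiff.Dual})
    # value.(A) is a fresh Array{Float64}: the RNG sees exactly the same
    # array type as in the undifferentiated call, then the broadcast
    # converts each Float64 into a Dual with zero partials.
    A .= randn!(rng, ForwardDiff.value.(A))
end

A = zeros(ForwardDiff.Dual{Nothing,Float64,1}, 16)
randn_value!(MersenneTwister(1), A)
ForwardDiff.value.(A) == randn(MersenneTwister(1), 16)  # true
```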
my test(x) function above is a deterministic smooth function of x
It happens to be the case, and it would be a bit odd if this wasn't the case for a scale(-like) parameter, but it's not currently a requirement for rand. You could probably construct a Gaussian sampler that would be nowhere differentiable in the standard deviation and it would still be a perfectly fine randn. Maybe it's something to discuss in a Distributions issue. I'd guess that your theoretical framework here assumes that the variates are differentiable in the parameters.
In practice, most samplers are probably differentiable except at a few points, but e.g.
julia> αs = range(0.462, stop=0.464, length=10001);
julia> xs = [rand(Xoshiro(4), Distributions.GammaGSSampler(Gamma(α))) for α ∈ αs];
julia> lines(αs, xs)
illustrates the issue.
A non-allocating version of your randn! could maybe be something like
function Random.randn!(rng::Random.AbstractRNG, a::Array{<:ForwardDiff.Dual{<:Any,Float64,N}}) where N
    N1 = N + 1
    pa = convert(Ptr{Float64}, pointer(a))
    GC.@preserve a begin
        randn!(rng, unsafe_wrap(Array, pa, length(a)))
        af64 = unsafe_wrap(Array, pa, length(a)*N1)
        for i in length(a):-1:1
            af64[(i - 1)*N1 + 1] = af64[i]
            for j in ((i - 1)*N1 + 2):(i*N1)
                af64[j] = 0
            end
        end
    end
    return a
end
but somebody should probably try to write the loop more efficiently.
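One way to tighten that inner loop might be the following (a sketch under the same assumptions; `randn_dual!` is a hypothetical non-pirating name, and it relies on `setindex!` converting a Float64 into a zero-partials Dual, which writes all N+1 slots of the element at once):

```julia
using Random, ForwardDiff

# Same pointer trick as above, but the spreading loop assigns a Float64
# into the Dual array and lets convert() zero the partials.
function randn_dual!(rng::Random.AbstractRNG,
                     a::Array{ForwardDiff.Dual{T,Float64,N}}) where {T,N}
    pa = convert(Ptr{Float64}, pointer(a))
    GC.@preserve a begin
        buf = unsafe_wrap(Array, pa, length(a))  # genuine Array{Float64},
        randn!(rng, buf)                         # so the RNG takes its
                                                 # usual array code path
        for i in length(a):-1:1
            # Dual slot i starts at float index (i-1)*(N+1)+1 >= i, so
            # going backward buf[i] is always read before it can be
            # overwritten.
            z = buf[i]
            a[i] = z  # convert: value = z, partials = 0
        end
    end
    return a
end
```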
A pathological example, for the fun of it:
myrand(rng::Xoshiro, x::Normal) =
    isodd(reinterpret(UInt64, x.σ)) ? rand(rng, x) : (rand(rng); rand(rng, x))
would still satisfy all our current requirements for a rand, but
σs = range(1.0, stop=2, length=1000)
xs = [myrand(Xoshiro(1), Normal(0.0, σ)) for σ in σs]
lines(σs, xs)
which doesn't affect the "pointwise" validity of it:
julia> using RNGTest
julia> r = Xoshiro(1)
Xoshiro(0xfff0241072ddab67, 0xc53bc12f4c3f0b4e, 0x56d451780b2dd4ba, 0x50a4aa153d208dd8)
julia> RNGTest.smallcrushTestU01(() -> cdf(Normal(), myrand(r, Normal())))
========= Summary results of SmallCrush =========
Version: TestU01 1.2.3
Generator:
Number of statistics: 15
Total CPU time: 00:00:11.25
All tests were passed
Many thanks, those are super helpful examples. I see what you mean now, I wasn't familiar with these types of features in random samplers, but now it makes sense.
As an aside, in my particular application this is fine because I'm computing averaged quantities, more equivalent to:
xs = [mean(rand(Xoshiro(4), Distributions.GammaGSSampler(Gamma(α)), 100)) for α ∈ αs];
Ofc at that point the issue I brought up here itself doesn't really matter, which I think is kind of your point.
But I still think it would be nice, and might save some headaches, if ForwardDiff tried to give the "same-realization" derivative (like the other AD libs do). At the very least, the value that comes out of value_and_derivative really seems to me like it should be the same as if I ran the function without ForwardDiff, and that's currently not the case here. All this with the understanding that the function may well only be piecewise continuous (or technically completely pathological, as in your second example). Anyway, if you agree with that, I can see if I can improve your loops at all and submit a PR.
Yeah, I can see that, so I'd be fine with adding such a method as long as it has a comment with a reference to the method in Random that necessitates the definition here.
A MWE:
MersenneTwister breaks at N=13; StableRNGs always seems fine. This is because of some result-changing optimizations done on Float64 arrays here, which don't happen for Dual arrays.
I'm trying to think of what the best solution is, but in the meantime I wanted to file this and maybe get some ideas as well.
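A sketch of what such an MWE might look like (reconstructed here, not the original; it compares the randn! stream for a Float64 array against a Dual array directly, and takes the N=13 threshold from the observation above):

```julia
using Random, ForwardDiff

# Compare the randn! stream for a Float64 array vs a Dual array of the
# same length: the value parts should match if the Dual path reproduces
# the Float64 path.
function same_stream(rng_ctor, N)
    f = randn!(rng_ctor(), zeros(N))
    d = randn!(rng_ctor(), zeros(ForwardDiff.Dual{Nothing,Float64,1}, N))
    f == ForwardDiff.value.(d)
end

same_stream(() -> MersenneTwister(1), 12)  # true: both use the scalar loop
same_stream(() -> MersenneTwister(1), 13)  # false: Float64 takes the
                                           # optimized array path
```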