JuliaDiff / ForwardDiff.jl

Forward Mode Automatic Differentiation for Julia

Inconsistent random numbers with some RNGs #593

Open marius311 opened 2 years ago

marius311 commented 2 years ago

A MWE:

julia> using Random, ForwardDiff

julia> N = 8;

julia> x = Vector{ForwardDiff.Dual{Nothing,Float64,1}}(undef, N);

julia> randn!(Xoshiro(1), x);

julia> x[end].value
-0.8260207919192974

julia> x = Vector{Float64}(undef, N);

julia> randn!(Xoshiro(1), x);

julia> x[end]
0.8653808054093252

MersenneTwister breaks at N=13. StableRNGs always seems fine. This is because of some result-changing optimizations done on Float64 arrays here, which don't happen for Dual arrays.
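
For illustration, a sketch of my own (not from the original report): with the same seed, the bulk Float64 array path and element-by-element generation can consume the RNG stream differently, and a Dual array ends up on the latter path:

using Random

x64 = Vector{Float64}(undef, 8)
randn!(Xoshiro(1), x64)              # bulk path, specialized for Float64 arrays

rng = Xoshiro(1)
xel = [randn(rng) for _ in 1:8]      # element-by-element, roughly what a Dual array falls back to

x64 == xel                           # can be false for some RNGs and lengths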

I'm trying to think of what the best solution is, but in the meantime wanted to file this and maybe get some ideas as well.

andreasnoack commented 2 years ago

I'm not sure this is an issue. I'd think that reproducibility is only guaranteed for a fixed element type.

marius311 commented 2 years ago

It makes ForwardDiff the odd one out among AD packages in not being able to consistently differentiate w.r.t. parameters of some statistical models (e.g. in Turing, where I hit this), and I do think it would be good for this package to be able to:

julia> using Random, AbstractDifferentiation, Zygote, ForwardDiff, FiniteDifferences, Distributions, DistributionsAD, LinearAlgebra

julia> test(x) = sum(rand(Xoshiro(1), MvNormal(zeros(8), x*I)))
test (generic function with 1 method)

julia> test(1.)
0.34172342435510183

julia> AD.value_and_derivative(AD.ZygoteBackend(),            test, 1.)
(0.34172342435510183, (0.17086171217755092,))

julia> AD.value_and_derivative(AD.FiniteDifferencesBackend(), test, 1.)
(0.34172342435510183, (0.17086171217746912,))

julia> AD.value_and_derivative(AD.ForwardDiffBackend(),       test, 1.)
(4.4673840155712305, (2.2336920077856153,))
andreasnoack commented 2 years ago

Do you really want to differentiate rand? I guess you are aware, but generally I don't think sampling algorithms give any guarantees about differentiability unless you explicitly sample via the quantile function.
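
For instance, a sketch of my own (not part of the thread): sampling through the quantile (inverse-CDF) transform keeps the draw a smooth function of the parameter, because the uniform variate doesn't depend on it:

using Random, Distributions, ForwardDiff

function quantile_sample(x)
    u = rand(Xoshiro(1))                 # uniform variate, independent of x
    quantile(Normal(0.0, sqrt(x)), u)    # = sqrt(x) * quantile(Normal(), u), smooth in x
end

ForwardDiff.derivative(quantile_sample, 1.0)   # well-defined; equals quantile(Normal(), u) / 2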

In any case, I don't think there is much that can be done in this package. Essentially you are arguing that the optimization that you link to shouldn't be there so I think the issue should be filed against the main repo if you think it needs fixing.

marius311 commented 2 years ago

Maybe consider it this way: my test(x) function above is a deterministic smooth function of x which the other AD libraries differentiate just fine, except ForwardDiff.

This type of differentiation through a function which (internally) involves a rand arises at least in an inference algorithm I'm working on (MUSE), and probably in other places too. A paper is in prep for the thing that uses this in MUSE, but you can see typical code e.g. here (that's the Jax implementation, btw, which also handles such a derivative involving a rand just fine, fwiw).

I also thought about whether this is rather a "bug" in Random, but I think Random should be allowed to do whatever it wants; it's up to AD packages to differentiate its code correctly. The following code would fix ForwardDiff:

# Forward to the Float64 method so the same RNG stream is consumed, then copy back
function Random.randn!(rng::AbstractRNG, A::Array{<:ForwardDiff.Dual})
    A .= randn!(rng, ForwardDiff.value.(A))
end

but it allocates a new array, so maybe it's not ideal. I've been trying to think of a way to do it without the allocation.

andreasnoack commented 2 years ago

my test(x) function above is a deterministic smooth function of x

It happens to be the case, and it would be a bit odd if it weren't for a scale(-like) parameter, but it's not currently a requirement for rand. You could probably construct a Gaussian sampler that is nowhere differentiable in the standard deviation and it would still be a perfectly fine randn. Maybe it's something to discuss in a Distributions issue. I'd guess that your theoretical framework here assumes that the variates are differentiable in the parameters.

In practice, most samplers are probably differentiable except at a few points, but e.g.

julia> αs = range(0.462, stop=0.464, length=10001);

julia> xs = [rand(Xoshiro(4), Distributions.GammaGSSampler(Gamma(α))) for α ∈ αs];

julia> lines(αs, xs)

illustrates the issue

[Screenshot 2022-08-18: plot of xs against αs, showing discontinuities in the sampled values]

A non-allocating version of your randn! could maybe be something like

function Random.randn!(rng::Random.AbstractRNG, a::Array{<:ForwardDiff.Dual{<:Any,Float64,N}}) where N
    N1 = N + 1  # Float64 slots per Dual: one value plus N partials
    pa = convert(Ptr{Float64}, pointer(a))
    GC.@preserve a begin
        # fill the first length(a) Float64 slots using the fast Float64 method
        randn!(rng, unsafe_wrap(Array, pa, length(a)))
        # view the whole Dual array as a flat Float64 array
        af64 = unsafe_wrap(Array, pa, length(a)*N1)
        # spread each generated value into the value slot of its Dual and zero the
        # partials, working backwards so nothing is overwritten before it is read
        for i in length(a):-1:1
            af64[(i - 1)*N1 + 1] = af64[i]
            for j in ((i - 1)*N1 + 2):(i*N1)
                af64[j] = 0
            end
        end
    end
    return a
end

but somebody should probably try to write the loop more efficiently.
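
A quick check of that idea, a sketch of my own (not from the thread), assuming the method above has been defined: with it in place, the values in a Dual array should reproduce the plain Float64 stream for the same seed:

using Random, ForwardDiff

xd = Vector{ForwardDiff.Dual{Nothing,Float64,1}}(undef, 8)
xf = Vector{Float64}(undef, 8)
randn!(Xoshiro(1), xd)
randn!(Xoshiro(1), xf)
ForwardDiff.value.(xd) == xf   # expected to hold once the method above is defined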

andreasnoack commented 2 years ago

A pathological example for the fun of it

# consumes one extra random number whenever the bit pattern of σ is even, so the
# draw jumps as σ varies, even though each individual draw is still a valid sample
myrand(rng::Xoshiro, x::Normal) =
    isodd(reinterpret(UInt64, x.σ)) ? rand(rng, x) : (rand(rng); rand(rng, x))

would still satisfy all our current requirements for a rand but

σs = range(1.0, stop=2, length=1000)
xs = [myrand(Xoshiro(1), Normal(0.0, σ)) for σ in σs]
lines(σs, xs)
[Screenshot 2022-08-18: plot of xs against σs, showing jumps in the sampled values]

which doesn't affect the "pointwise" validity of it

julia> using RNGTest

julia> r = Xoshiro(1)
Xoshiro(0xfff0241072ddab67, 0xc53bc12f4c3f0b4e, 0x56d451780b2dd4ba, 0x50a4aa153d208dd8)

julia> RNGTest.smallcrushTestU01(() -> cdf(Normal(), myrand(r, Normal())))

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        
 Number of statistics:  15
 Total CPU time:   00:00:11.25

 All tests were passed
marius311 commented 2 years ago

Many thanks, those are super helpful examples. I see what you mean now; I wasn't familiar with these types of features in random samplers, but now it makes sense.

As an aside, in my particular application this is fine because I'm computing averaged quantities, more equivalent to:

xs = [mean(rand(Xoshiro(4), Distributions.GammaGSSampler(Gamma(α)), 100)) for α ∈ αs];

Of course, at that point the issue I brought up here doesn't itself really matter, which I think is kind of your point.

But I still think it would be nice, and maybe save some headaches, if ForwardDiff tried to give the "same-realization" derivative (like the other AD libs do). At the very least, the value that comes out of value_and_derivative really seems to me like it should be the same as if I ran the function without ForwardDiff, and that's currently not the case here. All this with the understanding that the function may well only be piecewise continuous (or technically completely pathological, as in your second example). Anyway, if you agree with that, I can see if I can improve your loops at all and submit a PR.

andreasnoack commented 2 years ago

Yeah, I can see that, so I'd be fine with adding such a method as long as it has a comment with a reference to the method in Random that necessitates the definition here.