torfjelde opened 1 year ago
I think the CUDA extension now works properly; all current CUDA tests pass. The following code runs properly:
```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux # for `f32` and `gpu`
using NormalizingFlows

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)
CUDA.functional()

# a small planar-layer flow, moved to the GPU
ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)

x = rand(rng, q0_g) # good
```
However, there are still issues to fix: drawing multiple samples at once, and sampling from `Bijectors.TransformedDistribution`. Minimal examples are as follows:

- drawing multiple samples at once:
```julia
xs = rand(rng, q0_g, 10) # ambiguous
```
error message:
```
ERROR: MethodError: rand(::CUDA.RNG, ::MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ::Int64) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate, Continuous}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:23
  rand(rng::Random.AbstractRNG, s::Sampleable{Multivariate}, n::Int64)
    @ Distributions ~/.julia/packages/Distributions/Ufrz2/src/multivariates.jl:21
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous}, n::Int64)
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:16

Possible fix, define
  rand(::CUDA.RNG, ::Sampleable{Multivariate, Continuous}, ::Int64)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:42
```
- sample from `Bijectors.TransformedDistribution`:
```julia
y = rand(rng, flow_g) # ambiguous
```
error message:
```
ERROR: MethodError: rand(::CUDA.RNG, ::MultivariateTransformed{MvNormal{Float32, PDMats.ScalMat{Float32}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, ComposedFunction{PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, PlanarLayer{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}) is ambiguous.

Candidates:
  rand(rng::Random.AbstractRNG, td::MultivariateTransformed)
    @ Bijectors ~/.julia/packages/Bijectors/cvMxj/src/transformed_distribution.jl:160
  rand(rng::CUDA.RNG, s::Sampleable{<:ArrayLikeVariate, Continuous})
    @ NormalizingFlowsCUDAExt ~/Research/Turing/NormalizingFlows.jl/ext/NormalizingFlowsCUDAExt.jl:7

Possible fix, define
  rand(::CUDA.RNG, ::MultivariateTransformed)

Stacktrace:
 [1] top-level scope
   @ ~/Research/Turing/NormalizingFlows.jl/example/test.jl:40
```
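Both errors are instances of the same multiple-dispatch problem: one candidate is more specific in the sampleable argument, the other in the RNG argument, and neither dominates. A self-contained sketch with stand-in types (the names `DeviceRNG`, `MvSampler`, and `draw` are hypothetical, not the real CUDA/Distributions/Bijectors types) shows the shape of the fix each error message suggests:

```julia
# Stand-in types: `DeviceRNG` plays the role of CUDA.RNG, `MvSampler`
# the role of a Distributions sampleable. Illustrative only.
abstract type AbstractSampler end
struct DeviceRNG end
struct MvSampler <: AbstractSampler end

# Like the Distributions method: specific in the sampler, generic in the RNG.
draw(rng, s::AbstractSampler, n::Int) = :distributions_method

# Like the extension method: specific in the RNG, generic in the sampler.
draw(rng::DeviceRNG, s, n::Int) = :extension_method

# At this point `draw(DeviceRNG(), MvSampler(), 10)` would raise an ambiguity
# MethodError: neither method is strictly more specific. The fix the error
# suggests is to define the intersection explicitly:
draw(rng::DeviceRNG, s::AbstractSampler, n::Int) = :disambiguated

draw(DeviceRNG(), MvSampler(), 10) # now dispatches to the intersection method
```

The suggested fixes `rand(::CUDA.RNG, ::Sampleable{Multivariate, Continuous}, ::Int64)` and `rand(::CUDA.RNG, ::MultivariateTransformed)` play exactly the role of the intersection method here.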
This is partly because we are overloading methods on types that are not owned by this package. Any thoughts on how to address this, @torfjelde @sunxd3?
I don't have an immediate solution other than the suggested fixes.
It is indeed a bit annoying; maybe we don't dispatch on `rng`?
> It is indeed a bit annoying; maybe we don't dispatch on `rng`?
Yeah, I agree. As a temporary solution, I'm thinking of adding an additional argument to `Distributions.rand`, something like `device`, to indicate whether sampling happens on CPU or GPU. But as a long-term fix, I'm now leaning towards your previous attempts, although this will require resolving some compatibility issues with Bijectors.
Honestly, IMO, the best solution right now is just to add our own `rand` for now to avoid ambiguity errors.

If we want to properly support all of this, we'll have to go down the path of specializing the methods further, i.e. not do a `Union` as we've done now, which will take time and effort.

For now, just make a `NormalizingFlows.rand_device` or something, that just calls `rand` by default, but which we can then overload to our liking without running into ambiguity errors.

How does that sound?
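A minimal sketch of that indirection, assuming the simplest possible shape (the actual `NormalizingFlows.rand_device` signature may differ):

```julia
using Random

# Package-owned entry point that simply forwards to `rand` by default.
# Because this package owns `rand_device`, a CUDA extension can later add
# `rand_device(rng::CUDA.RNG, ...)` methods without touching (pirating)
# `Random.rand` or the Distributions methods, so no ambiguities arise.
rand_device(rng::Random.AbstractRNG, s) = rand(rng, s)
rand_device(rng::Random.AbstractRNG, s, n::Int) = rand(rng, s, n)

# The CPU fallback works with anything `rand` already supports:
x  = rand_device(Random.default_rng(), 1:10)
xs = rand_device(Random.default_rng(), 1:10, 5)
```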
> For now, just make a `NormalizingFlows.rand_device` or something, that just calls `rand` by default, but which we can then overload to our liking without running into ambiguity errors.
Yeah, after thinking about it, I agree that this is probably the best way to go at this point. Working on it now!
I have adopted the `NF.rand_device()` approach. I think we now have a workaround. The following code runs properly:
```julia
using CUDA
using LinearAlgebra
using Distributions, Random
using Bijectors
using Flux
import NormalizingFlows as NF

rng = CUDA.default_rng()
T = Float32
q0_g = MvNormal(CUDA.zeros(T, 2), I)
CUDA.functional()

ts = reduce(∘, [f32(Bijectors.PlanarLayer(2)) for _ in 1:2])
ts_g = gpu(ts)
flow_g = transformed(q0_g, ts_g)
```
@torfjelde @sunxd3 Let me know if this attempt looks good to you. If so, I'll update the docs.
It seems that overloading methods from an external package inside an extension doesn't work (which is probably for the better), so at the moment the CUDA tests are failing. But if we move the overloads into the main package, they run, so we should probably do that from now on.
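For reference, this is roughly how a package extension is wired up in the main package's Project.toml (illustrative excerpt; the UUID should be checked against CUDA.jl's actual registry entry):

```toml
[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"

[extensions]
NormalizingFlowsCUDAExt = "CUDA"
```

The extension module is only loaded once CUDA is present in the environment; defining methods there for functions the parent package itself owns (like `rand_device`) is what avoids the piracy issue above.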