deveshjawla opened this issue 1 year ago
More information would be helpful: Are you using Zygote as the backend?
Hi, thanks for responding. For MCMC I use ReverseDiff; for VI, ForwardDiff. What do you think?
Can you try Zygote and see if the discrepancy is smaller? Flux tends to be optimized towards Zygote, so..
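How to switch the AD backend depends on the Turing version; a rough sketch (not taken from this issue) of both the older global setting and the newer per-sampler keyword:

```julia
using Turing

# Older Turing versions: set the AD backend globally.
Turing.setadbackend(:zygote)          # previously :reversediff / :forwarddiff

# Newer Turing versions instead take the AD type as a sampler keyword, e.g.
# chain = sample(model, NUTS(; adtype = AutoZygote()), 1_000)
```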
No, Zygote has an issue with the for loop in the @model, and if I try to use LazyArray instead, it errors for Categorical:

```
MethodError: no method matching LazyArray(::Vector{Categorical{Float32, Vector{Float32}}})
```
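For context, the loop-based likelihood and the LazyArray variant under discussion look roughly like this sketch; the model name, arguments, and the `@~` broadcast form are assumptions for illustration, not the issue's actual code:

```julia
using Turing, LazyArrays, LinearAlgebra

@model function bnn(xs, ys, nparams, reconstruct)
    θ ~ MvNormal(zeros(nparams), I)     # prior over the flattened weights
    nn = reconstruct(θ)                 # rebuild the network from θ
    preds = nn(xs)                      # K × N matrix of class probabilities
    for i in eachindex(ys)              # explicit loop that Zygote struggles with
        ys[i] ~ Categorical(preds[:, i])
    end
end

# Vectorised alternative: LazyArray has to wrap a *lazy broadcast* (note the @~),
# not an already-materialised Vector{Categorical} as in the MethodError above:
# ys ~ arraydist(LazyArray(@~ Categorical.(eachcol(preds))))
```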
> There is a significant difference in the time it takes to perform inference using the reconstruct method below versus defining the model, i.e. the neural network, with a function feedforward() as below.
I'd wager there's a type-instability introduced by Flux.destructure then, while in your manually implemented feedforward this is not the case.
You can check this by doing:

```julia
@code_warntype reconstruct(θ)
@code_warntype feedforward(θ)
```

Note that it might even be that reconstruct is type-stable, but that the resulting backwards-pass defined by ReverseDiff makes it unstable (there are ways to inspect this too, but for the moment check the above).
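A concrete way to run that check (a minimal sketch; nn_initial and feedforward are the names used elsewhere in this thread):

```julia
using Flux
using InteractiveUtils   # for @code_warntype outside the REPL

# Flatten the Float32 network: θ is a Vector{Float32}, reconstruct a Restructure closure.
θ, reconstruct = Flux.destructure(nn_initial)

@code_warntype reconstruct(θ)    # any non-concrete (red) types indicate instability
@code_warntype feedforward(θ)
```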
When using restructure, the Chain parameters are Float32 but θ is Float64. I tried to explicitly define the prior over θ with Float32 Normal distributions, but it errored.
A snippet from @code_warntype for reconstruct is as follows, although there are no red-marked types in its output:
```
MethodInstance for (::Optimisers.Restructure{Chain{Tuple{Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, typeof(softmax)}}, NamedTuple{(:layers,), Tuple{Tuple{NamedTuple{(:weight, :bias, :σ), Tuple{Int64, Int64, Tuple{}}}, NamedTuple{(:weight, :bias, :σ), Tuple{Int64, Int64, Tuple{}}}, NamedTuple{(:weight, :bias, :σ), Tuple{Int64, Int64, Tuple{}}}, Tuple{}}}}})(::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
```
Similarly, the @code_warntype output for feedforward(θ) has no red-marked types, but the Chain is Float64:
```
From worker 3: from feedforward(θ::AbstractVector) in Main at /Users/456828/Projects/Bayesian-Active-Learning/DataSets/coalminequakes_dataset/Network.jl:8
From worker 3: Arguments
From worker 3:   #self#::Core.Const(feedforward)
From worker 3:   θ::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}
From worker 3: Locals
From worker 2: │ %16 = Main.Dense(W0, b0, Main.relu)::Dense{typeof(relu), ReverseDiff.TrackedArray{Float64, Float64, 2, Matrix{Float64}, Matrix{Float64}}, Vector{ReverseDiff.TrackedReal{Float64, Float64, ReverseDiff.TrackedArray{Float64, Float64, 2, Matrix{Float64}, Matrix{Float64}}}}}
```
> When using restructure the Chain parameters are Float32 but the θ is Float64.
The Float32 is a Flux-specific thing; Flux works mainly with Float32 and has even started to enforce this more generally recently.
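If the mismatch itself turns out to matter, one possible workaround (an assumption on my part, not something confirmed in this thread) is to promote the network to Float64 before destructuring, so the restructured Chain matches the Float64 θ coming from the sampler:

```julia
using Flux

# Hypothetical fix: convert all parameters to Float64 up front so reconstruct(θ)
# does not mix Float32 weights with Float64 (tracked) parameters.
nn64 = Flux.f64(nn_initial)
parameters_initial, reconstruct = Flux.destructure(nn64)
```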
There is a significant difference in the time it takes to perform inference using the reconstruct method below versus defining the model, i.e. the neural network, with a function feedforward() as below. For a very simple and small problem like the IRIS dataset, inference with the function method finishes in under one minute, while with the reconstruct method it takes 30 minutes. Any ideas why this happens and how to make the reconstruct method comparable?

where

```julia
parameters_initial, reconstruct = Flux.destructure(nn_initial)
```

As compared to the below, where
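For comparison, the manually unpacked feedforward version mentioned above typically looks something like the sketch below; the layer sizes (4 → 20 → 10 → 3, an IRIS-style classifier) and the index arithmetic are assumptions for illustration, not the issue's actual code:

```julia
using Flux

# Hypothetical manual alternative to reconstruct: slice θ by hand and rebuild
# the Chain on every call, so no Optimisers.Restructure closure is involved.
function feedforward(θ::AbstractVector)
    W0 = reshape(θ[1:80], 20, 4);     b0 = θ[81:100]
    W1 = reshape(θ[101:300], 10, 20); b1 = θ[301:310]
    W2 = reshape(θ[311:340], 3, 10);  b2 = θ[341:343]
    return Chain(Dense(W0, b0, relu), Dense(W1, b1, relu), Dense(W2, b2), softmax)
end
```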