phipsgabler opened 4 years ago
I guess the problem is that the samples in `vi` are saved as `Real`, and hence rerunning leads to the error, since the evaluation of the logpdf in Bijectors calls `eps` for `Real`, which is not defined. It seems reasonable that it doesn't affect the likelihood context, since we don't evaluate the logpdf there. I guess this issue should be transferred to Bijectors; maybe it's possible to avoid `eps`, or to define some `_eps` instead that falls back to `eps(Float64)` for `Real`.
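A minimal sketch of such a hypothetical `_eps` fallback (the name and the dispatch scheme are assumptions for illustration, not the actual Bijectors API):

```julia
# Hypothetical `_eps`: use the concrete machine epsilon when the element
# type is a concrete float, and fall back to eps(Float64) for abstract
# element types such as Real, for which Base.eps is not defined.
_eps(::Type{T}) where {T<:AbstractFloat} = eps(T)
_eps(::Type{<:Real}) = eps(Float64)

_eps(Float32)  # eps(Float32)
_eps(Real)     # eps(Float64), whereas eps(Real) throws a MethodError
```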
Hm, the funny thing is that `eps` is only used with `Dirichlet`: https://github.com/TuringLang/Bijectors.jl/blob/master/src/Bijectors.jl#L124.
I guess this is the reason:

```julia
julia> logpdf(Dirichlet(2, 1.0), [1, 0])
NaN

julia> logpdf(Dirichlet(2, 1.0), [1 + eps(Float64), eps(Float64)])
0.0
```
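The `NaN` comes from plain IEEE arithmetic: with concentration `α = 1`, the Dirichlet log-density contains the term `(α - 1) * log(xᵢ)`, which at a boundary point evaluates to `0 * log(0)`:

```julia
# 0.0 * -Inf is NaN under IEEE 754 semantics, so the whole
# log-density becomes NaN at the vertex [1, 0].
(1.0 - 1.0) * log(0.0)  # NaN
```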
Probably either of the following should be fine:

```julia
julia> logpdf(Dirichlet(2, 1.0), nextfloat.(Real[1.0, 0.0]))
0.0

julia> logpdf(Dirichlet(2, 1.0), Real[1.0, 0.0] .+ eps.(Real[1.0, 0.0]))
0.0
```
(And `eps` is defined internally through `nextfloat`, so I'd prefer the first.)
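For what it's worth, at these particular points the two fixes agree exactly, since for finite `x` the value of `eps(x)` is the gap to the next representable float:

```julia
# eps(x) is the distance from x to the next larger Float64,
# so adding eps.(x) and taking nextfloat.(x) coincide here.
eps(1.0) == nextfloat(1.0) - 1.0  # true
eps(0.0) == nextfloat(0.0) - 0.0  # true: both are the subnormal 5.0e-324
```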
`_eps` should be used here. I will make a PR.
https://github.com/TuringLang/Bijectors.jl/blob/1f3b581afe04f690bd93fba9edd88735cc1fc140/src/Bijectors.jl#L124 is actually not defined for `x >= 1 - eps` mathematically, even though it works with Distributions (and it is incorrect for any `x > 0`). Maybe one should apply the same fix as in the SimplexBijector and rescale `x` to `x * (1 - 2 * eps) + eps`, which maps values from `[0, 1]` into `[eps, 1 - eps]` and would be consistent with the calculation in the SimplexBijector. Of course, that still doesn't work if `x < 0` or `x > 1` due to numerical issues, so probably the only numerically stable way would be to work with the logarithm of the unnormalized Gamma random variates instead and apply the softmax function later on if needed (e.g., for parameterizing a categorical distribution).
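A sketch of that rescaling (the helper name is hypothetical; the actual SimplexBijector code is structured differently):

```julia
# Map x ∈ [0, 1] affinely into [eps, 1 - eps] so that log(x) and
# log(1 - x) stay finite at the boundary, mirroring the clamping
# used in the SimplexBijector.
_clamp_unit(x::T) where {T<:AbstractFloat} = x * (1 - 2 * eps(T)) + eps(T)

_clamp_unit(0.0)  # == eps(Float64)
_clamp_unit(1.0)  # == 1 - eps(Float64), i.e. prevfloat(1.0)
```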
More generally speaking, I'm wondering if, for sampling and optimization in the unconstrained space, we could use a `rand_trans` function that generates samples in the transformed space directly, to avoid these issues altogether. E.g., there exist algorithms for sampling `X` with `exp(X) ~ Gamma(a, 1)` directly in log-space, which avoids the issue of getting zero values for small shape parameters `a`. It could always fall back to sampling in the original space and applying the transformation afterwards, but a more sophisticated implementation could avoid numerical issues whenever possible.
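As an illustration, one such algorithm (a sketch, not Bijectors/Turing code) uses the standard shape-boosting identity: if `G ~ Gamma(a + 1, 1)` and `U ~ Uniform(0, 1)`, then `G * U^(1/a) ~ Gamma(a, 1)`, so the logarithm of the variate can be accumulated directly:

```julia
using Random, Distributions

# Sample log(X) where X ~ Gamma(a, 1), staying in log-space throughout.
# For tiny shapes (e.g. a = 1e-3), exp of the result would underflow to
# zero most of the time, but the log-space value remains finite.
function randloggamma(rng::AbstractRNG, a::Real)
    g = rand(rng, Gamma(a + 1, 1))  # shape-boosted draw, bounded away from 0
    u = rand(rng)
    return log(g) + log(u) / a      # log(g * u^(1/a))
end

randloggamma(MersenneTwister(1), 1e-3)  # finite, typically very negative
```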
With `SampleFromPrior` and `DefaultContext` the error occurs. The same thing does not happen with `LikelihoodContext`.