Add a `SoftPlus` bijection and use it as the default way to enforce positivity of the scale parameters in `Affine` and `TriangularAffine`. Note that this also affects most distributions, because loc-scale distributions are implemented as `Affine` transformations of their standardised versions.
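A minimal sketch of such a bijection follows. It assumes scalar `float` inputs and the standard library `math` module rather than JAX arrays. The class and method names (`transform`, `transform_and_log_det`, `inverse`, `inverse_and_log_det`) are hypothetical and are not taken from the library's actual interface. SoftPlus maps the real line to (0, inf) via y = log(1 + exp(x)). Its inverse is x = log(exp(y) - 1), and the log-Jacobian of the forward map is log(sigmoid(x)) = -softplus(-x).

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); never exponentiates a positive number."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    """Inverse of softplus, log(exp(y) - 1), defined only for y > 0."""
    if y <= 0:
        raise ValueError("inverse softplus requires y > 0")
    # Rewritten as y + log(1 - exp(-y)) so that large y does not overflow exp(y)
    return y + math.log(-math.expm1(-y))


class SoftPlus:
    """Hypothetical scalar sketch of a SoftPlus bijection from R to (0, inf)."""

    def transform(self, x: float) -> float:
        return softplus(x)

    def transform_and_log_det(self, x: float) -> tuple[float, float]:
        # d/dx softplus(x) = sigmoid(x), and log(sigmoid(x)) = -softplus(-x)
        return softplus(x), -softplus(-x)

    def inverse(self, y: float) -> float:
        return inverse_softplus(y)

    def inverse_and_log_det(self, y: float) -> tuple[float, float]:
        x = inverse_softplus(y)
        # The inverse log-det is the negative of the forward log-det evaluated at x
        return x, softplus(-x)
```

The forward pass only ever calls `exp` on a non-positive argument. The inverse avoids forming `exp(y)` directly. Together, these choices keep both directions finite for large inputs.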
This change was motivated by instability observed when affine transformations are parameterised by neural networks (e.g. in masked autoregressive flows and coupling flows), where the exponential parameterisation allows the positive scale parameter to explode. See also the discussion in https://github.com/pyro-ppl/numpyro/issues/855.
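The following is a small numerical comparison of why the scale can explode under an exp parameterisation. It assumes that a network's raw output is passed directly through the positivity map. The function `compare` is hypothetical, and scalar `math` calls stand in for JAX arrays.

```python
import math


def softplus(x: float) -> float:
    # Stable log(1 + exp(x)): avoids overflow in exp for large x
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Derivative of softplus; this simple form is fine for the moderate inputs below
    return 1.0 / (1.0 + math.exp(-x))


def compare(raw: float) -> dict:
    """Scale and d(scale)/d(raw) under the old exp and the new softplus parameterisation."""
    return {
        "exp_scale": math.exp(raw),
        "exp_grad": math.exp(raw),  # d/dx exp(x) = exp(x), unbounded
        "softplus_scale": softplus(raw),
        "softplus_grad": sigmoid(raw),  # d/dx softplus(x) = sigmoid(x), bounded by 1
    }


for raw in [-5.0, 0.0, 5.0, 20.0, 50.0]:
    r = compare(raw)
    print(
        f"raw={raw:5.1f}  exp: {r['exp_scale']:.3e} (grad {r['exp_grad']:.3e})  "
        f"softplus: {r['softplus_scale']:.3e} (grad {r['softplus_grad']:.3f})"
    )
```

For very negative raw outputs, softplus(x) is approximately exp(x), so small scales behave much as before. The difference appears for large raw outputs, where softplus grows linearly with a gradient of at most 1, while exp grows exponentially. In float32, the JAX default, exp(raw) overflows to inf once raw exceeds roughly 88.7.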
Note that this introduces some breaking changes:
- Any distributions or flows that use `Affine` will optimise differently. This includes all loc-scale distributions, as well as `CouplingFlow`/`MaskedAutoregressiveFlow` with an `Affine` transformer, and `TriangularSplineFlow`.
- Some attribute names have changed for clarity, as positivity is no longer enforced via log/exp:
  - `log_scale` -> `_scale` in `Affine`
  - `log_diag` -> `_diag`, and `weight_log_scale` -> `_weight_scale` in `TriangularAffine`
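The renamed attributes can be illustrated with a minimal scalar sketch. It assumes that `_scale` stores the unconstrained parameter and that the positive scale is recovered as softplus(`_scale`). The `Affine` class below is a standalone hypothetical stand-in, not the library's implementation, and the `scale` property and the `migrate_log_scale` helper are illustrative assumptions. The sketch covers only the scalar `Affine` case.

```python
import math


def softplus(x: float) -> float:
    # Stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    # log(exp(y) - 1) for y > 0, rewritten to avoid overflow
    return y + math.log(-math.expm1(-y))


class Affine:
    """Hypothetical scalar sketch: y = loc + scale * x, with scale = softplus(_scale)."""

    def __init__(self, loc: float = 0.0, scale: float = 1.0):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.loc = loc
        # Store the unconstrained parameter; an optimiser can update it freely on R
        self._scale = inverse_softplus(scale)

    @property
    def scale(self) -> float:
        # The positive scale is derived on access, never stored directly
        return softplus(self._scale)

    def transform_and_log_det(self, x: float) -> tuple[float, float]:
        return self.loc + self.scale * x, math.log(self.scale)

    def inverse(self, y: float) -> float:
        return (y - self.loc) / self.scale


def migrate_log_scale(log_scale: float) -> float:
    """Hypothetical helper: convert an old exp-parameterised log_scale to a softplus _scale."""
    return inverse_softplus(math.exp(log_scale))
```

A helper such as `migrate_log_scale` would preserve the effective scale of previously trained parameters. The optimisation dynamics would still differ afterwards, because gradients flow through softplus rather than exp.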