cpmpercussion opened this issue 5 years ago.
It seems to me that the scale vector should have been squared before being used as a covariance matrix, so squaring is now the current behaviour.
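Here is a minimal sketch of what squaring looks like on the NumPy side. It assumes the layer's sigma output for one mixture component is a 1-D scale (standard-deviation) vector. `sample_diag_gaussian` is a hypothetical helper, not the actual function in `__init__.py`:

```python
import numpy as np


def sample_diag_gaussian(mu, sigma, rng=None):
    """Draw one sample from N(mu, diag(sigma**2)).

    Assumes `sigma` is a *scale* (standard-deviation) vector, as passed to
    tfp.distributions.MultivariateNormalDiag, so it is squared to build
    the covariance matrix that NumPy's multivariate_normal expects.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    cov_matrix = np.diag(np.square(sigma))  # variance = scale ** 2
    return rng.multivariate_normal(mu, cov_matrix)


sample = sample_diag_gaussian([0.0, 1.0], [0.1, 0.2], np.random.default_rng(0))
print(sample.shape)  # (2,)
```

Because the covariance is diagonal, `mu + sigma * rng.standard_normal(mu.shape)` draws from the same distribution without building a k × k matrix. Going through `multivariate_normal` just keeps the sketch close to the NumPy sampling call discussed in this issue.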
It remains to write a test (going across TensorFlow Probability and NumPy) that checks whether a tfd scale vector actually produces the correct distributions.
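Below is a rough, NumPy-only version of that check. The TFP half is left out. A fuller test could compare against `tfp.distributions.MultivariateNormalDiag(loc, scale_diag).stddev()`. The sketch draws many samples with and without squaring the scale vector and reports the empirical per-dimension standard deviation. `empirical_std` and the sample size are illustrative assumptions:

```python
import numpy as np


def empirical_std(sigma, square_scale, n_samples=100_000, seed=0):
    """Per-dimension std of samples drawn with NumPy's multivariate_normal.

    If square_scale is False, the scale vector is used directly as the
    covariance diagonal (the suspected bug). If True, it is squared first.
    """
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    variances = np.square(sigma) if square_scale else sigma
    cov_matrix = np.diag(variances)
    samples = rng.multivariate_normal(np.zeros_like(sigma), cov_matrix, size=n_samples)
    return samples.std(axis=0)


sigma = np.array([0.01, 0.5, 2.0])
print(empirical_std(sigma, square_scale=True))   # approximately sigma
print(empirical_std(sigma, square_scale=False))  # approximately sqrt(sigma)
```

If the squared version tracks `sigma` and the unsquared one tracks `sqrt(sigma)`, that supports the variance/standard-deviation explanation for the sampling side.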
There could be an issue with sampling due to (my) confusion about standard deviation and variance.
The samples are drawn using NumPy like so (documentation) (line 238 of `__init__.py`).

But the outputs from the mixture density layer are treated as `scale` variables in `tfp.distributions.MultivariateNormalDiag`. This notes that:

Thus, it seems we should have been squaring the `cov_matrix` before passing it to the multivariate normal sampling procedure. This could explain why we end up having to scale down the sigma variable so much in real-world applications.
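Written out, the relationship this argument relies on is as follows. σ is the scale vector output by the layer and k is the output dimension. The `scale @ scale.T` form is how the TensorFlow Probability documentation defines the covariance of `MultivariateNormalDiag`, while NumPy's `multivariate_normal` expects the covariance Σ itself:

$$
S = \operatorname{diag}(\sigma_1, \ldots, \sigma_k), \qquad
\Sigma = S S^{\top} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_k^2).
$$

If S is passed directly as the covariance instead, each dimension is sampled with standard deviation $\sqrt{\sigma_i}$ rather than $\sigma_i$. For example, $\sigma_i = 0.01$ gives a sampling standard deviation of $\sqrt{0.01} = 0.1$, ten times too wide. This is consistent with (though it does not prove) the need to scale sigma down in practice.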
A to-do here is to get a definite answer and run some tests to find out what's going on.