gottacatchenall opened 1 year ago
Seems like it makes sense to use JS distance. The only issue I'm running into is when computing JS-div by definition as
$$JS(P,Q) = \frac{1}{2}KL(P \,\|\, M) + \frac{1}{2}KL(Q \,\|\, M)$$ where $M=\frac{1}{2}(P+Q)$
when using a `MixtureModel` from Distributions.jl for $M$. This works fine for `Normal` distributions, but for `MvNormal`s the methods within `kldivergence` fall back to sample-based expectation values (presumably because the divergence can't be computed analytically in general), so there is variance in JS measures on the same pair of distributions. With enough samples the variance goes down, of course, but there is going to be a trade-off in terms of evaluation speed. Around $10^5$ samples is relatively stable for 5 layers, but this could be variable depending on the input number of layers.
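To make the variance trade-off concrete, here is a minimal sketch of the same Monte Carlo JS estimate in Python with scipy (the thread itself uses Distributions.jl; the function name `js_divergence_mc` and the specific sample sizes are my own illustration, not the package's API). Repeated estimates at a small sample size scatter noticeably; at larger sample sizes they tighten up:

```python
import numpy as np
from scipy.stats import multivariate_normal

def js_divergence_mc(p, q, n_samples, rng):
    """Monte Carlo estimate (natural log) of JS(P, Q) between two frozen
    scipy multivariate normals, sampling from each component separately."""
    def log_m(x):
        # log density of the mixture M = (P + Q)/2, via logaddexp for stability
        return np.logaddexp(p.logpdf(x), q.logpdf(x)) - np.log(2.0)
    xs_p = p.rvs(size=n_samples, random_state=rng)
    xs_q = q.rvs(size=n_samples, random_state=rng)
    kl_pm = np.mean(p.logpdf(xs_p) - log_m(xs_p))  # E_P[log P - log M]
    kl_qm = np.mean(q.logpdf(xs_q) - log_m(xs_q))  # E_Q[log Q - log M]
    return 0.5 * (kl_pm + kl_qm)

rng = np.random.default_rng(0)
p = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
q = multivariate_normal(mean=[1.0, 0.0], cov=np.eye(2))

# Spread of repeated estimates shrinks as the sample count grows
small = [js_divergence_mc(p, q, 1_000, rng) for _ in range(5)]
large = [js_divergence_mc(p, q, 100_000, rng) for _ in range(5)]
print(np.std(small), np.std(large))
```

The same pattern should hold for `kldivergence` on `MvNormal`s in Julia: the estimator is unbiased in the large-sample limit, but each evaluation pays for its stability in samples drawn.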
Also, I think it may be worth importing simulated annealing and other methods from Optim.jl instead of rewriting fit-diagnostic tools from scratch; I'm going to take a closer look at Optim in a bit.
I like the idea - this is what Fauxcurrence uses as well. One thing that may be useful is to use Jensen-Shannon instead, which is symmetrical and bounded (to 1 when using log base 2, to $\log 2$ when using the natural log). The square root of JS is also a distance (and is bounded to 1 when using log base 2, which is really nice in terms of giving a sense of the quality of the fit).
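The bounds above are easy to check for discrete distributions; scipy's `jensenshannon` returns exactly the square-root form discussed here (this is a Python illustration of the property, not the Julia code under discussion). Two distributions with disjoint supports hit the upper bound:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])

# scipy returns the *square root* of the JS divergence (the JS distance).
# With base=2 the divergence is bounded by 1, so the distance is too;
# with the natural log the divergence is bounded by ln 2.
d_base2 = jensenshannon(p, q, base=2)  # disjoint supports -> maximal distance, 1.0
d_nat = jensenshannon(p, q)            # sqrt(ln 2) for disjoint supports
print(d_base2, d_nat)
```

With base 2, a fit score of 0 means identical distributions and 1 means no overlap at all, which is what makes it readable as a quality-of-fit measure.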
Thoughts?