Open ForceBru opened 3 years ago
Personally, I don't like the default of plotting the individual components of the mixture model either, for exactly the reasons you've given. But I think we would need to look into why those decisions were made before changing the default.
The recipe for `MixtureModel` was added in #246. Neither then nor since does there seem to have been any discussion of the default for the `components` argument. I agree the default should be `components=false`. @mkborregaard or @BeastyBlacksmith, if we do fix this, would that be considered a breaking change?
There is room for interpretation here, but I'd say that changing defaults is a breaking change. Since StatsPlots is pre-1.0, though, making these changes is fine. I'd think about other breaking changes you might want to make and do them in one batch.
Revisiting this, I think the current behavior of plotting all components separately with their own styles makes sense if one has a mixture of discrete and continuous distributions (currently not allowed by Distributions but will be in the future, and censored distributions are examples of this). So e.g. one could plot a `Censored(Normal(), -1, 1)` with lines within the interval and sticks at the bounds. But then if one constructs a mixture whose components have discrete components, one can't programmatically detect this.
One thing we could do is grab the `default_range`s for all mixture components, augment the ranges from discrete distributions with `nextfloat` and `prevfloat`, then interleave all points, remove duplicates, and evaluate the mixture density. This would show discrete atoms as vertical lines and, due to recursion, would work for mixtures of mixtures.
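To make the grid-building idea concrete, here is a minimal sketch in plain Julia. It is not the StatsPlots implementation: `merged_grid` is a hypothetical helper, and the component ranges and atom locations are passed in by hand instead of coming from `default_range`. It only shows the "bracket each atom, interleave, deduplicate" step; evaluating the mixture density on the result is left out.

```julia
# Hypothetical helper (not part of StatsPlots): build one evaluation grid
# from the grids of continuous components and the atoms of discrete ones.
function merged_grid(continuous_ranges, atoms)
    pts = Float64[]
    for r in continuous_ranges
        append!(pts, r)              # keep every point of each continuous grid
    end
    for a in atoms
        x = float(a)
        # Bracket each atom with its floating-point neighbours so that a line
        # plot goes up and comes back down within one ulp on either side.
        push!(pts, prevfloat(x), x, nextfloat(x))
    end
    return unique!(sort!(pts))       # interleave all points and drop duplicates
end

# Example: one continuous grid on [-3, 3] plus atoms at the censoring bounds.
grid = merged_grid([range(-3, 3; length=7)], [-1, 1])
```

Because the neighbours of each atom are only one ulp away, the density evaluated at those three points would be drawn as an effectively vertical line.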
TL;DR: currently PDFs of the components seem to be plotted without considering their weights (that the code calls "prior"). Maybe it would be better to plot them weighted, or maybe plot the PDF of the mixture (instead of plotting PDFs of its components) with an option to add PDFs of the weighted components to the plot?
The pull request https://github.com/JuliaPlots/StatsPlots.jl/pull/456 introduced plotting of `UnivariateGMM`, so I was trying to use it to see how well my model describes the data. However, the fit looked terrible (the PDFs of the components were way too spiky) despite convergence criteria and the log-likelihood showing that the fit wasn't that bad. I plotted the data from the pull request's sample code and found that the PDFs of the components are not weighted:
I think the PDF on the left doesn't look anything like the leftmost "bump" of the histogram, so it looks like the mixture model fits the data poorly, yet we know that the data were literally sampled from this exact mixture. I would expect the plots of the two PDFs to sort of "hug" the histogram, like this:
Of course, there's overlap in the middle that individual components can't explain that well, but the full PDF of the mixture can:
So, maybe plot the PDF of the mixture and, optionally, the weighted components' PDFs?
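As a rough illustration of the difference, the sketch below computes both the weighted component densities and the full mixture density for a two-component Gaussian mixture. It uses plain Julia with a hand-written normal density so that it runs without any packages; the weights and parameters are made up for the example and are not the ones from the pull request. With Distributions.jl the same quantities should be available through `probs`, `components`, and `pdf`.

```julia
# Normal density written out by hand so the sketch needs no packages.
normal_pdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Illustrative mixture: weights and (mean, std) pairs are assumptions.
weights = [0.3, 0.7]
params  = [(-2.0, 1.0), (2.0, 0.5)]

# Weighted density of component k: the curve that should "hug" the histogram.
weighted_pdf(k, x) = weights[k] * normal_pdf(x, params[k]...)

# Mixture density: the sum of the weighted component densities.
mixture_pdf(x) = sum(weighted_pdf(k, x) for k in eachindex(weights))

xs = range(-6, 5; length=200)
mixture_curve    = mixture_pdf.(xs)
component_curves = [weighted_pdf.(k, xs) for k in eachindex(weights)]
```

Plotting `mixture_curve` against `xs` gives the full PDF, and each entry of `component_curves` can be overlaid optionally. The unweighted component PDFs are taller by a factor of `1 / weights[k]`, which would explain why they look too spiky next to a normalized histogram.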