Closed imperorrp closed 2 weeks ago
@OriolAbril I had to dig a bit deeper into how the plotting actually works as coded, but I think I've managed to get step 1 (computing the histogram) roughly done. I'll probably have to combine these DataArrays into a Dataset and then create the visual element for mapping. I'll also remove the print statements in later commits; I only added them to help with seeing the outputs. Is something like this alright? I'm looping through the data variables in distribution to get DataArrays, then passing each one plus the dimensions to be reduced to the xarray_einstats.numba.histogram() function.
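As a rough illustration of step 1, here is a minimal sketch that uses plain NumPy instead of `xarray_einstats`. It loops over a dict of variables (a stand-in for the data variables in `distribution`), flattens the sample dimensions, and keeps bin edges and counts per variable. The function name `histogram_per_variable` and the output keys are hypothetical and not part of arviz-plots.

```python
import numpy as np


def histogram_per_variable(variables, bins=20):
    """Compute one histogram per variable, reducing over all dimensions.

    `variables` maps a variable name to an array of samples, e.g. with
    shape (chain, draw). This is a hypothetical stand-in for looping
    over the data variables of an xarray Dataset.
    """
    result = {}
    for name, values in variables.items():
        # Flatten chain and draw into one sample axis before binning.
        counts, edges = np.histogram(np.asarray(values).ravel(), bins=bins)
        result[name] = {
            "left_edges": edges[:-1],
            "right_edges": edges[1:],
            "histogram": counts,
        }
    return result


rng = np.random.default_rng(0)
samples = {
    "mu": rng.normal(4.0, 3.0, size=(4, 500)),
    "tau": np.abs(rng.normal(0.0, 5.0, size=(4, 500))),
}
hists = histogram_per_variable(samples)
```

Each variable gets its own bin edges here, which is why subplots for different variables can end up with different bar widths.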
Also, I was going through how density is computed for KDE as an example (in arviz-stats). I saw that a bandwidth is computed for KDE, but that won't be necessary for plotting histograms, right?
Combined the DataArrays produced by xarray_einstats.numba.histogram into a Dataset:
(This is with the Schools dataset)
It's in the style of the KDE dataset returned by distribution.azstats.kde to density.
Changed the plot_axis coords to 'x' and 'y' in place of 'bin_midpoint' and 'bin_height':
Also added a visual element function, backend interface and matplotlib implementation with matplotlib.pyplot.bar:
The bar widths differ because the x-axes have different scales. I'm not sure yet how to fix this if a consistent width is desired. Also, the number of bins is set to 20 by default. Some bins hold so few values that their bars are too short to be visible next to the taller ones.
It is looking very good already. The computation of the histogram is a bit trickier and would not be part of the GSoC project, but I think it will help you get more familiar with xarray.
I have tried to focus on the plotting and contributing workflow with the comments. Let me know if you have any doubts related to testing.
Thank you, I will take a look at your reviews and advice!
@OriolAbril The latest commit incorporates the advice from your comments. I modified the data restructure utility function to keep the left and right edge info until the end, when the matplotlib backend calculates midpoints from these to create the bars. I've also updated the docstring and the color/facecolor/edgecolor interface.
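For reference, a minimal sketch of that last step, assuming the backend receives arrays of left and right bin edges. The helper name `bar_geometry` is hypothetical, and the actual backend code in the PR may differ.

```python
import numpy as np


def bar_geometry(left_edges, right_edges):
    """Return bar centers and widths from bin edges (hypothetical helper)."""
    left = np.asarray(left_edges, dtype=float)
    right = np.asarray(right_edges, dtype=float)
    midpoints = (left + right) / 2
    widths = right - left
    return midpoints, widths


edges = np.linspace(-10.0, 30.0, 21)  # 20 equal-width bins
midpoints, widths = bar_geometry(edges[:-1], edges[1:])
# A matplotlib backend could then draw the bars with something like:
#     ax.bar(midpoints, heights, width=widths)
```

Deriving the width from the edges keeps it in data units, so each bar spans exactly its bin whatever the x-axis scale of the subplot.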
By the way, I commented on the ess and mcse issues but forgot to comment here. There is a histogram function available in arviz-stats now. I think the output will have a similar (if not the same) shape to what you are using.
Altered the code to make use of this function. Also, it seems the point estimates and credible intervals are plotted so close to the x-axis in histogram plots because the default 'y' aesthetic values returned by the plot collection are very low relative to the bar heights. The maximum y-axis values for the KDE/ECDF plots are far lower, so it's not a problem for them. I initially thought there was an issue with this aesthetic mapping when kind hist is picked, but it works fine; the values are just really small compared to the histogram heights. Should we mention this in the documentation, asking users to define custom 'y' aesthetic values (in pc_kwargs) when using kind="hist" for better visibility?
For the plot_dist example where one subplot is created for each variable:
azp.plot_dist(
    data,
    kind="kde",
    pc_kwargs={
        "cols": ["__variable__"],
        "aes": {"color": ["school"], "y": ["school"]},
        "y": np.linspace(0, 0.06, 9),
    },
    aes_map={
        "kde": ["color"],
        "point_estimate": ["color", "y"],
        "credible_interval": ["y"],
    },
)
plt.savefig("distplot-kde-cols-matplotlib.png")
Point estimates when switching from kind="kde" to kind="hist":
azp.plot_dist(
    data,
    kind="hist",
    pc_kwargs={
        "cols": ["__variable__"],
        "aes": {"color": ["school"], "y": ["school"]},
        "y": np.linspace(0, 0.06, 9),
    },
    aes_map={
        "kde": ["color"],
        "point_estimate": ["color", "y"],
        "credible_interval": ["y"],
    },
)
plt.savefig("distplot-hist-cols-matplotlib.png")
This is easily fixable by editing the np.linspace line though:
    "y": np.linspace(0, 600, 9),
We could maybe add this in the documentation so users know to keep this in mind.
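One way users could pick such values without hard-coding 600 is to scale them to the tallest bar. This is only a sketch: the helper `y_aesthetic_values` and its `fraction` default are hypothetical and not part of arviz-plots.

```python
import numpy as np


def y_aesthetic_values(max_height, n_values=9, fraction=0.25):
    """Evenly spaced y positions between 0 and a fraction of the tallest bar.

    Hypothetical helper: `fraction` is an arbitrary illustrative default.
    """
    return np.linspace(0, fraction * max_height, n_values)


counts, _ = np.histogram(np.random.default_rng(0).normal(size=2000), bins=20)
y_values = y_aesthetic_values(counts.max())
# `y_values` could then be passed as pc_kwargs["y"] in place of
# np.linspace(0, 0.06, 9).
```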
Just pushed changes per last review
Attention: Patch coverage is 57.81250% with 27 lines in your changes missing coverage. Please review.
Project coverage is 84.84%. Comparing base (1298e42) to head (2304364). Report is 1 commits behind head on main.
This will need to be rebased and double-checked to make it compatible with plotly. Take care of the last two remaining comments; after that I'll rebase and add more commits to the PR so it works with plotly.
This PR aims to add support for histograms in distplot.py (Issue #24), allowing marginal densities to be visualized in this new form, alongside the preexisting KDE and ECDF forms.
📚 Documentation preview 📚: https://arviz-plots--47.org.readthedocs.build/en/47/