arviz-devs / arviz-plots

ArviZ modular plotting
https://arviz-plots.readthedocs.io
Apache License 2.0
2 stars 1 forks source link

Add histogram to plot_dist #24

Open aloctavodia opened 8 months ago

imperorrp commented 4 months ago

I can try working on this issue

aloctavodia commented 4 months ago

great! Feel free to ask questions if you have doubts.

imperorrp commented 4 months ago

Sure, thank you!

I had a closer look at the existing codebase thinking about how to approach this and did have some doubts about how to build this.

In the existing implementation in src/arviz-plots/plots/distplot.py, .map() is called on the PlotCollection instance to create the plot elements with the visual element functions line_xy() and ecdf_line() passed to it, right?

For an if kind == 'hist': condition, what visual element function should be passed- would a custom one have to be defined for histogram-ish data? The def map() function in the PlotCollection class only accepts one visual element function at a time, so a new visual element function for histogram plots? I suppose a combination of line_x() and line() could also work (horizontal and vertical lines), with some processing required to define intervals based on number of bins wanted.

I checked the old ArviZ implementation of histogram plots for distplot and it seems like ax.hist() is used for example in the matplotlib implementation and ax.quad() with bokeh.

And for number of bins, I suppose the plot_kwargs dict could be where users could define a user given number of bins and a preset number of bins set as default?

OriolAbril commented 4 months ago

You'd have to follow a pattern similar to the two existing kinds.

  1. Compute the histogram of the data
  2. use .map on the result to plot the histogram

In step 1 you'll have to call the histogram function on the data, with kwargs taken from stats_kwargs. As a starting point, you can use https://einstats.python.arviz.org/en/latest/api/generated/xarray_einstats.numba.histogram.html (xarray-einstats is a dependency of arviz-plots already and numba is too for now). After finishing some more urgent matters in arviz-plots I'll shift my efforts more to arviz-stats, hopefully while I work more on arviz-stats we can gather feedback on the current plots.

Then in step 2 call .map on a new visual element and in this particular case, this new visual element will have to call a new function in the backend module that would have to be added for both matplotlib and bokeh (probably bars and quad respectively). Here is where kwargs are taken from plot_kwargs.


Important: I have just opened a PR where I modify a bit plot_dist if you start working now you might have to deal with merge conflicts. They should not be a huge issue but it can be a bit annoying. I'll try to merge the PR tomorrow so you can work on plot_dist without worrying about anything else.

imperorrp commented 4 months ago

I'll start on this tonight. Will open a draft PR for this and look into xarray-einstats. Thanks for the suggestions. I'll let you know if I have any other questions!

imperorrp commented 4 months ago

Just got started on this- I just noticed there's no requirements.txt file for ArviZ-Plots. We are using Flit instead of pip for installing dependencies then?

Edit:

I tried using Flit install --symlink to get the dependencies installed, but I'm running into an issue with ArviZ-Stats for some reason:

INFO: pip is looking at multiple versions of arviz-stats to determine which version is compatible with other requirements. This could take a while.
ERROR: Package 'arviz-stats' requires a different Python: 3.9.2 not in '>=3.10'

I don't know why this happening. I checked the repo for ArviZ-Stats and saw the pyproject.toml file there has a requires-python = ">=3.10" near the beginning.

Is some past version of ArviZ-Stats being checked for some reason?

OriolAbril commented 4 months ago

pyproject is the new way of defining project metadata and configure how to build/install the project. You should still use pip to install the project though.

To install the local copy of your project:

pip install -e .
# then also install matplotlib and/or bokeh if you want to use it

You'll need to do this to use and try your changes out interactively.

But for development related tasks, only tox is needed as a dependency. Tox takes care of installing required dependencies, the local copy of the project and running the test/docs/lint commands. Test and doc ones are listed in https://arviz-plots.readthedocs.io/en/latest/contributing/testing.html and https://arviz-plots.readthedocs.io/en/latest/contributing/docs.html and there is also tox - e check which you should run after git add but before git commit

imperorrp commented 4 months ago

Okay, thank you. I will try these out.