arviz-devs / ArviZ.jl

Exploratory analysis of Bayesian models with Julia
https://julia.arviz.org

Hook into Plots.jl and/or Makie.jl #108

Open sethaxen opened 3 years ago

sethaxen commented 3 years ago

Currently ArviZ.jl only supports ArviZ's matplotlib backend (using PyPlot.jl) and partially supports its Bokeh backend. ArviZ.jl should hook into Plots.jl for several reasons:

Makie.jl is a newer package that offers highly performant, interactive plots. Both packages use a recipe system for designing new plots. Plots.jl recipes can be consumed by Makie.jl, but the interactivity/reactivity is lost.

There are a few ways we could go about doing this.

  1. Add a new "backend" to ArviZ (the Python package) that, instead of plotting, just returns all data necessary to create a plot. We would then define recipes that always call the Python plotting function with this backend and then implement the same plot in the Plots API.
  2. Reimplement the plots from scratch, calling the necessary internal functions in the ArviZ Python package or Julia equivalents to perform the necessary analyses.
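
To make the first option concrete, here is a minimal Python sketch of what a data-only "backend" could look like. Everything in it is an assumption for illustration: `DensityPlotData` and `density_data` are hypothetical names, not part of ArviZ, and the density estimate uses a plain Gaussian kernel with a normal-reference bandwidth rather than ArviZ's own kde.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DensityPlotData:
    """Hypothetical container holding everything a density plot needs."""
    x: List[float]
    density: List[float]
    xlabel: str


def density_data(samples: Sequence[float], var_name: str,
                 grid_size: int = 50) -> DensityPlotData:
    """Compute a Gaussian KDE on a regular grid and return it instead of plotting.

    Assumes at least two samples that are not all identical.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    # Normal-reference rule-of-thumb bandwidth (an assumption, not ArviZ's rule).
    bandwidth = 1.06 * std * n ** (-1 / 5)
    # Extend the grid three bandwidths past the data so the tails are covered.
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    x = [lo + i * step for i in range(grid_size)]
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((xi - s) / bandwidth) ** 2) for s in samples)
        for xi in x
    ]
    return DensityPlotData(x=x, density=density, xlabel=var_name)
```

Under this sketch, a Julia recipe would call something like `density_data(samples, "mu")` through the Python bridge and map the returned fields onto a line series, so no plotting code runs on the Python side.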

Both of these options have a significant drawback: requiring more maintenance time for this package.

The first option requires a lot of work upfront on the Python side, but very little on the Julia side. However, it will ultimately require more work to maintain on the Julia side, as changes to Python ArviZ can change the data returned and potentially break the plot for Julia. And unlike with Julia package dependencies, we cannot use version bounds to enforce that only mutually compatible versions of the Julia and Python packages are installed. This has made keeping the packages in sync challenging.

The second option requires more work upfront but will ultimately be easier to maintain. However, as new plotting options are added to ArviZ, work would be needed to support them over here, and gradually the two packages could become out-of-sync in the features they support.

For now I think the way forward is to experiment with the second option for very simple plots and see if the same component recipes can be trivially combined to construct the more elaborate plots. I'm interested in any alternative suggestions though.
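
As a rough illustration of what "combining component recipes" could mean, the Python sketch below describes each simple plot element as a backend-agnostic layer and builds a more elaborate plot by listing layers. All names (`Layer`, `rug_layer`, `interval_layer`, `point_layer`, `summary_plot`) are hypothetical, the interval is a simple nearest-rank equal-tailed estimate, and real recipes would be written with the Julia recipe system rather than in Python.

```python
from typing import Dict, List, Sequence

# Hypothetical, backend-agnostic description of one plot element.
Layer = Dict[str, object]


def rug_layer(samples: Sequence[float]) -> Layer:
    """Component: tick marks at each sample position."""
    return {"kind": "rug", "x": sorted(samples)}


def interval_layer(samples: Sequence[float], prob: float = 0.9) -> Layer:
    """Component: nearest-rank equal-tailed interval from the sorted samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    # Number of order statistics to skip in each tail.
    skip = int(round((1.0 - prob) / 2.0 * last))
    return {
        "kind": "interval",
        "lower": ordered[skip],
        "upper": ordered[last - skip],
        "prob": prob,
    }


def point_layer(samples: Sequence[float]) -> Layer:
    """Component: the sample mean as a single marker."""
    return {"kind": "point", "x": sum(samples) / len(samples)}


def summary_plot(samples: Sequence[float], prob: float = 0.9) -> List[Layer]:
    """Composite: the elaborate plot is just the list of its component layers."""
    return [rug_layer(samples), interval_layer(samples, prob), point_layer(samples)]
```

If composition really is this trivial, each component can be tested on its own and the composite needs no plotting logic beyond ordering its layers.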

ColCarroll commented 3 years ago

Sounds great! Maybe worth updating https://github.com/arviz-devs/arviz/wiki/ArviZ-2021-roadmap to keep track of your thoughts?

I like the idea of arviz's non-plot code providing a sufficient API to (easily) reproduce the plots: seems like that could enforce modularity and separation of concerns!

ahartikainen commented 3 years ago

At this step, should we clean the plotting data -> back-end interface? Currently they are more or less hacked together?

sethaxen commented 3 years ago

> Sounds great! Maybe worth updating https://github.com/arviz-devs/arviz/wiki/ArviZ-2021-roadmap to keep track of your thoughts?

Good idea! ~Will do.~ Done!

> I like the idea of arviz's non-plot code providing a sufficient API to (easily) reproduce the plots: seems like that could enforce modularity and separation of concerns!

Yes, I agree this will be a nice stress test of the modularity of arviz. I'd prefer to avoid dipping into non-API functions if possible and instead reproduce functionality here, to avoid annoying breakage and/or difficult maintenance. I would actually rather have pure Julia implementations here where they are simple to write, or use Julia equivalents where they exist, than access non-API functions, since relying on those would be more difficult to maintain. An example of this is kde.

> At this step, should we clean the plotting data -> back-end interface? Currently they are more or less hacked together?

After experimenting with this over the last few days, I don't think it will make much of a difference for these features. It would make it a bit easier for me to follow what a plot is doing (i.e. separating the computation of statistics from how they are plotted), but I'd essentially be duplicating the sequence of computations here.

sethaxen commented 3 years ago

Work on ArviZ/Plots integration has begun at https://github.com/arviz-devs/ArviZPlots.jl.

treigerm commented 3 years ago

Hey just want to leave a comment saying that I'm really looking forward to this! In the past I've run into some issues with having the right Python libraries installed for plotting in my Julia environment. Having the plotting done all on the Julia side would simplify things a lot.

sethaxen commented 3 years ago

Thanks, @treigerm, I'm looking forward to it too! The idea for now is to still handle all analyses on the Python side and just implement the necessary Plots.jl recipes to handle the plotting itself in Julia. So you'd still need e.g. matplotlib installed, but it wouldn't be used for the plots unless you triggered the pyplot backend in Plots.

What kind of issues have you had getting the Python libraries installed? It would be nice to know of any reproducible issues blocking usage of ArviZ.

treigerm commented 3 years ago

I was just trying to reproduce the issue I had, and now it has disappeared. I think it potentially had something to do with which conda environment I had activated. It's not a major issue for me right now, so I don't have time to investigate it further, but I will open an issue once I encounter it again.

ParadaCarleton commented 3 years ago

I'm not sure how much work has been done on this already, but I think we might want to consider whether we want to use Plots.jl or a package that has more in common with ggplot2, such as VegaLite or Gadfly. A ggplot2-like package would let you almost copy/paste code directly from Bayesplot, which has a lot of functionality already. Another advantage is that a lot of people are familiar with ggplot2; people who are used to Python can already use ArviZ.jl together with PyPlot, but anyone used to R is going to have a lot of trouble switching to doing their Bayesian stats in Julia.

sethaxen commented 11 months ago

The current plan is to implement 2 packages: ArviZPlots.jl and ArviZMakie.jl. These packages should probably share a repo (this repo?) so that it's easier to keep their APIs similar. It's probably not reasonable to keep the APIs identical, as style keywords should probably follow the conventions of the corresponding plotting package, but everything else about the interfaces should be as close as possible.

alecloudenback commented 3 months ago

@sethaxen I'd be willing to help out with this, starting with Makie. If there's no prior work on this I could just start working on this, generally following the pattern in the Python versions.

After talking with the Makie devs re: a PR to MCMCChains, instead of a new ArviZMakie package, would the pattern currently shown in that PR using extensions work here?

sethaxen commented 3 months ago

Hi @alecloudenback, thanks for the interest! Help would certainly be appreciated! Currently we have an open GSoC project to work on this, and we're expecting at least one application, but there are a lot of plots to implement and plenty of ways to contribute. Perhaps we can touch base after I see what applications we receive.

> After talking with the Makie devs re: a PR to MCMCChains, instead of a new ArviZMakie package, would the pattern currently shown in that PR using extensions work here?

My current thinking on this is that it makes more sense to have a stand-alone package for several reasons, including:

With an ArviZMakie package, we can focus on just getting the package right. In the future, we could always move the functionality into an extension if that makes more sense.

It's worth pointing out that Python ArviZ is also going through a refactor that separates the plots out into their own package, which also involves splitting some plots into multiple plots: https://github.com/arviz-devs/arviz-plots