PESTools / pestools

Plotting class structure and one to one plot #8

Closed aleaf closed 9 years ago

aleaf commented 9 years ago

Here is a structure for plotting. The main goal is to keep as much of the plotting code as possible in one place and to minimize duplication, while still having the plots called as methods on the various PEST objects from an API standpoint. I roughly copied some of the structure from pandas (which is obviously of a different scope, but conceptually similar in terms of how the plots are used). Some of it can probably be tossed or pared down, and we may want to add or centralize additional functionality.

A secondary goal is to accommodate both quick plotting and report-quality figures. I tried to do this with the one-to-one plot class (see the IPython notebook in the examples folder). It can be called on Res() instances with a string or list of strings to specify the observation groups to include. Alternatively, a dictionary of pyplot keywords can be supplied (one dictionary of keywords for each observation group), so that the plot can be completely customized to report quality. Other aesthetic wrappers (such as Seaborn) can also be imported at the same level as the Res() instance.
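To make the calling pattern concrete, here is a minimal standalone sketch of the idea, not the actual PESTools code. The function name `one2one`, the column names (`Group`, `Measured`, `Modelled`) and the sample values are all assumptions made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical residuals table; column names and values are assumptions.
res_df = pd.DataFrame({
    "Group": ["heads", "heads", "heads", "fluxes", "fluxes"],
    "Measured": [10.0, 12.5, 11.0, 100.0, 250.0],
    "Modelled": [10.4, 12.1, 11.3, 90.0, 265.0],
})


def one2one(df, groupinfo, ax=None):
    """Plot measured vs. modelled values for the requested groups.

    groupinfo may be a group name, a list of names, or a dict mapping
    each group name to a dict of keywords passed on to ax.scatter.
    """
    if isinstance(groupinfo, str):
        groupinfo = [groupinfo]
    if not isinstance(groupinfo, dict):
        # No styling supplied: use matplotlib defaults for every group.
        groupinfo = {name: {} for name in groupinfo}

    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure

    for name, kwds in groupinfo.items():
        sub = df[df["Group"] == name]
        ax.scatter(sub["Measured"], sub["Modelled"], label=name, **kwds)

    # One-to-one reference line spanning the plotted data.
    sel = df[df["Group"].isin(list(groupinfo))]
    lo = min(sel["Measured"].min(), sel["Modelled"].min())
    hi = max(sel["Measured"].max(), sel["Modelled"].max())
    ax.plot([lo, hi], [lo, hi], color="k", linewidth=0.5)
    ax.set_xlabel("Measured")
    ax.set_ylabel("Modelled")
    ax.legend()
    return fig, ax


# Quick look with defaults, then a customized version of the same plot.
fig, ax = one2one(res_df, "heads")
fig, ax = one2one(res_df, {"heads": {"marker": "o", "s": 20, "c": "b"}})
```

The string and list forms are just shorthand for a dictionary with empty keyword sets, so there is only one code path to maintain.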

Also added

Let me know what you think, especially about the plotting class.

echristi commented 9 years ago

I just started to look at this. Here are my initial thoughts. Overall, I think this is probably the way to go, but it is a little complex. I think we pay a price to allow all the flexibility. We'll just have to work to keep it simple for the user on the API side. I have some catching up to do to understand how everything works.

Can we name all the plotting methods plot_whatever? That way it is clear what all the available plotting methods are, and they are easy to get to when working in an environment with tab completion. I think you only have to change the names in the Res() class and can keep them the same in plots.py.
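Roughly what I have in mind, as a sketch only. The class names `One2onePlot` and `HexbinPlot` and the `draw` method are placeholders I made up, not what is in plots.py.

```python
class BasePlot:
    """Placeholder for a shared base class that would live in plots.py."""

    kind = "base"

    def __init__(self, df, groupinfo):
        self.df = df
        self.groupinfo = groupinfo

    def draw(self):
        # A real implementation would build and return a figure here.
        return "{} of {}".format(self.kind, self.groupinfo)


class One2onePlot(BasePlot):
    kind = "one2one"


class HexbinPlot(BasePlot):
    kind = "hexbin"


class Res:
    """Only the method names carry the plot_ prefix; plots.py is unchanged."""

    def __init__(self, df):
        self.df = df

    def plot_one2one(self, groupinfo):
        return One2onePlot(self.df, groupinfo).draw()

    def plot_hexbin(self, groupinfo):
        return HexbinPlot(self.df, groupinfo).draw()


# Everything a user can plot shows up under one prefix, as tab completion would list it.
plot_methods = [name for name in dir(Res) if name.startswith("plot_")]
print(plot_methods)
```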

With the Plot() class we are assuming everything we want to plot originates from a DataFrame. Is that going to hold?

We don't always have to use fig, ax = for everything, do we? Just for more advanced and customized stuff, right? I still have a hard time with figure and axes as matplotlib uses the terms.

Should the one2one have groupinfo as an optional argument? If it is not supplied, it would plot a one2one of all residuals.

Otherwise everything seems to work. I had to make sure to import the Rei class in the notebook, but that is a different issue that we'll work out soon.

aleaf commented 9 years ago

I agree that we should be careful to avoid unnecessary complexity. However, I think some structure (such as that in the plotting classes) will be helpful in the long-run to the extent that it helps minimize the amount of code to be developed / maintained, and also promotes a consistent interface for the user.

Having flexibility in the plotting is also really important on my end (both for personal and byzantine USGS report standards reasons). I think we can minimize the cost of flexibility by exploiting things like keyword arguments, where the user isn't required to input anything, but can consult the matplotlib documentation if they want to make a change. We just need to be clear where the keywords are going.

Along those lines, I think we have to work with figures and axes, because for better or worse a lot of the functionality in matplotlib depends on them, and they make it easier to keep track of all of the moving parts in a plot. For example, they are absolutely needed for figures with both left and right y-axes, and also for subplots. You can use plt.gca() to get the current axis and then assign it to a new variable, but I think generally it is best to initialize the axis at the beginning of the plotting code.
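A small illustration of the two cases mentioned above, using only standard matplotlib calls. The data values and labels are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt

# Made-up values, purely for illustration.
iterations = [1, 2, 3, 4]
phi = [1200.0, 640.0, 410.0, 395.0]
n_at_bounds = [0, 1, 3, 3]

# Initialize the figure and axes at the start of the plotting code.
fig, ax = plt.subplots()
ax.plot(iterations, phi, color="b")
ax.set_xlabel("Iteration")
ax.set_ylabel("Phi")

# A right-hand y-axis requires an explicit handle on the first axes.
ax_right = ax.twinx()
ax_right.plot(iterations, n_at_bounds, color="r")
ax_right.set_ylabel("Parameters at bounds")

# Subplots: each panel is its own axes object within a single figure.
fig_panels, axes = plt.subplots(nrows=1, ncols=2, sharey=True)
axes[0].plot(iterations, phi)
axes[1].plot(iterations, n_at_bounds)
```

For a quick single-panel plot the user never has to touch `fig` or `ax`; they only matter once something like the second y-axis or a second panel comes into play.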

Also agree about beginning all names for plotting methods with plot.

Not sure about cases where we wouldn't want to plot from a dataframe. Dataframes do make nice containers in this case, because they can hold any type or amount of data, and you just specify which columns you want to use. Also having the columns indexed by labels can help avoid indexing errors. It seems like the biggest potential downside would be performance. When they get really large, they can be slow.

If we find that we want to plot directly from numpy arrays, we could just add some code in the base plotting class to identify the submitted container type, and adjust the attributes accordingly. Pandas does this to distinguish between dataframes and series, and some of the Seaborn plotting methods have similar checks.
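Something along these lines could sit in the base plotting class. This is only a sketch, and the helper name `to_frame` is hypothetical.

```python
import numpy as np
import pandas as pd


def to_frame(data, columns=None):
    """Coerce the supported container types to a DataFrame.

    Hypothetical helper: the rest of the plotting code could then assume
    it is always working with a DataFrame.
    """
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, pd.Series):
        return data.to_frame()
    if isinstance(data, np.ndarray):
        if data.ndim == 1:
            data = data.reshape(-1, 1)  # treat a 1-D array as a single column
        return pd.DataFrame(data, columns=columns)
    raise TypeError("Unsupported container type: {}".format(type(data).__name__))


df_from_array = to_frame(np.array([[1.0, 1.1], [2.0, 1.9]]),
                         columns=["Measured", "Modelled"])
df_from_series = to_frame(pd.Series([0.1, -0.1], name="Residual"))
```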

We could easily make groupinfo optional. The reason I didn't on the first cut was because at least for regional groundwater models (and probably many other applications), plotting the whole observation dataset doesn't make much sense because of differences of scale (if there are flux values, you probably won't be able to see much of what's going on with the heads, and certainly won't be able to see things like vertical head differences). But maybe there are other cases where observations are of similar magnitudes.
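One way the optional argument could be handled, as a sketch: a hypothetical helper (`resolve_groupinfo`, with an assumed `Group` column) that falls back to every group in the data when nothing is supplied.

```python
import pandas as pd

# Hypothetical residuals table; column names and values are assumptions.
res_df = pd.DataFrame({
    "Group": ["heads", "heads", "fluxes"],
    "Measured": [10.0, 12.5, 250.0],
    "Modelled": [10.4, 12.1, 265.0],
})


def resolve_groupinfo(df, groupinfo=None, group_col="Group"):
    """Return a dict mapping each group to plot to its plotting keywords."""
    if groupinfo is None:
        # Nothing supplied: include every observation group in the data.
        return {name: {} for name in df[group_col].unique()}
    if isinstance(groupinfo, str):
        return {groupinfo: {}}
    if isinstance(groupinfo, dict):
        return dict(groupinfo)
    return {name: {} for name in groupinfo}


all_groups = resolve_groupinfo(res_df)
heads_only = resolve_groupinfo(res_df, "heads")
```

With heads near 10 and fluxes in the hundreds, as in the made-up table here, the default case shows the scale problem described above, so the default is mainly useful when the observations are of similar magnitude.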

echristi commented 9 years ago

Merged pull.

Let's fully develop all Res/Rei and ParSen plotting methods and get everything working the way we want before expanding too far.

aleaf commented 9 years ago

Sounds good. Is there anything you are planning to work on, so we can minimize overlap? I could work on integrating hexbin into the plotting structure. I was also thinking of working on a method to compile phi contributions for each iteration (building off of what is already in Res), as well as spatial plotting of residuals and an option to output point data to a shapefile (which could be used for residuals and parameters).