DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.26k stars 555 forks source link

Importing yellowbrick changes global matplotlib styles #734

Open cac04 opened 5 years ago

cac04 commented 5 years ago

UPDATE: After some investigation, we've discovered that Seaborn has rolled back their automatic style changes on import, and we should probably do the same. To fix this issue the following things need to be done:

Original Issue

This may be intentional behavior. If it is, sorry for the noise! It surprised me though, so maybe it should be in the documentation somewhere.

When you import yellowbrick, it runs yellowbrick/__init__.py which contains this line:

set_aesthetic() # NOTE: modifies mpl.rcParams

This does indeed change matplotlib.rcParams, which means that every other plot generated via matplotlib is altered.

I have a workaround, which is to create a context manager for Yellowbrick plots:

import warnings
import matplotlib as mpl
import yellowbrick

class YellowbrickStyle:
    """
    Context manager for global mpl style state to accommodate Yellowbrick
    """
    def __enter__(self):
        yellowbrick.set_aesthetic()

    def __exit__(self, *args):
        with warnings.catch_warnings():
            warnings.simplefilter('ignore', mpl.MatplotlibDeprecationWarning)
            yellowbrick.reset_orig()

# Importing yellowbrick modifies global mpl style state so reset it immediately
with YellowbrickStyle():
    pass

And then I wrap every use of Yellowbrick in a with YellowbrickStyle(): block, e.g.

with YellowbrickStyle():
        visualizer = yellowbrick.regressor.ResidualsPlot(model)
        visualizer.fit(X=training_X, y=training_y)
        visualizer.score(X=testing_X, y=testing_y)
        visualizer.poof()

This works but it's a bit of a hassle. Did you mean to change the matplotlib rcParams for every other plot in a project?

bbengfort commented 5 years ago

@cac04 thank you for your question, and it certainly is not noise -- we're always happy to go over the choices we've made in the library and see if there are ways we can do things better. In this case, yes it was intentional that we modify the default styles of matplotlib not just for Yellowbrick but for all other visualizations. I will say that the decision was not made lightly and it was after a lot of discussion, which I'll go into below.

But first, we do have an escape valve if this is influencing your code:

from yellowbrick.style import reset_defaults, reset_orig 

reset_defaults() # reset the default mpl rcParams 
reset_orig()     # reset to original rcParams, respects custom rc 

I've noted that you used reset_orig in your context manager, but I wanted to ensure we highlighted it in case anyone else found this issue and had the same question.

Ok, so onto the discussion.

When we were originally developing the library we wanted to ensure that the diagnostics did not cause surprise and that they could be used in an "at-a-glace" fashion. One example that we dealt with early on was the matplotlib color cycle - we wanted to ensure that across figures, the cycle was respected so classes would be assigned the same color in multiple diagnostics (or so that test vs train colors would be respected). Other examples included aspect ratios when dealing with 45 degree lines and grid lines interfering with readability. We concluded that managing the style was essential to effective visual diagnostics.

At the time, we took a queue from seaborn, which also modified the rcParams at import (though this has changed since seaborn v0.8 and the style must be explicitly called). We decided that effective visual diagnostics was more important than trying to create a complicated style library, and we borrowed seaborn's style management pretty directly. At the time we also had less matplotlib expertise and more ML expertise and really wanted to focus on the business of creating visualizers.

We realized then that this decision would potentially affect expert users like you, and so we're open to having the discussion again. I certainy appreciate your proposal of a context manager, seaborn also has one when using with sns.axes_style(); however, because this adds a layer of overhead, I would prefer the context manager to be used only in the expert case.

You're absolutely right, we should probably document this more clearly, potentially adding a page such as Controlling figure aesthetics to our documentation.

This reply is getting a bit long, and I'm realizing that I may have to do additional research in order to figure out how we can balance our requirements (and potentially to specify them). I don't want to conclude this without a call to action, however. Would you be up for a discussion about how you use Yellowbrick so that we can be more effective at that requirements specification?

cac04 commented 5 years ago

Thank you for that detailed reply! Yes, I'd be very happy to discuss this further. I'm afraid I am not an experienced Yellowbrick user: I only started using it a few weeks ago. I had a notebook that already contained many plots: some created by using matplotlib directly, some via Seaborn. Then I added a yellowbrick.regressor.ResidualsPlot and suddenly all my existing plots changed.

I guessed that Yellowbrick was doing something to the global matplotlib styles but couldn't see anything in the documentation. It didn't take too long to work it out but it certainly would have been more convenient if there were a page like Controlling figure aesthetics somewhere prominent in the Yellowbrick docs.

Would it be possible to use something like my context manager inside the poof method?

bbengfort commented 5 years ago

@cac04 sorry for the delay in response - we've been slammed with issues and PRs over the past couple of days; we're just a small group of part-time volunteers so we have to juggle things as best we can.

I did try using the context manager inside of the poof method - but it didn't work out of the gate. However, I did note some promising things in that direction. I think a documentation page would also be useful.

What I'll do at this point is to update your original issue to more clearly reflect the change I'm hoping to make, and we'll try to get that in by the v1.0 release -- which will be in a few months.

cac04 commented 5 years ago

Thank you very much for looking at this. I don't know if it's high priority - especially if I'm the only one to have filed an issue about it! - but I thought it was worth asking about.

gavsideas commented 5 years ago

I may be getting the wrong end of the wrong stick here but would it be possible to create a sandbox for rcParams that could then hold a context manager until changes need to be imported back into the live yellowbrick environment?

lwgray commented 5 years ago

hi @gavsideas Thanks for adding to the discussion. We are on a very short hiatus and will provide feedback once we return. cheers

joelostblom commented 5 years ago

Thanks for looking into this! It would be great if it could be handled the same way as in seaborn, where users have to call seaborn.set() to explicitly modify the global matplotlib rc params. In my limited testing, neither reset_defaults() or reset_origin() successfully reset the styles to the same as when launching a notebook the first time (tested in JupyterLab only, not classic).