DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Enable plt.close() to clear memory #1231

Closed by dilettante8 2 years ago

dilettante8 commented 2 years ago

I've been using the KElbowVisualizer to extract the attributes elbow_value_ and elbow_score_ for multiple batches of training data. While looping through the batches, the figure for each one is automatically rendered and plotted, which consumes a lot of memory. I suggest adding an option to call plt.close(), or even to skip plt.show(), to improve performance.

bbengfort commented 2 years ago

Hi @dilettante8, I'm sorry that no one has responded to your question in so long; we tend to get buried in GitHub alerts. Thank you so much for using Yellowbrick and for your question!

I actually use the KElbow visualizer the same way, extracting the scores without needing the visualization itself. If you only need the scores, calling fit() is enough; you don't need to call show(). If you want to render the complete visualization on the figure without calling plt.show(), you can call finalize() instead. We use this approach to draw Yellowbrick visualizations on figures with multiple axes, showing the figure only at the end.

Note that if you're using the quick method, you should pass in show=False, e.g. kelbow_visualizer(estimator, X, show=False).

That said, Yellowbrick is a visualization tool, so it will draw to an axes and create a figure, even when you simply call fit(). If you're running your code in a notebook, the notebook may automatically render the axes from the cell even though plt.show() has not been called. You have a few options here:

  1. Calling plt.close() yourself is perfectly valid and works with Yellowbrick; any matplotlib changes to the current axes and figure (which you can also fetch with viz.ax and viz.fig) are reflected in the Yellowbrick visualization.
  2. You can set the matplotlib backend to Agg so that nothing is rendered (I'm not sure whether this is respected in a notebook, but it's what we do for testing).
  3. You can create a single figure and axes, pass that same ax to the visualizer for every batch, and reset it (e.g. with ax.clear()) at the end of each batch.

Your use case is certainly one we've considered, and we appreciate you letting us know that it's an important one to our users! We hope you'll find that Yellowbrick has been thoughtfully designed to handle a variety of scenarios, including the one you described!