JackMcKew / pandas_alive

Create stunning, animated visualisations with Pandas & Matplotlib as easy as calling `df.plot_animated()`
MIT License

plot_animated throws lots of unnecessary warnings when data has many columns #37

Open hraftery opened 3 years ago

hraftery commented 3 years ago

Describe the bug There are two issues here, but it turns out their cause and fix overlap, so I've included them both.

plot_animated outputs two warnings with text: "UserWarning: FixedFormatter should only be used together with FixedLocator". Turns out this is a known gotcha when using set_yticklabels or set_xticklabels. Although the warning is innocuous in this case (its purpose is described here), it's alarming for the user of pandas_alive, and it is easy to suppress.

The "fix" is to call ax.set_xticks(ax.get_xticks()) before the call to set_xticklabels, and similarly for the y-axis.

More info here and here.
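
The tick-pinning workaround described above can be sketched on a standalone plot, outside pandas_alive. This is only an illustration: `label_x_ticks` and the sample data are made up for the example, and it assumes a matplotlib version recent enough to emit the warning in the first place. It pins the current tick positions with set_xticks before calling set_xticklabels, then checks that no UserWarning was recorded.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt


def label_x_ticks(ax, make_label):
    """Pin the current x tick positions, then attach one label per tick.

    Hypothetical helper for illustration; not part of pandas_alive.
    """
    ticks = ax.get_xticks()
    ax.set_xticks(ticks)  # fixes the tick positions before labelling
    ax.set_xticklabels([make_label(t) for t in ticks])
    return ticks


fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [10, 20, 15, 30])

# Record every warning raised while the labels are being set.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ticks = label_x_ticks(ax, lambda t: f"{t:.1f} s")

user_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
print(f"{len(ticks)} ticks labelled, {len(user_warnings)} UserWarnings")
plt.close(fig)
```

The same pattern applies to the y-axis with get_yticks, set_yticks and set_yticklabels. Note that the number of labels has to match the number of pinned ticks.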

If the data to be plotted has many columns (more than about 60), then plot_animated outputs dozens of warnings with text like:

/usr/local/lib/python3.9/site-packages/matplotlib/backends/backend_agg.py:201: RuntimeWarning: Glyph 157 missing from current font.
font.set_text(s, 0, flags=flags)

where "157" varies from 128 to beyond 157.

This was really hard to pinpoint, but it turns out to be because a fake list of column headings is generated when the plot is first created, by stepping through consecutive character codes starting at chr(70), one per column in the data. If there are too many columns, it runs right off the end of the normal ASCII range, and the standard matplotlib fonts do not have a glyph for the character codes that come right after 127.
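
As a rough illustration of that overflow, the sketch below rebuilds the placeholder list the same way the current code does (chr(i + 70) per column) and counts how many entries land beyond 127. The helper names `fake_column_labels` and `beyond_ascii` are made up for this example and do not exist in pandas_alive.

```python
def fake_column_labels(n_columns, start=70):
    """Mimic the placeholder labels: one character per column from chr(start)."""
    return [chr(i + start) for i in range(n_columns)]


def beyond_ascii(labels):
    """Return the labels whose code point is past the ASCII range (> 127)."""
    return [ch for ch in labels if ord(ch) > 127]


few = fake_column_labels(58)    # code points 70..127, still inside ASCII
many = fake_column_labels(88)   # code points 70..157, runs past ASCII

print(len(beyond_ascii(few)))                   # 0
print([ord(ch) for ch in beyond_ascii(many)][:3])  # [128, 129, 130]
print(ord(many[-1]))                            # 157
```

With a start of 70, the 59th column is the first to get a placeholder past 127, which matches the "more than about 60" threshold observed above.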

To Reproduce This method is derived from the pandas_alive author's article here. In other words, this is pretty typical of a "my-first-pandas_alive-animation".

import pandas as pd
import matplotlib.pyplot as plt
import pandas_alive
from IPython.display import HTML
import urllib.request, json
from datetime import datetime

NSW_COVID_19_CASES_BY_LOCATION_URL = "https://data.nsw.gov.au/data/api/3/action/package_show?id=aefcde60-3b0c-4bc0-9af1-6fe652944ec2"
with urllib.request.urlopen(NSW_COVID_19_CASES_BY_LOCATION_URL) as url:
    data = json.loads(url.read().decode())

data_url = data["result"]["resources"][0]["url"]
df = pd.read_csv(data_url)

# Assign the result back rather than using inplace=True on a column selection
df['lga_name19'] = df['lga_name19'].fillna("Unknown")
df['notification_date'] = pd.to_datetime(df['notification_date'])
df_grouped = df.groupby(["notification_date", "lga_name19"]).size()

df_cases = pd.DataFrame(df_grouped).unstack()
df_cases.columns = df_cases.columns.droplevel().astype(str)
df_cases = df_cases.fillna(0)
df_cases.index = pd.to_datetime(df_cases.index)

animated_html = df_cases.plot_animated(n_visible=15)

Expected behavior Following an introductory tutorial and using the library as intended should not show dozens of warnings. Seeing so many warnings leaves the newbie feeling like they've done something wrong.

Additional context Here is a patch for pandas_alive/charts.py that fixes the two issues. If this fix is suitable, I can create a PR, or two if you'd like to separate the issues.

214c214
<         fake_cols = [chr(i + 70) for i in range(self.df.shape[1])]
---
>         fake_cols = [chr(i + 70) for i in range(self.n_visible)]
218c218
<             ax.barh(fake_cols, [1] * self.df.shape[1])
---
>             ax.barh(fake_cols, np.ones(len(fake_cols)))
221c221,225
<             ax.set_yticklabels(self.df.columns)
---
>             # Before the labels are set, convince matplotlib not to throw user warning about FixedLocator and FixedFormatter
>             # Added by HR211009, inspired by https://github.com/matplotlib/matplotlib/issues/18848#issuecomment-817098738
>             ax.set_xticks(ax.get_xticks())
>             ax.set_yticks(ax.get_yticks())
>             ax.set_yticklabels(self.df.columns[:len(fake_cols)])
224c228
<             ax.bar(fake_cols, [1] * self.df.shape[1])
---
>             ax.bar(fake_cols, np.ones(len(fake_cols)))
227c231,235
<             ax.set_xticklabels(self.df.columns, ha="right")
---
>             # Before the labels are set, convince matplotlib not to throw user warning about FixedLocator and FixedFormatter
>             # Added by HR211009, inspired by https://github.com/matplotlib/matplotlib/issues/18848#issuecomment-817098738
>             ax.set_xticks(ax.get_xticks())
>             ax.set_yticks(ax.get_yticks())
>             ax.set_xticklabels(self.df.columns[:len(fake_cols)], ha="right")