Uses matplotlib to create simple Sankey diagrams flowing only from left to right.
pysankey contains a simple expected/predicted dataset called fruits.txt, which looks like the following:
| | true | predicted |
---|---|---|
0 | blueberry | orange |
1 | lime | orange |
2 | blueberry | lime |
3 | apple | orange |
... | ... | ... |
996 | lime | orange |
997 | blueberry | orange |
998 | orange | banana |
999 | apple | lime |
1000 rows × 2 columns
You can generate a Sankey diagram with this code:
import pandas as pd
from pysankey import sankey
import matplotlib.pyplot as plt
df = pd.read_csv(
    'fruits.txt',
    sep=' ',
    names=['true', 'predicted']
)
colorDict = {
    'apple': '#f71b1b',
    'blueberry': '#1b7ef7',
    'banana': '#f3f71b',
    'lime': '#12e23f',
    'orange': '#f78c1b',
    'kiwi': '#9BD937'
}
labels = list(colorDict.keys())
leftLabels = [label for label in labels if label in df['true'].values]
rightLabels = [label for label in labels if label in df['predicted'].values]
# Create the sankey diagram
ax = sankey(
    left=df['true'],
    right=df['predicted'],
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12
)
plt.show() # to display
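The label lists above take their order from the keys of colorDict and keep only the labels that actually occur in the data. The following is a minimal sketch of that filtering step, using a tiny hypothetical DataFrame in place of fruits.txt (the rows are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature stand-in for fruits.txt (not the real data).
df = pd.DataFrame({
    "true": ["blueberry", "lime", "apple"],
    "predicted": ["orange", "orange", "lime"],
})

# The key order of the dictionary defines the order of the labels.
colorDict = {
    "apple": "#f71b1b",
    "blueberry": "#1b7ef7",
    "lime": "#12e23f",
    "orange": "#f78c1b",
    "kiwi": "#9BD937",
}
labels = list(colorDict.keys())

# Keep only the labels that are present on each side of the diagram.
leftLabels = [label for label in labels if label in df["true"].values]
rightLabels = [label for label in labels if label in df["predicted"].values]

print(leftLabels)
print(rightLabels)
```

In this sketch, kiwi has a color but never appears in the data, so it is dropped from both lists.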
However, the data may not always be available in the format mentioned in the previous example (for instance, if the dataset is too large). In such cases, the weights between the true and predicted labels can be calculated in advance and used to create the Sankey diagram. In this example, we will continue working with the data that was loaded in the previous example:
# Calculate the weights from the fruits dataframe
df = df.groupby(["true", "predicted"]).size().reset_index()
weights = df[0].astype(float)
ax = sankey(
    left=df['true'],
    right=df['predicted'],
    rightWeight=weights,
    leftWeight=weights,
    leftLabels=leftLabels,
    rightLabels=rightLabels,
    colorDict=colorDict,
    aspect=20,
    fontsize=12
)
plt.show() # to display
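To see what the aggregation step produces, here is a small pandas-only sketch on hypothetical data. It assumes a named count column (`weight`, chosen here for readability) instead of the default column `0` used above. Each row of the result is one strip of the diagram, and its weight is the number of original rows it represents.

```python
import pandas as pd

# Hypothetical sample data standing in for the fruits dataframe.
df = pd.DataFrame({
    "true": ["apple", "apple", "lime", "lime", "lime"],
    "predicted": ["apple", "lime", "lime", "lime", "apple"],
})

# Count how many rows share each (true, predicted) pair.
# `name="weight"` labels the count column; the name itself is arbitrary.
counts = df.groupby(["true", "predicted"]).size().reset_index(name="weight")
weights = counts["weight"].astype(float)

print(counts)
```

The weights sum to the number of rows in the original data, so the aggregated form should describe the same flows as the row-per-observation form.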
sankey(left, right, leftWeight=None, rightWeight=None, colorDict=None, leftLabels=None, rightLabels=None, aspect=4, rightColor=False, fontsize=14, ax=None, color_gradient=False, alphaDict=None)
left, right : NumPy array of object labels on the left and right of the diagram
leftWeight, rightWeight : NumPy arrays of the weights of each strip
colorDict : Dictionary of colors to use for each label
leftLabels, rightLabels : order of the left and right labels in the diagram
aspect : vertical extent of the diagram in units of horizontal extent
rightColor : If true, each strip in the diagram will be colored according to its right label
fontsize : Fontsize to be used for the labels
ax : matplotlib axes to plot on, otherwise uses current axes.
Use of figureName, closePlot, and figSize in sankey() is deprecated, and these arguments will be removed in a future version. This is done so that matplotlib is used more transparently, as this issue on the original GitHub repo suggested.
Now, sankey does less of the customization and lets the user do it to their liking by returning a matplotlib Axes object, which means the user also has access to the Figure to customise. They can then choose what to do with it (showing it, saving it) with much more flexibility.
plt.savefig("<figureName>.png", bbox_inches="tight", dpi=150)
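As a sketch of the Axes-to-Figure route described above, the example below reaches the Figure through the Axes and saves it. To stay runnable without pysankey or the dataset, it uses a plain matplotlib Axes as a stand-in for the object returned by sankey(), and it writes to an in-memory buffer rather than a file; both are assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt

# Stand-in for `ax = sankey(...)`: any matplotlib Axes behaves the same here.
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])

# The Axes gives access to its parent Figure for further customisation.
figure = ax.figure
figure.suptitle("Customised after the fact")

# Save with the same options as above, here into a buffer instead of a file.
buffer = io.BytesIO()
figure.savefig(buffer, format="png", bbox_inches="tight", dpi=150)
png_bytes = buffer.getvalue()

plt.close(figure)  # release the figure once it has been saved
```

Passing a file name instead of the buffer writes the image to disk in the same way.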
The closePlot argument is no longer needed, because no plot is displayed unless plt.show() is called after sankey(). You can still call plt.close() to make sure this plot is not displayed if you display other plots afterwards.
You can change the size of the Sankey diagram by resizing the matplotlib figure.
plt.gcf().set_size_inches(figSize)
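A short sketch of the resizing step, again with a plain matplotlib Axes standing in for the one returned by sankey(). The value of figSize is an arbitrary example; it is a (width, height) pair in inches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt

# Stand-in for `ax = sankey(...)`; the new figure becomes the current figure.
fig, ax = plt.subplots()

# Example size only: (width, height) in inches.
figSize = (6, 8)
plt.gcf().set_size_inches(figSize)

# Read the size back from the figure to confirm the change.
width, height = fig.get_size_inches()
print(width, height)

plt.close(fig)  # avoid showing this figure alongside later plots
```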
pip3 install -e ".[test]"
pylint pysankey
python -m unittest
coverage run -m unittest
coverage html
# Open htmlcov/index.html in a browser