Pierre-Sassoulas / pySankey

This is the maintained version of PySankey (pySankeyBeta on PyPI)
GNU General Public License v3.0

Fixed index mismatch issue when passing a dataframe which has been sorted in any way #12

Closed minoguep closed 2 years ago

minoguep commented 2 years ago

Hi there 👋

I noticed an issue when using this module the other day: if you pass a dataframe that has been sorted by the weights, the output is incorrect (see the example below). I did a bit of digging, and the cause is that left and right are reindexed when they are passed as a Series, but leftWeight and rightWeight are not. When the dataFrame variable is then created, there is an index mismatch and the values get jumbled up.
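The mismatch can be reproduced with pandas alone. The snippet below is a minimal sketch, not the library's actual code: it assumes the reindexing of left and right is equivalent to `reset_index(drop=True)`, and the three-row frame is a made-up stand-in for real input.

```python
import pandas as pd

# Hypothetical stand-in for the input: three flows, sorted by weight.
data = pd.DataFrame(
    {"left": ["a", "b", "c"], "right": ["x", "y", "z"], "weight": [3, 1, 2]}
).sort_values(by="weight")  # the index is now [1, 2, 0]

# Assumption: labels get a fresh 0..n-1 index, weights keep the sorted one.
left = data["left"].reset_index(drop=True)
right = data["right"].reset_index(drop=True)
weight = data["weight"]

# pandas aligns the columns on index labels, not on position,
# so each weight ends up next to the wrong pair of labels.
jumbled = pd.DataFrame({"left": left, "right": right, "weight": weight})

# Resetting the weights in the same way keeps each row together.
fixed = pd.DataFrame(
    {"left": left, "right": right, "weight": weight.reset_index(drop=True)}
)

print(jumbled)
print(fixed)
```

In `jumbled`, the row for "b" carries the weight that belonged to "a", while `fixed` preserves the original pairing.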

Example

Sample dataset

left,right,weight
apple,apple,2
apple,orange, 3
apple,banana,3
orange,apple,5
orange,orange,7
orange,banana,2
banana,apple,4
banana,orange,1
banana,banana,0

Create the Sankey diagrams


import pandas as pd 
from pysankey import sankey

data = pd.read_csv("sample_data.csv")

# Example 1: No reordering
sankey(
    left=data["left"], right=data["right"], 
    leftWeight=data["weight"], rightWeight=data["weight"], 
    aspect=20, fontsize=20
)

# Example 2: Some sorting applied (notice the difference in banana -> orange, and orange -> orange)
data_sorted = data.sort_values(by="weight")
sankey(
    left=data_sorted["left"], right=data_sorted["right"], 
    leftWeight=data_sorted["weight"], rightWeight=data_sorted["weight"], 
    aspect=20, fontsize=20
)
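For anyone on a version without the fix, a possible workaround is to reset the index of the sorted dataframe before passing its columns in. This is a sketch under the assumption that the problem only arises when the index is not already 0..n-1; the CSV is inlined here so the snippet runs on its own, and the `sankey` call is left as a comment.

```python
import io

import pandas as pd

# Same sample dataset as above, inlined so the snippet is self-contained.
csv_text = """left,right,weight
apple,apple,2
apple,orange,3
apple,banana,3
orange,apple,5
orange,orange,7
orange,banana,2
banana,apple,4
banana,orange,1
banana,banana,0
"""
data = pd.read_csv(io.StringIO(csv_text))

# Sort, then drop the old index so labels and weights share a 0..n-1 index.
data_sorted = data.sort_values(by="weight").reset_index(drop=True)

# The columns can then be passed as in Example 2:
# sankey(
#     left=data_sorted["left"], right=data_sorted["right"],
#     leftWeight=data_sorted["weight"], rightWeight=data_sorted["weight"],
#     aspect=20, fontsize=20
# )
```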
minoguep commented 2 years ago

Added a test there. Always kinda messy dealing with dataframe comparisons so hopefully the way I went about it is OK for you.
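For reference, one common way to compare dataframes in a test is `pandas.testing.assert_frame_equal`. The sketch below is not the test from this pull request: `build_flow_frame` is a hypothetical helper that stands in for the fixed behaviour (every input reindexed the same way), and the data is made up.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def build_flow_frame(left, right, weight):
    """Hypothetical helper: combine the inputs after resetting every index."""
    columns = {"left": left, "right": right, "weight": weight}
    return pd.DataFrame(
        {
            name: pd.Series(values).reset_index(drop=True)
            for name, values in columns.items()
        }
    )


def test_sorted_input_keeps_rows_together():
    data = pd.DataFrame(
        {"left": ["a", "b", "c"], "right": ["x", "y", "z"], "weight": [3, 1, 2]}
    )
    data_sorted = data.sort_values(by="weight")

    actual = build_flow_frame(
        data_sorted["left"], data_sorted["right"], data_sorted["weight"]
    )
    # Expected: the sorted rows, unchanged apart from a fresh 0..n-1 index.
    expected = data_sorted.reset_index(drop=True)

    # Raises AssertionError with a readable diff if values, dtypes,
    # column order or index differ.
    assert_frame_equal(actual, expected)
```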

Let me know if there are any issues or if anything doesn't make sense.

minoguep commented 2 years ago

I don't urgently need a release so no rush at all, whenever suits you!

Pierre-Sassoulas commented 2 years ago

OK, it could be a while: there is no release pipeline on this repository, so I either have to release manually or set a pipeline up first. In the meantime, you can install from the commit with pip install -U git+https://github.com/Pierre-Sassoulas/pySankey.git@4f64f8aed18ba137d8c53d856412f90432005e6c

minoguep commented 2 years ago

Cool, thanks!