Baukebrenninkmeijer / table-evaluator

Evaluate real and synthetic datasets against each other
https://baukebrenninkmeijer.github.io/table-evaluator/
MIT License

plot_correlation_difference() is incompatible with newest pandas version #31

Closed SvenGroen closed 1 year ago

SvenGroen commented 1 year ago

Hi, it seems that plot_correlation_difference() is incompatible with the newest pandas version (1.5.2).

You can find a notebook that reproduces the issue here: https://colab.research.google.com/drive/1zE7lSGwIVVD3o8eJEAm3_IWjSSkK8wsM?usp=sharing

I think I also know where the problem is:

Inside plot_correlation_difference(...) in viz.py, the correlation matrix is calculated with dython.nominal.associations (line 72).

With dython==0.5.1, associations(...) calls _comp_assoc(...) directly, and that call computes the correlation matrix correctly (at least in my local example). The problem starts when sns tries to draw the heatmap: the pandas DataFrame that holds the correlation matrix (variable corr) seems to have dtype object rather than float, which seems to cause an error later on.
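To make the dtype problem described above concrete, here is a minimal sketch that uses only pandas. The helper name ensure_float_matrix and the column names are hypothetical and appear in neither table-evaluator nor dython. The sketch builds a small matrix whose values are stored as Python objects and converts it to float, which is the form a numeric heatmap expects.

```python
import pandas as pd


def ensure_float_matrix(corr: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `corr` in which every column has a float dtype.

    Hypothetical helper for illustration; not part of table-evaluator or dython.
    """
    if all(pd.api.types.is_float_dtype(dtype) for dtype in corr.dtypes):
        return corr.copy()
    # astype(float) raises ValueError if a cell cannot be interpreted as a number.
    return corr.astype(float)


# Made-up correlation matrix whose values are stored with dtype object,
# mimicking the situation described in this issue.
cols = ["age", "income"]
corr_object = pd.DataFrame(
    [[1.0, 0.25], [0.25, 1.0]], index=cols, columns=cols, dtype=object
)

corr_float = ensure_float_matrix(corr_object)

print(corr_object.dtypes)  # every column is reported as object
print(corr_float.dtypes)   # every column is reported as float64
```

Checking corr.dtypes before plotting is a cheap way to confirm whether this is the failure you are hitting.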

An easy "fix" would be to pin pandas==1.3.5 in your requirements.txt, or maybe to update to the newest dython version (I have not checked whether that works).
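Before pinning or upgrading anything, it can help to record which versions are actually installed. The following is a small sketch that relies only on the standard library; the function name installed_versions and the list of package names are illustrative assumptions, not part of table-evaluator.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


# Packages involved in this issue; adjust the list as needed.
print(installed_versions(["pandas", "dython", "seaborn"]))
```

Including this output in a bug report makes it easier to tell whether a pin such as pandas==1.3.5 actually changes the environment.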

A solution I have found is to replace fake_corr = associations(...)['corr'] with a call to dython.nominal.compute_associations:

Here is a working example:

(...)
# compute the associations
real_corr = compute_associations(real, nominal_columns=cat_cols, theil_u=True)
fake_corr = compute_associations(fake, nominal_columns=cat_cols, theil_u=True)

# convert to float manually to avoid issues with the heatmap
real_corr = real_corr.astype(float)
fake_corr = fake_corr.astype(float)

# add real corr to ax[0] and fake corr to ax[1]
sns.heatmap(real_corr, ax=ax[0], cmap=cmap, vmax=.3, square=True, annot=annot, center=0,
            linewidths=.5, cbar_kws={"shrink": .5}, fmt='.2f')
sns.heatmap(fake_corr, ax=ax[1], cmap=cmap, vmax=.3, square=True, annot=annot, center=0,
            linewidths=.5, cbar_kws={"shrink": .5}, fmt='.2f')

(...)

This problem might also occur in other places where associations(...) is used, but I have only encountered it in this scenario.

Baukebrenninkmeijer commented 1 year ago

Hey @SvenGroen, thanks for the detailed description and for already investigating the problem. Upgrading to the latest versions of dython and pandas seemed the most sensible option, so that is what I did.

The changes are in the PR, and I'll release a new version sometime soon!