DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.3k stars 559 forks

Allow yellowbrick.features.radviz.radviz to also accept inputs with non-continuously ascending indices #1180

Closed JohannesWiesner closed 2 years ago

JohannesWiesner commented 3 years ago

I am using yellowbrick.features.radviz.radviz to plot variables from a dataframe with a non-continuously ascending index (a result of filtering rows). However, it seems that yellowbrick.features.radviz.radviz requires the input's index to be a gap-free ascending sequence of integers. As a consequence, I always have to call df.reset_index(inplace=True, drop=True) before plotting. I also have a case where I use df.groupby() to draw multiple radviz plots. Here again, the index of each resulting dataframe has to be reset for radviz to work.
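For readers landing here, the workaround described above can be sketched roughly as follows. The column names, the data, and the `split_xy` helper are all hypothetical and only serve the illustration. The radviz call is left as a comment so the snippet runs without Yellowbrick installed.

```python
import pandas as pd

FEATURES = ["a", "b", "c"]  # hypothetical feature columns


def split_xy(frame):
    """Reset the index to 0..n-1, then split into features and target."""
    frame = frame.reset_index(drop=True)
    return frame[FEATURES], frame["target"]


# Hypothetical example data.
df = pd.DataFrame(
    {
        "a": [0.1, 0.4, 0.1, 0.9, 0.5, 0.7],
        "b": [1.0, 0.2, 0.8, 0.4, 0.6, 0.1],
        "c": [0.5, 0.5, 0.2, 0.3, 0.9, 0.8],
        "group": ["x", "y", "x", "y", "x", "y"],
        "target": [0, 1, 0, 1, 1, 0],
    }
)

# Filtering keeps the original labels, so the index now has gaps.
filtered = df[df["a"] > 0.2]

# Workaround for the filtered frame: reset the index before plotting.
X, y = split_xy(filtered)
# from yellowbrick.features.radviz import radviz
# radviz(X, y)

# Workaround for groupby: every sub-frame needs its own reset.
per_group = {name: split_xy(part) for name, part in df.groupby("group")}
# for name, (X_g, y_g) in per_group.items():
#     radviz(X_g, y_g)
```

Wrapping the reset in a small helper at least keeps the original dataframe untouched, since `reset_index(drop=True)` without `inplace=True` returns a new frame.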

Is there a specific reason why this is needed? Couldn't the function calculate this necessary index behind-the-scenes, allowing me as a user to just pass X and y?

bbengfort commented 3 years ago

@JohannesWiesner thank you for the note and thanks for using Yellowbrick! To better understand what's happening, would you please provide the exception that you're getting as well as some example code and data?

Yellowbrick doesn't know anything about pandas (it is not one of our dependencies), so Yellowbrick treats X and y like they are numpy arrays in the same way that scikit-learn does. Because of that, Yellowbrick won't reindex the array; however, the error might be resolved by some other action that Yellowbrick could take.
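To illustrate the difference between pandas labels and numpy positions mentioned here, the following is a minimal sketch with made-up data. It does not reproduce the reported exception (none was posted in this thread); it only shows that integer lookup on a pandas Series goes by label, while a plain numpy array is always addressed by position. Converting to arrays before plotting is therefore one possible alternative to resetting the index, offered as an assumption rather than a confirmed fix.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has gaps, as left behind by row filtering.
df = pd.DataFrame(
    {"a": [0.4, 0.9, 0.5, 0.7], "b": [0.2, 0.4, 0.6, 0.1], "target": [1, 1, 0, 0]},
    index=[1, 3, 4, 5],
)

# On a Series with an integer index, series[0] looks up the *label* 0,
# which does not exist here, so pandas raises KeyError.
try:
    df["target"][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Plain numpy arrays carry no index; element 0 is simply the first row.
X = df[["a", "b"]].to_numpy()
y = df["target"].to_numpy()
first_label = y[0]

# Column names are lost in the conversion, so feature labels would have
# to be supplied to the plotting call separately.
# from yellowbrick.features.radviz import radviz
# radviz(X, y)
```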

lwgray commented 2 years ago

This issue has gone stale… @JohannesWiesner please reopen if needed