8080labs / ppscore

Predictive Power Score (PPS) in Python
MIT License

Update SciKit-Learn Library Requirements #57

Closed. JDM288 closed this issue 3 years ago

JDM288 commented 3 years ago

Side note: The issue was with the pandas Series, but calling .astype("int64") or .astype("float") (or any similar cast) on it fixes the issue too. Sklearn just needed us to declare the dtype of the pandas Series.
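For readers following along, here is a minimal sketch of the cast described above. It assumes the affected Series carried the generic object dtype, which the thread does not confirm; the data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical target column stored with the generic "object" dtype
# (an assumption; the thread does not say what the original dtype was).
y = pd.Series([1, 0, 1, 1, 0, 0], dtype="object")

# Declare the dtype explicitly before handing the Series to sklearn.
y_int = y.astype("int64")
y_float = y.astype("float")

print(y.dtype, y_int.dtype, y_float.dtype)
```

Either cast gives the Series a concrete numeric dtype, so the choice between them depends on whether the column holds integers or real values.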

fwetdb commented 3 years ago

Interesting, thank you. Why did you decide on the more general to_numpy? And how does this more general solution work?

JDM288 commented 3 years ago

@fwetdb It looks like sklearn handles NumPy arrays differently from pandas DataFrames. I dug as far as I could into the ppscore library and found the two functions that were causing the error: cross_val_score and mean_absolute_error. When I converted the DataFrames to NumPy arrays, the problem went away, so I assume something changed in the way sklearn handles pandas DataFrames. Without looking into the new sklearn code, I can't give you a real answer to that. I only went with to_numpy first because I couldn't get pandas DataFrames to work for a while, but once I declared their dtype, they worked fine.
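A rough sketch of the to_numpy workaround, for anyone hitting the same error. The data, the DecisionTreeRegressor estimator, and the median baseline are placeholders chosen for illustration and are not taken from the ppscore source; only cross_val_score and mean_absolute_error are the functions named above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one feature and a noisy linear target.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=40)})
df["y"] = 2.0 * df["x"] + rng.normal(scale=0.1, size=40)

# Convert the pandas objects to plain NumPy arrays before calling sklearn.
X = df[["x"]].to_numpy()  # 2-D array, shape (40, 1)
y = df["y"].to_numpy()    # 1-D array, shape (40,)

# Cross-validated score on the arrays (placeholder estimator).
model = DecisionTreeRegressor(random_state=0)
scores = cross_val_score(model, X, y, cv=4, scoring="neg_mean_absolute_error")

# MAE of a constant median prediction, also computed on arrays.
baseline = np.full_like(y, np.median(y))
naive_mae = mean_absolute_error(y, baseline)

print(scores.mean(), naive_mae)
```

Passing arrays sidesteps whatever dtype inference sklearn applies to pandas objects, which may be why it worked before the explicit dtype cast did.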

fwetdb commented 3 years ago

Great, thanks a lot, @JDM288