8080labs / ppscore

Predictive Power Score (PPS) in Python
MIT License

ppscore changes to 0 for multiple variables after upgrade #46

Closed josecmontes closed 3 years ago

josecmontes commented 4 years ago

Hi,

ppscore is very good; I have tried it before and it yielded good results.

I was using the 0.02 version, and very recently (about 2 weeks ago) I tried it on a dataset where it gave a score to multiple variables. This week I had to upgrade because I received an AttributeError: module 'ppscore' has no attribute 'predictors'.
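For anyone hitting the same AttributeError, a quick way to see which version is installed and whether it exposes `predictors` is sketched below. The helper name `describe_package` is made up for illustration; it uses only the standard library's `importlib.metadata` and assumes the distribution name and the import name are identical, as they are for `ppscore`.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def describe_package(name, attribute):
    """Return (installed_version, has_attribute).

    Returns (None, False) when the package is not installed, and
    (installed_version, False) when it is installed but cannot be imported.
    """
    try:
        installed = version(name)
    except PackageNotFoundError:
        return None, False
    try:
        module = import_module(name)
    except ImportError:
        return installed, False
    return installed, hasattr(module, attribute)


# Prints the installed ppscore version and whether `predictors` is available.
print(describe_package("ppscore", "predictors"))
```

Recording the version this way before upgrading also makes it easier to go back to a known-good release later.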

After the upgrade, only 2 variables yielded non-zero ppscores; all the other variables went to 0.

Nothing else in the code was changed, so I assume this was due to the upgrade. I suspect the 0 values are not correct, because I have visually checked some distributions and applied statistical tests (KS and chi-square) to some variables, and they seemed to be somewhat predictive (plus they previously had scores).

Any clue what might have happened? Many thanks.

FlorianWetschoreck commented 4 years ago

Hey,

This might have to do with the implicit assumptions about the data types and thus the chosen evaluation metric. For example, numeric columns with only a few distinct values used to be treated as categorical, but now they are treated as numeric unless you change their data type. Can you share some more information about the columns whose scores changed, e.g. data types, unique values, distributions, and the previous ppscore?
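Changing the data type, as mentioned above, could look roughly like the sketch below. The toy DataFrame and the helper `as_categorical` are hypothetical; only the column name `target` and its int64 dtype mirror this thread. The sketch assumes that a column with the pandas `category` dtype is treated as categorical; the ppscore call at the end is left commented out because its output depends on the installed version.

```python
import pandas as pd

# Hypothetical toy data; not taken from the reporter's dataset.
df = pd.DataFrame({
    "feature": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
    "target": [0, 0, 0, 1, 1, 0],
})


def as_categorical(frame, columns):
    """Return a copy of `frame` with the given columns cast to 'category'."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


df_cast = as_categorical(df, ["target"])
print(df["target"].dtype, "->", df_cast["target"].dtype)

# Assumption: with a recent ppscore installed, the cast frame could then be
# scored, for example:
# import ppscore as pps
# pps.predictors(df_cast, "target")
```

Casting a copy rather than the original frame keeps the numeric version available for any other analysis that needs it.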

josecmontes commented 4 years ago

Hi Florian Wetschoreck,

Unfortunately, I am not able to share much of the data here. I have now tried a different dataset and all ppscore values are 0 (which I doubt is right).

As mentioned, I had to upgrade ppscore because it was giving me an error [AttributeError: module 'ppscore' has no attribute 'predictors']. In doing so I lost the previous scores, but multiple variables had scores above 0 and now just 2 do. I am sure this has something to do with the dtypes (it is worth mentioning that the data and code are the same; the only change was the version upgrade, so it is kind of strange in that sense).
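As a rough illustration of how a dtype-driven change of metric alone could push an informative variable to 0, the sketch below assumes a numeric target is scored as 1 - MAE_model / MAE_baseline, with a baseline that always predicts the median and with the result clipped at 0. That formula, the toy data, and the group-mean stand-in for a fitted model are all assumptions made for illustration; none of them is confirmed in this thread.

```python
from statistics import fmean, median

# Hypothetical toy data: a two-level feature and an imbalanced 0/1 target.
# Level "A": 1 positive out of 16. Level "B": 2 positives out of 4.
rows = [("A", 0)] * 15 + [("A", 1)] + [("B", 0)] * 2 + [("B", 1)] * 2
targets = [t for _, t in rows]


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


# Assumed baseline: always predict the median of the target (0 here).
baseline = median(targets)
naive_mae = mae(targets, [baseline] * len(targets))

# Stand-in "model": predict the mean target of each feature level.
group_mean = {
    level: fmean([t for g, t in rows if g == level]) for level in ("A", "B")
}
model_mae = mae(targets, [group_mean[g] for g, _ in rows])

# Assumed score: 1 - MAE_model / MAE_baseline, clipped at 0.
score = max(0.0, 1 - model_mae / naive_mae)
print(naive_mae, model_mae, score)
```

In this toy case the score is 0 even though level "B" has a much higher positive rate than level "A", because the median baseline is hard to beat on absolute error when the target is a skewed 0/1 column.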

I can share the results for some columns and the data types below. As mentioned, I just tried a different dataset and all ppscores are 0.

This library was really useful to me, so if there is anything I can help with, let me know.

image

sabi_sum_ingresos_ult1          float64
sabi_sum_result_ult1            float64
sabi_sum_empleados_ult1         int64
sabi_bas_condicion              object
sabi_bas_falcon_crs_actual      int64
sabi_bas_falcon_prob_incump     float64
sabi_bas_falcon_lim_credito     int64
sabi_bas_more_punt_corriente    object
sabi_bas_more_prob_incump       float64
sabi_bas_more_credit_limit      int64
sabi_bas_wvb_global_score       object
sabi_bas_puntuacion_crif        int64
sabi_bas_crif_min               int64
sabi_bas_vadis_p2bb             float64
target                          int64
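To see how a purely dtype-based rule would split the columns listed above, here is a small sketch in plain Python. The column names and dtypes are copied from this comment; the set `CATEGORICAL_DTYPES` and the helper `inferred_kind` are assumptions for illustration and are not taken from the ppscore source.

```python
# Column dtypes as posted in the thread.
DTYPES = {
    "sabi_sum_ingresos_ult1": "float64",
    "sabi_sum_result_ult1": "float64",
    "sabi_sum_empleados_ult1": "int64",
    "sabi_bas_condicion": "object",
    "sabi_bas_falcon_crs_actual": "int64",
    "sabi_bas_falcon_prob_incump": "float64",
    "sabi_bas_falcon_lim_credito": "int64",
    "sabi_bas_more_punt_corriente": "object",
    "sabi_bas_more_prob_incump": "float64",
    "sabi_bas_more_credit_limit": "int64",
    "sabi_bas_wvb_global_score": "object",
    "sabi_bas_puntuacion_crif": "int64",
    "sabi_bas_crif_min": "int64",
    "sabi_bas_vadis_p2bb": "float64",
    "target": "int64",
}

# Assumption: dtype names that a dtype-based rule would read as categorical.
CATEGORICAL_DTYPES = {"object", "category", "string", "bool", "boolean"}


def inferred_kind(dtype_name):
    """Classify a dtype name as 'categorical' or 'numeric' under the assumed rule."""
    return "categorical" if dtype_name in CATEGORICAL_DTYPES else "numeric"


kinds = {column: inferred_kind(dtype) for column, dtype in DTYPES.items()}
for column, kind in kinds.items():
    print(f"{column:<30} {kind}")
```

Under this assumed rule, `target` (int64) would be read as numeric, so if it is really a class label it would be one of the first columns to consider casting.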

FlorianWetschoreck commented 4 years ago

Hi Jose,

Sorry to hear that the new results are not as useful to you anymore.

Currently, you have the following options:

Please let me know if this worked for you, Florian

josecmontes commented 4 years ago

I simply went back to 0.0.4 and it worked without changing anything.

FlorianWetschoreck commented 3 years ago

Happy to hear that :)