8080labs / ppscore

Predictive Power Score (PPS) in Python
MIT License

Is there a paper available? #3

Open swayson opened 4 years ago

swayson commented 4 years ago

Hi there,

This appears to be quite a valuable metric, and I am curious: have you published a preprint or paper detailing the experiments and calculations?

Thanks.

8080labs commented 4 years ago

Hi, happy to hear that! We have not published anything yet, but we are in the process of doing so. We also wanted to gather input from the community first so that we have fewer blind spots.

8080labs commented 4 years ago

BTW: why is the paper interesting/important to you?

swayson commented 4 years ago

Hi, thank you for the reply and feedback. I was curious about a paper so I could see what experiments you have run and get a better understanding of some of the PPS properties. The organisation I work for has very strict model governance and is quite traditional, so if I were to use PPS, a paper would help substantiate any claims.

8080labs commented 4 years ago

Thank you for providing some further background. As soon as a paper is available, we will let you know.

dkloving commented 4 years ago

Just to let you know, I am also looking forward to a paper that I can cite.

8080labs commented 4 years ago

Thank you for letting us know. How do you plan to use the PPS in your paper? As a means to explore relationships between the data during EDA or for reporting a found strength?

dkloving commented 4 years ago

Basically, I have 92 variables, and many of them are derived from others in my dataset. Some are obvious, but some are not. The dataset was originally used with logistic regression, but now I'm using tree ensembles and I need to do a lot of bootstrapping to get feature importance confidence intervals. Having unneeded independent variables both slows down the computations and increases the uncertainty around feature importance. I'm using PPS to identify features to be removed.
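
The pruning idea described above could be sketched roughly as follows. This is an illustration only, not the code used in this thread: it assumes the pairwise scores are already available as (x, y, score) triples (for example taken from the `x`, `y` and `ppscore` columns that `pps.matrix(df)` reports in ppscore 1.x), and the function name, the 0.8 threshold, and the column names are all hypothetical choices.

```python
from typing import Iterable, List, Set, Tuple

# One pairwise result: feature x predicts feature y with the given score.
ScoreTriple = Tuple[str, str, float]


def redundant_features(
    scores: Iterable[ScoreTriple],
    target: str,
    threshold: float = 0.8,  # hypothetical cutoff, tune for your data
) -> List[str]:
    """Greedily pick features that another kept feature predicts well."""
    dropped: Set[str] = set()
    # Visit the strongest relationships first.
    for x, y, score in sorted(scores, key=lambda t: t[2], reverse=True):
        if score < threshold:
            break  # every remaining pair is weaker than the cutoff
        if x == y or target in (x, y):
            continue  # ignore self-pairs and anything involving the target
        if x in dropped or y in dropped:
            continue  # x must still be kept; y must not be dropped twice
        dropped.add(y)  # y is largely recoverable from x
    return sorted(dropped)


# Made-up scores for illustration; real values would come from your data.
example_scores = [
    ("height_cm", "height_in", 0.98),
    ("height_in", "height_cm", 0.97),
    ("weight_kg", "bmi", 0.60),
    ("bmi", "outcome", 0.90),
]
print(redundant_features(example_scores, target="outcome"))
```

Because PPS is asymmetric, the sketch drops only the predicted feature `y` and keeps the predictor `x`, and it skips any pair whose predictor has already been dropped so that at least one member of a redundant group survives.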

8080labs commented 4 years ago

Very interesting approach! Did the PPS help you during your task? And so I understand that you want to document your approach and thus of course you need a paper for the PPS. Is that correct?

muddyatty commented 4 years ago

> BTW: why is the paper interesting/important to you?

That's a very strange question. A paper is good practice and a necessity when releasing code or equations for other researchers to use, especially when the equation/algorithm is meant to replace a widely used statistical standard like the correlation score.

dkloving commented 4 years ago

> Very interesting approach! Did the PPS help you during your task? And so I understand that you want to document your approach and thus of course you need a paper for the PPS. Is that correct?

Yes, PPS was definitely helpful. I do need to cite it because this was academic work. It was just for a course, so I cited your blog article, but that's really not ideal and I couldn't try to publish with it.

dkloving commented 4 years ago

Even just a preprint server such as arXiv would be sufficient. If you need help getting the paper out, I might be able to help.

8080labs commented 4 years ago

@muddyatty I agree with your statement in a research context. I asked the question because I wanted to better understand the specific situation: many users apply ppscore in an industrial context, where an academic paper is nice to have but other solutions or documentation might fit their needs even better.

8080labs commented 4 years ago

@dkloving Understood, and I would really like to take you up on your offer to help with the paper. It would be great to hear your thoughts on this. Can you please reach out to me via florian AT 8080labs.com? That would be much appreciated.