SuryaThiru closed this 4 years ago.
Hi Surya, thank you for taking the time to create the PR and for thinking about how to improve ppscore. Here are my thoughts about the enhanced proposal:
Therefore, I think the added value is not big enough.
Do you have any other thoughts?
Thank you, Florian
Yes, I guess it is not a trivial case.
It would be useful when we want to identify strong predictors for one or more targets. While typical correlation matrices might not reveal some associations, ppscore does a good job at identifying them. So, for applications like feature selection, this could be useful because we avoid computing the predictive power between the predictors themselves and writing loops similar to the ones already in pps.matrix. I think more people will find themselves using ppscore for such applications.
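To make the feature-selection use case concrete, here is a minimal pure-Python sketch. It scores only predictor-to-target pairs for each target, so no predictor-to-predictor scores are computed. The scorer is an assumption: correlation_ratio (eta squared of a numeric target grouped by the values of a predictor) stands in for ppscore's score, which works differently. The helper strong_predictors, the 0.5 threshold and the dict-of-column-lists data layout are likewise hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def correlation_ratio(x, y):
    """Stand-in score (NOT ppscore): eta squared, the share of the variance of
    numeric y explained by grouping its values by the distinct values of x."""
    total = pvariance(y)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    overall = fmean(y)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values()) / len(y)
    return between / total


def strong_predictors(data, targets, threshold=0.5, score=correlation_ratio):
    """Hypothetical helper: for each target, return the other columns whose
    score reaches `threshold`, strongest first."""
    selected = {}
    for target in targets:
        # Only predictor -> target pairs; predictor-to-predictor pairs are skipped.
        scores = {
            col: score(values, data[target])
            for col, values in data.items()
            if col != target
        }
        strong = [col for col, s in scores.items() if s >= threshold]
        selected[target] = sorted(strong, key=scores.get, reverse=True)
    return selected


data = {
    "city": ["a", "a", "b", "b", "c", "c"],
    "parity": [0, 1, 0, 1, 0, 1],
    "price": [10.0, 10.0, 20.0, 20.0, 30.0, 30.0],
    "rooms": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
print(strong_predictors(data, ["price", "rooms"]))
# {'price': ['city'], 'rooms': ['parity']}
```

Here "city" fully determines "price" and "parity" fully determines "rooms", while every other pair scores zero under this stand-in measure.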
I agree. I think, in this case, we should add a pps.predictors(df, target) method instead. If the user wants to inspect the features for multiple targets, she can just call this method for each target column. This is also what has been requested in #13.
Fair enough. I can move the implementation to a pps.predictors(df, target) method.
I am thinking of making the method return a dataframe of scores with the feature names as the index and a "score" column. It can have the sort option like you initially planned (predictors(df, y, task=None, sorted=True)). What do you think?
That sounds good. I think it should behave similarly to the matrix: either return a df with the scores or return a list of score dicts. Sorted should be a boolean that is True by default. Also, it should accept **kwargs to pass through to the single score method.
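The behaviour agreed on above could be sketched roughly as follows. This is not ppscore's implementation: the data layout (a dict of column lists), the caller-supplied score function, the output values "list" and "df", the record keys and the toy_score placeholder are all assumptions for illustration. pandas is imported only when a DataFrame is requested.

```python
def predictors(data, y, score, output="list", sorted=True, **kwargs):
    """Hypothetical sketch: score every column x != y as a predictor of y.

    `score(x_values, y_values, **kwargs)` stands in for the single-score
    function; any extra keyword arguments are passed through to it unchanged.
    """
    if y not in data:
        raise KeyError(f"unknown target column: {y!r}")
    records = [
        {"x": x, "y": y, "ppscore": score(values, data[y], **kwargs)}
        for x, values in data.items()
        if x != y
    ]
    if sorted:
        # The flag name mirrors the proposal; it shadows the builtin here,
        # so sort in place with list.sort() (stable, strongest first).
        records.sort(key=lambda r: r["ppscore"], reverse=True)
    if output == "list":
        return records
    if output == "df":
        import pandas as pd  # only needed for DataFrame output

        # Feature names as the index and a single "score" column.
        index = pd.Index([r["x"] for r in records], name="x")
        return pd.DataFrame({"score": [r["ppscore"] for r in records]}, index=index)
    raise ValueError("output must be 'list' or 'df'")


def toy_score(x, y, tolerance=0):
    # Placeholder, NOT predictive power: share of rows where x is within tolerance of y.
    return sum(abs(xi - yi) <= tolerance for xi, yi in zip(x, y)) / len(y)


data = {"a": [1, 2, 3, 4], "b": [1, 2, 0, 0], "c": [2, 3, 4, 5], "y": [1, 2, 3, 4]}
print([(r["x"], r["ppscore"]) for r in predictors(data, "y", toy_score)])
# [('a', 1.0), ('b', 0.5), ('c', 0.0)]
print([r["x"] for r in predictors(data, "y", toy_score, tolerance=1)])
# ['a', 'c', 'b']
```

The second call shows the **kwargs pass-through: tolerance never appears in the predictors signature but still reaches the score function.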
What do you think about it?
Yep. That sounds good.
Great, looking forward to your PR :)
@SuryaThiru I think our discussion moved to PR #17. Can we close this PR?
With ppscore taking a considerable amount of computation time, I thought this would be a handy feature for people working on very wide datasets.
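A back-of-the-envelope count illustrates why this matters on wide datasets. Assuming a full matrix scores every ordered column pair, including the trivial diagonal, while a predictors-style call scores each other column once per target, the number of score evaluations compares as follows (the function names are hypothetical).

```python
def matrix_evaluations(n_columns: int) -> int:
    # Full matrix: one score per ordered (x, y) column pair, diagonal included.
    return n_columns * n_columns


def predictor_evaluations(n_columns: int, n_targets: int = 1) -> int:
    # Predictors only: each target is scored against every other column once.
    return n_targets * (n_columns - 1)


for n in (10, 100, 1000):
    print(n, matrix_evaluations(n), predictor_evaluations(n), predictor_evaluations(n, n_targets=3))
# 10 100 9 27
# 100 10000 99 297
# 1000 1000000 999 2997
```

Even with several targets, the per-target approach grows linearly with the number of columns instead of quadratically.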
Added some simple tests to test_matrix in test_calculation.py.