8080labs / ppscore

Predictive Power Score (PPS) in Python
MIT License
1.12k stars 168 forks source link

Custom CV #10

Open AntonBiryukovUofC opened 4 years ago

AntonBiryukovUofC commented 4 years ago

This PR adds ability to use a custom sklearn-compatible CV generator, as long as it compatible with cross_val_score.

I have also provided a test for test_score_cv that iterates over different cross-validation strategies implemented in sklearn.model_selection. (branch has regression in the name, but I think the approach should work for both regression and classification problems) I still need to carve time to devise a test for " feature leakage".

AntonBiryukovUofC commented 4 years ago

@8080labs thoughts ?

8080labs commented 4 years ago

Thank you for your PR! I had a first glance on the changes and made some comments. But I definitely need some more time to have a deeper look at it and try it out myself

AntonBiryukovUofC commented 4 years ago

BS commit there is related to a bug in Github - for some reason commit #5 did not show up in the PR.

Otherwise @8080labs let me know what you think re:code changes.

AntonBiryukovUofC commented 4 years ago

Should we merge it then, or add a few specific CV tests (like timeseriees one)?

8080labs commented 4 years ago

Before merge I will have a look over everything again to make sure that it is consistent and we did not miss anything. Afterwards, I will get back to you

Thank you so much, Florian

AntonBiryukovUofC commented 4 years ago

Yeah, fair!

tkrabel commented 4 years ago

@AntonBiryukovUofC FYI: we are currently reviewing everything. Just wanted to communicate that there is progress :)

AntonBiryukovUofC commented 4 years ago

are you guys alive out there ? :)

FlorianWetschoreck commented 4 years ago

Hi Anton,

sorry for letting you wait so long. We hope that you (and potentially others) can still use your branch internally if you need it. Your PR will need some extensive time for the final review in order to make sure that we did not miss something and thus we did not find the time yet because we are currently busy with many other tasks and deadlines.

All the best, Florian