Add data rebalance function and set score name according to the model spec name when eval

ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop

https://github.com/ShifuML/shifu/wiki

Apache License 2.0


Add data rebalance function and set score name according to the model spec name when eval #663

Closed: huzza closed this 5 years ago

huzza commented 5 years ago

Set the model score name according to the model spec file name.
Add a function to rebalance data when shuffling the dataset: https://github.com/ShifuML/shifu/issues/662
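
To illustrate the first item, here is a minimal sketch of deriving score names from model spec file names. It is written in Python for brevity and is not the code in this PR; the helper `score_names` and the example file names are hypothetical.

```python
from pathlib import Path


def score_names(model_spec_paths):
    """Return one score name per model spec file (hypothetical helper)."""
    names = []
    for path in model_spec_paths:
        # Drop the directory and the last extension,
        # e.g. "models/fraud_nn.nn" becomes "fraud_nn".
        names.append(Path(path).stem)
    return names


# Hypothetical model spec files, for illustration only.
example_names = score_names(["models/fraud_nn.nn", "models/model1.gbt"])
```

With names taken from the files, each score column in the eval output can be traced back to the model spec that produced it.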

zhangpengshan commented 5 years ago

Thanks Zhanghao for this PR :)

Could you summarize the changes in this PR? I saw 18 files changed and it's not clear how I can review it.

huzza commented 5 years ago

There are two main changes: 1) set the model score name according to the model spec file name (previously we always used model0, model1, ...); 2) let the user adjust the positive rate when shuffling the norm/clean dataset, for example when the positive rate is too low. We achieve this by duplicating positive records or by adjusting the weight of positive records.
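
The two rebalancing options could look roughly like the sketch below. It is in Python for brevity and is not the code in this PR: the function `rebalance`, its parameters, and the `(label, weight)` record layout are assumptions made for illustration. The scaling factor k is chosen so that k * pos / (k * pos + neg) equals the target positive rate.

```python
import random


def rebalance(records, target_rate, mode="duplicate", seed=0):
    """Raise the positive rate of (label, weight) records to target_rate.

    Hypothetical sketch: label 1 is positive, anything else is negative.
    mode="duplicate" repeats positive records; mode="weight" scales their weight.
    """
    pos = sum(1 for label, _ in records if label == 1)
    neg = len(records) - pos
    if pos == 0 or not 0 < target_rate < 1:
        return list(records)
    # Solve k * pos / (k * pos + neg) = target_rate for k.
    k = target_rate * neg / ((1 - target_rate) * pos)
    if k <= 1:
        return list(records)  # positive rate is already at or above the target
    rng = random.Random(seed)
    out = []
    for label, weight in records:
        if label != 1:
            out.append((label, weight))
        elif mode == "weight":
            out.append((label, weight * k))
        else:
            # Emit floor(k) copies, plus one more with probability frac(k),
            # so the expected number of copies is k.
            copies = int(k) + (1 if rng.random() < k - int(k) else 0)
            out.extend([(label, weight)] * copies)
    return out


# 10% positives, rebalanced to a 25% positive rate (k = 3).
data = [(1, 1.0)] * 10 + [(0, 1.0)] * 90
duplicated = rebalance(data, 0.25)
reweighted = rebalance(data, 0.25, mode="weight")
```

Duplication grows the dataset, while reweighting keeps the record count unchanged and relies on the trainer honoring per-record weights.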