ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0

Add data rebalance function and set score name according to the model spec name when eval #663

Closed huzza closed 5 years ago

huzza commented 5 years ago
  1. Set the model score name according to the model spec file name.
  2. Add a function to rebalance data when shuffling the dataset: https://github.com/ShifuML/shifu/issues/662

zhangpengshan commented 5 years ago

Thanks Zhanghao for this PR :)

Could you summarize the changes in this PR? I saw 18 files changed, and it is not clear how I can review it.

huzza commented 5 years ago

There are mainly two changes: 1) the model score name is now set according to the model spec file name (before, we always used model0, model1, ...); 2) if, for example, the positive rate is too low, the user can adjust the positive rate when shuffling the norm/clean dataset. We achieve this by duplicating positive records or adjusting the weight of positive records.
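
To make the two changes easier to picture while reviewing, here is a minimal Python sketch of the ideas. It is not the code in this PR (Shifu itself is Java), and every name in it is hypothetical: `score_name_from_spec`, `rebalance_factor`, `rebalance_by_weight`, `rebalance_by_duplication`, and the file name `fraud_nn.nn`. It assumes binary labels (1 = positive) and records shaped as `(label, weight)` pairs. The factor `w` is chosen so that `w * n_pos / (w * n_pos + n_neg)` equals the target positive rate.

```python
import os
import random


def score_name_from_spec(spec_path):
    """Hypothetical helper: use the spec file's base name as the score name."""
    return os.path.splitext(os.path.basename(spec_path))[0]


def rebalance_factor(n_pos, n_neg, target_rate):
    """Factor w that solves w*n_pos / (w*n_pos + n_neg) = target_rate."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("need at least one positive and one negative record")
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be strictly between 0 and 1")
    return target_rate * n_neg / ((1.0 - target_rate) * n_pos)


def _count_labels(records):
    n_pos = sum(1 for label, _ in records if label == 1)
    return n_pos, len(records) - n_pos


def rebalance_by_weight(records, target_rate):
    """Option A: keep every record once, scale the weight of positives."""
    n_pos, n_neg = _count_labels(records)
    factor = rebalance_factor(n_pos, n_neg, target_rate)
    return [(label, weight * factor if label == 1 else weight)
            for label, weight in records]


def rebalance_by_duplication(records, target_rate, rng):
    """Option B: repeat each positive record about `factor` times.

    The fractional part of the factor is handled by emitting one extra
    copy with that probability, so the expected count matches the factor.
    """
    n_pos, n_neg = _count_labels(records)
    factor = rebalance_factor(n_pos, n_neg, target_rate)
    whole, frac = int(factor), factor - int(factor)
    out = []
    for record in records:
        copies = 1
        if record[0] == 1:
            copies = whole + (1 if rng.random() < frac else 0)
        out.extend([record] * copies)
    return out


if __name__ == "__main__":
    data = [(1, 1.0)] * 10 + [(0, 1.0)] * 90   # 10% positive rate
    print(score_name_from_spec("models/fraud_nn.nn"))
    print(rebalance_factor(10, 90, 0.5))
    print(len(rebalance_by_duplication(data, 0.5, random.Random(0))))
```

In this sketch, weighting keeps the dataset size unchanged, while duplication grows it. Note that when the target rate is lower than the current rate, the factor drops below 1 and the duplication variant randomly drops positive records instead of repeating them.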