ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0

Add Filter Stats in Segment Expansion #694

Open zhangpengshan opened 4 years ago

zhangpengshan commented 4 years ago

Segment expansion filter generation is supported:

https://github.com/shifuml/shifu/wiki/Segment-Expansion-for-New-Feature-Generation

However, if all records are filtered out by one of the segment conditions, that segment is not shown, and no column configurations for it appear in CC.json.

It would be better to report stats for each such filter condition in the console output or in MR job counters.

zhangpengshan commented 4 years ago

Take an example condition, say a > 200, where the actual data in the file is 50.0, 30.3. There could be an exception such as a ClassCastException from long to double, but it is very difficult to identify without checking the job logs. It would be better to surface such exceptions and filter stats in the console for easier debugging.
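
To make the request concrete, here is a minimal sketch of the kind of per-condition stats that could be printed. It is not Shifu code: the class `SegmentFilterStats`, its `Stats` fields, and the `collect` and `demo` methods are hypothetical names, and plain Java predicates stand in for the real filter expression evaluation. The second condition casts the value to `Long` on purpose, only to reproduce a ClassCastException similar to the one described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, not part of Shifu: tallies what each segment condition did.
public class SegmentFilterStats {

    /** Tallies for one segment condition. */
    public static final class Stats {
        public long passed;       // records that satisfied the condition
        public long filtered;     // records the condition rejected
        public long errors;       // records whose evaluation threw an exception
        public String firstError; // first exception seen, kept for console output

        @Override
        public String toString() {
            return "passed=" + passed + ", filtered=" + filtered + ", errors=" + errors
                    + (firstError == null ? "" : ", firstError=" + firstError);
        }
    }

    /** Evaluates every condition on every record, counting instead of failing silently. */
    public static Map<String, Stats> collect(
            Map<String, Predicate<Map<String, Object>>> conditions,
            List<Map<String, Object>> records) {
        Map<String, Stats> result = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<Map<String, Object>>> entry : conditions.entrySet()) {
            Stats stats = new Stats();
            for (Map<String, Object> record : records) {
                try {
                    if (entry.getValue().test(record)) {
                        stats.passed++;
                    } else {
                        stats.filtered++;
                    }
                } catch (RuntimeException e) {
                    // Record the failure rather than leaving it only in the job logs.
                    stats.errors++;
                    if (stats.firstError == null) {
                        stats.firstError = e.toString();
                    }
                }
            }
            result.put(entry.getKey(), stats);
        }
        return result;
    }

    private static Map<String, Object> record(Object a) {
        Map<String, Object> r = new HashMap<>();
        r.put("a", a);
        return r;
    }

    /** Runs the example from this issue: condition a > 200 against the values 50.0 and 30.3. */
    public static Map<String, Stats> demo() {
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(record(50.0));
        records.add(record(30.3));

        Map<String, Predicate<Map<String, Object>>> conditions = new LinkedHashMap<>();
        // Type-tolerant comparison: both records are simply filtered out.
        conditions.put("a > 200", r -> ((Number) r.get("a")).doubleValue() > 200);
        // Deliberately wrong cast: the values are Doubles, so this throws ClassCastException.
        conditions.put("a > 200 (Long cast)", r -> ((Long) r.get("a")) > 200L);
        return collect(conditions, records);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Stats> e : demo().entrySet()) {
            System.out.println("[" + e.getKey() + "] " + e.getValue());
        }
    }
}
```

With counts like these in the console, a segment that matches nothing and a segment whose condition fails on every record are easy to tell apart. In an MR job the same three numbers could be reported as job counters instead.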