ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0

Refine Norm Shuffle Logic (Group key to be reshuffled) #742

Open zhangpengshan opened 3 years ago

zhangpengshan commented 3 years ago

Currently, in 'shifu norm -shuffle ...', the logic generates the reducer id as the mapper output key, which means each reducer receives only one key in its input. Although the reducer shuffle is optimized for this big-key scenario, it would be better to shuffle to different keys within one reducer to avoid the cost of grouping a big key.

See DataShuffle logic.
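
To make the difference concrete, here is a minimal plain-Java simulation of the two key strategies. It is only a sketch and is not taken from the Shifu code base: the class name `ShuffleKeySketch`, its methods, the `keysPerReducer` parameter, and the modulo partitioning rule are all assumptions made for illustration, and no Hadoop classes are used.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical simulation; none of these names come from Shifu or Hadoop.
class ShuffleKeySketch {

    // Behavior described in this issue: the map output key is the target reducer id,
    // so every record sent to a reducer shares one key.
    static int reducerIdKey(Random rng, int numReducers) {
        return rng.nextInt(numReducers);
    }

    // Proposed idea: draw the key from a larger space so that each reducer
    // receives many small groups instead of one big group.
    static int spreadKey(Random rng, int numReducers, int keysPerReducer) {
        return rng.nextInt(numReducers * keysPerReducer);
    }

    // Assumed partitioning rule: non-negative key modulo the reducer count.
    static int partition(int key, int numReducers) {
        return key % numReducers;
    }

    // Collects the distinct keys each reducer would receive.
    static Map<Integer, Set<Integer>> keysByReducer(int[] keys, int numReducers) {
        Map<Integer, Set<Integer>> result = new HashMap<>();
        for (int key : keys) {
            result.computeIfAbsent(partition(key, numReducers), r -> new HashSet<>()).add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        int keysPerReducer = 100; // assumed tuning knob
        int records = 100_000;
        Random rng = new Random(42);
        int[] current = new int[records];
        int[] proposed = new int[records];
        for (int i = 0; i < records; i++) {
            current[i] = reducerIdKey(rng, numReducers);
            proposed[i] = spreadKey(rng, numReducers, keysPerReducer);
        }
        // Print how many distinct keys each reducer would have to group.
        keysByReducer(current, numReducers).forEach((reducer, keys) ->
                System.out.println("current  reducer " + reducer + ": " + keys.size() + " key(s)"));
        keysByReducer(proposed, numReducers).forEach((reducer, keys) ->
                System.out.println("proposed reducer " + reducer + ": " + keys.size() + " key(s)"));
    }
}
```

With the reducer id as key, each reducer groups all of its records under a single key. With the spread key, the same records are split across up to `keysPerReducer` smaller groups per reducer, which is the effect this issue asks for. A real change would also need the job's partitioner to agree with whatever key-to-reducer mapping is chosen.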