Refine Norm Shuffle Logic (Group key to be reshuffled)

ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop

https://github.com/ShifuML/shifu/wiki

Apache License 2.0


Refine Norm Shuffle Logic (Group key to be reshuffled) #742

Open zhangpengshan opened 3 years ago

zhangpengshan commented 3 years ago

Currently, 'shifu norm -shuffle ...' generates the reducer ID as the mapper output key, which means each reducer receives only one key in its input. Although the reducer shuffle is optimized for this big-key scenario, it would be better to shuffle records to several different keys within one reducer to avoid the cost of grouping a single big key.

See DataShuffle logic.
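
To make the proposal concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of one way to spread each reducer's records over several keys while still routing them to the same reducer. It is not Shifu's actual DataShuffle code: the class name `ShuffleKeySketch`, the methods `spreadKey` and `partitionOf`, and the constant `KEYS_PER_REDUCER` are all hypothetical names chosen for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

public class ShuffleKeySketch {
    // Hypothetical setting: how many distinct keys each reducer should receive.
    public static final int KEYS_PER_REDUCER = 16;

    // Proposed idea: combine the target reducer id with a small sub-key,
    // so one reducer sees many small groups instead of one big group.
    public static int spreadKey(int reducerId, int subKey, int keysPerReducer) {
        return reducerId * keysPerReducer + subKey;
    }

    // What a custom partitioner would have to do (assumption): recover the
    // reducer id from the composite key so routing stays unchanged.
    public static int partitionOf(int key, int keysPerReducer, int numReducers) {
        return (key / keysPerReducer) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        Random rnd = new Random(42L);
        Map<Integer, Set<Integer>> keysSeen = new TreeMap<>();

        // Simulate 1000 mapper output records with random targets.
        for (int i = 0; i < 1000; i++) {
            int reducerId = rnd.nextInt(numReducers);
            int subKey = rnd.nextInt(KEYS_PER_REDUCER);
            int key = spreadKey(reducerId, subKey, KEYS_PER_REDUCER);
            int partition = partitionOf(key, KEYS_PER_REDUCER, numReducers);
            keysSeen.computeIfAbsent(partition, p -> new HashSet<>()).add(key);
        }

        // Each reducer now groups several keys rather than a single big one.
        keysSeen.forEach((p, keys) ->
            System.out.println("reducer " + p + " sees " + keys.size() + " distinct keys"));
    }
}
```

With the current behavior described above, the equivalent count would be one key per reducer. In a real job, the composite key would also need a matching Hadoop partitioner so that records still land on the intended reducer.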