Currently, in 'shifu norm -shuffle ...', the mapper emits the reducer id as its output key, so each reducer receives exactly one key in its input. Although the reducer-side shuffle is optimized for this big-key scenario, it would be better to spread records across several different keys within one reducer, to avoid the cost of grouping a single big key.
See DataShuffle logic.
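
One way the proposal could look is sketched below. This is a plain-Java illustration, not Shifu's actual DataShuffle code: the class `ShuffleKeySketch`, its methods, and the `keysPerReducer` parameter are all hypothetical names introduced here. It assumes the mapper emits a composite integer key (reducer id plus a random sub-key) and that a custom partitioner recovers the reducer id by integer division, so records still land on a randomly chosen reducer but arrive there under many small keys instead of one big key.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of the proposed key scheme; not taken from Shifu's DataShuffle.
class ShuffleKeySketch {

    // Mapper side: pick a random reducer, then a random sub-key inside that reducer.
    // The two are packed into one int so that keys of different reducers never collide.
    static int compositeKey(Random rng, int numReducers, int keysPerReducer) {
        int reducerId = rng.nextInt(numReducers);
        int subKey = rng.nextInt(keysPerReducer);
        return reducerId * keysPerReducer + subKey;
    }

    // Partitioner side: recover the target reducer from the composite key.
    static int reducerFor(int compositeKey, int keysPerReducer) {
        return compositeKey / keysPerReducer;
    }

    // Simulates key assignment for a number of records and collects
    // the set of distinct keys that each reducer would see.
    static Map<Integer, Set<Integer>> simulate(long seed, int records,
                                               int numReducers, int keysPerReducer) {
        Random rng = new Random(seed);
        Map<Integer, Set<Integer>> keysByReducer = new HashMap<>();
        for (int i = 0; i < records; i++) {
            int key = compositeKey(rng, numReducers, keysPerReducer);
            int reducer = reducerFor(key, keysPerReducer);
            keysByReducer.computeIfAbsent(reducer, r -> new HashSet<>()).add(key);
        }
        return keysByReducer;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only: 4 reducers, up to 100 keys per reducer.
        Map<Integer, Set<Integer>> keysByReducer = simulate(42L, 10000, 4, 100);
        for (Map.Entry<Integer, Set<Integer>> e : keysByReducer.entrySet()) {
            System.out.println("reducer " + e.getKey() + " sees "
                    + e.getValue().size() + " distinct keys");
        }
    }
}
```

In a real job the division in `reducerFor` would live in a custom Hadoop partitioner, and the value of `keysPerReducer` would need tuning: more keys means smaller groups per reduce call, at the price of more reduce invocations.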