concat+cast fusion optimization
Goal
Optimize performance through concat+cast fusion
Problem Description
In some recommendation models, for example DLRM, there is a potential performance gain from fusing the concat and cast operators once bf16 is enabled in DeepRec.
Here are the steps to reproduce the performance issue.
Collect timeline information with DLRM from the modelzoo by running "numactl -C 8-15 -l python train.py --steps 100 --timeline 49 --no_eval --interaction_op dot --bf16". The resulting timeline shows the issue.
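To quantify what the timeline shows, the time spent in the two operators can be summed from the trace file. The following is a minimal sketch, assuming the timeline is written in the Chrome trace JSON format that TensorFlow's timeline module emits (a "traceEvents" list whose complete events have "ph" set to "X", a "dur" in microseconds, and the op type under "args"/"op"). The helper name op_time_by_type and the file name in the usage comment are hypothetical; the op type names "ConcatV2" and "Cast" are assumed to be the ones that appear for DLRM.

```python
import json
from collections import defaultdict


def op_time_by_type(trace, op_types=("ConcatV2", "Cast")):
    """Sum the duration (microseconds) of the selected op types in a trace dict.

    Assumes the Chrome trace layout: trace["traceEvents"] is a list of events,
    and op executions are complete events ("ph" == "X") carrying "dur".
    """
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":
            continue  # skip metadata and flow events
        # Prefer the op type recorded in args; fall back to the event name.
        op = event.get("args", {}).get("op", event.get("name"))
        if op in op_types:
            totals[op] += event.get("dur", 0)
    return dict(totals)


# Usage (the file name is hypothetical; use whatever train.py writes):
# with open("timeline-49.json") as f:
#     print(op_time_by_type(json.load(f)))
```

Comparing these totals before and after the fusion gives a rough per-step estimate of the gain, independent of end-to-end throughput noise.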
Requirement Details
Fuse the two operators concat and cast into one operator. Both the forward and the backward operations need to be covered. Make sure the fusion can be applied to real models, DLRM at least.
What data types should be supported by the fused Concat+Cast operation? Only fp32 and bf16, which DLRM uses, or every data type supported in TF?
Are there any additional requirements regarding the testing environment for this feature?
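The semantics the fused operator has to preserve can be illustrated with a small pure-Python reference. This is a sketch under stated assumptions, not the DeepRec implementation: it assumes the pattern is concat along the feature axis followed by an fp32 to bf16 cast, models tensors as lists of rows, and emulates bf16 with round-to-nearest-even on the fp32 bit pattern (NaN handling is omitted). All function names are hypothetical. The backward function relies on two standard facts: the gradient of a cast is a cast back to the source type, and the gradient of a concat is a split by the input widths.

```python
import struct


def to_bf16(x):
    """Round a value to the nearest bfloat16, returned as a Python float.

    Emulation only: rounds the fp32 bit pattern to its top 16 bits
    (round-to-nearest-even). NaN inputs are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def concat_then_cast(inputs):
    """Unfused reference: build the fp32 concat, then cast in a second pass."""
    rows = len(inputs[0])
    concatenated = [[v for t in inputs for v in t[r]] for r in range(rows)]
    return [[to_bf16(v) for v in row] for row in concatenated]


def fused_concat_cast(inputs):
    """Fused forward: cast each element while copying it into the output."""
    rows = len(inputs[0])
    return [[to_bf16(v) for t in inputs for v in t[r]] for r in range(rows)]


def fused_concat_cast_grad(grad, widths):
    """Fused backward: split the output gradient by input width and cast back.

    bf16 values are exactly representable in fp32, so the cast back is
    a plain float() here.
    """
    grads, start = [], 0
    for w in widths:
        grads.append([[float(v) for v in row[start:start + w]] for row in grad])
        start += w
    return grads


a = [[1.0, 3.14159]]  # 1 x 2 input
b = [[2.5]]           # 1 x 1 input
assert fused_concat_cast([a, b]) == concat_then_cast([a, b])
```

A real kernel would get its benefit from writing the bf16 output directly instead of materializing the intermediate fp32 tensor; a unit test for it could compare its output against the unfused pair in the same way as the final assertion above.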
Test
Code Style and commit
Maintain
Definition of Done