changqi1 / DeepRec

DeepRec is a recommendation engine based on TensorFlow.
Apache License 2.0

[Graph][Optimization] Concat+cast fusion to improve performance #24

Open shanzhou2186 opened 2 years ago

shanzhou2186 commented 2 years ago

Concat+Cast fusion optimization

Goal

Optimize performance by fusing Concat and Cast.

Problem Description

In some recommendation models, for example DLRM, fusing Concat and Cast offers a potential performance gain once BF16 is enabled in DeepRec.
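To show where the gain could come from, here is a minimal Python sketch under several assumptions. Inputs are 1-D and concatenated along axis 0. The Cast converts the FP32 concat result to BF16. BF16 values are held as raw 16-bit integers, and the conversion uses round-to-nearest-even. The unfused path materializes the full FP32 concatenation and then makes a second pass to cast it. The fused path converts each FP32 element once and writes it straight to its final slot in the BF16 output. All function names are hypothetical, and this is not DeepRec's kernel.

```python
import struct
from typing import List


def fp32_to_bf16_bits(x: float) -> int:
    """Round an FP32 value to BF16 (round-to-nearest-even); return the raw 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: keep the upper bits and force a quiet-NaN bit so it cannot become infinity
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to an even result
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def concat_then_cast(inputs: List[List[float]]) -> List[int]:
    # Unfused, step 1: materialize the whole FP32 concatenation (extra buffer)
    concatenated: List[float] = []
    for t in inputs:
        concatenated.extend(t)
    # Unfused, step 2: a second full pass over the intermediate to cast it
    return [fp32_to_bf16_bits(v) for v in concatenated]


def fused_concat_cast(inputs: List[List[float]]) -> List[int]:
    # Fused: size the BF16 output once, then convert each FP32 element and write it
    # directly at its concat offset, so no FP32 intermediate is ever built
    out = [0] * sum(len(t) for t in inputs)
    offset = 0
    for t in inputs:
        for i, v in enumerate(t):
            out[offset + i] = fp32_to_bf16_bits(v)
        offset += len(t)
    return out


if __name__ == "__main__":
    inputs = [[1.0, 2.5], [-3.0, 1.00390625]]
    print([hex(v) for v in fused_concat_cast(inputs)])  # ['0x3f80', '0x4020', '0xc040', '0x3f80']
```

Both paths produce identical BF16 output. The fused version skips writing and re-reading the FP32 intermediate. Avoiding that memory traffic is the usual motivation for fusing an element-wise op into its producer. Note that `1.00390625` (1 + 2^-8) is an exact tie and rounds down to even (`0x3f80`).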

Here are the steps to reproduce the performance issue.

Requirement Details

Test

Code Style and commit

Maintain

Definition of Done

aalbersk commented 2 years ago

@app-on-mic

  1. What data types should the fused Concat+Cast operation support? Only FP32 and BF16, which are used by DLRM, or every data type supported in TF?
  2. Are there any additional requirements for the testing environment for this feature?
shanzhou2186 commented 2 years ago

Only FP32 is required. The test environment is the same as for the model zoo: CPX (Cooper Lake) from Alibaba Cloud and SPR (Sapphire Rapids), with 4 physical cores (8 logical cores).
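As a rough illustration of where the "FP32 only" constraint could sit in a graph pass, here is a minimal Python sketch. It runs over a simplified, dict-based stand-in for TF NodeDefs (`name`, `op`, `input`, `attr`). It looks for `Cast(ConcatV2(...))` with `SrcT == DT_FLOAT` where the Cast is the ConcatV2's only consumer, and replaces the pair with a single node. `ConcatV2` and `Cast` (with its `SrcT`/`DstT` attrs) are real TF ops. The fused op name `_FusedConcatCast` and all helper names are hypothetical. This is not DeepRec's actual optimizer. It also ignores fetch nodes, device placement, and control-edge rewiring.

```python
from typing import Dict, List, Tuple

Node = Dict[str, object]  # simplified stand-in for a TF NodeDef: name, op, input, attr

FUSED_OP = "_FusedConcatCast"  # hypothetical op name, not an existing TF/DeepRec kernel


def _producer(tensor_name: str) -> str:
    # "concat:0" -> "concat", "^ctrl" -> "ctrl"
    return tensor_name.lstrip("^").split(":")[0]


def find_concat_cast_pairs(nodes: List[Node]) -> List[Tuple[str, str]]:
    by_name = {n["name"]: n for n in nodes}
    # Count how many inputs reference each producer node
    consumers: Dict[str, int] = {}
    for n in nodes:
        for inp in n["input"]:
            src = _producer(inp)
            consumers[src] = consumers.get(src, 0) + 1
    pairs = []
    for n in nodes:
        if n["op"] != "Cast" or not n["input"]:
            continue
        src = by_name.get(_producer(n["input"][0]))
        if src is None or src["op"] != "ConcatV2":
            continue
        # Only FP32 inputs are in scope, per the answer above
        if n["attr"].get("SrcT") != "DT_FLOAT":
            continue
        # Fuse only if the Cast is the sole consumer; otherwise the FP32 result is still needed
        if consumers.get(src["name"], 0) != 1:
            continue
        pairs.append((src["name"], n["name"]))
    return pairs


def fuse_concat_cast(nodes: List[Node]) -> List[Node]:
    by_name = {n["name"]: n for n in nodes}
    replacements: Dict[str, Node] = {}
    dropped = set()
    for concat_name, cast_name in find_concat_cast_pairs(nodes):
        concat, cast = by_name[concat_name], by_name[cast_name]
        replacements[cast_name] = {
            "name": cast_name,  # reuse the Cast's name so downstream inputs stay valid
            "op": FUSED_OP,
            "input": list(concat["input"]),  # the N values, then the axis
            "attr": {**concat["attr"], "DstT": cast["attr"]["DstT"]},
        }
        dropped.add(concat_name)
    # Keep the original node order, swapping in fused nodes and removing the old ConcatV2
    return [replacements.get(n["name"], n) for n in nodes if n["name"] not in dropped]
```

Keeping the Cast's name for the fused node means consumers such as a downstream MatMul need no input rewiring. Matching on `SrcT` keeps the rewrite limited to FP32 inputs.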