Hello @aj-prime, can you please let us know which environment variables you are using, and can you confirm that the "Optimal Memory Allocator Settings Specific to ONNXRT" section of the user guide was followed?
For this case, our recommended settings are:
export GOMP_CPU_AFFINITY=0-63
export OMP_NUM_THREADS=64
export OMP_WAIT_POLICY=ACTIVE
export OMP_PROC_BIND=FALSE
export OMP_DYNAMIC=FALSE
export ZENDNN_MATMUL_ALGO=FP32:4
export LD_PRELOAD=$ZENDNN_PARENT_FOLDER/openmp-10.0.1.src/runtime/src/libomp.so
with the Transparent Huge Pages (THP) setting set to "always".
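For reference, THP can be checked and set at runtime through sysfs (this does not persist across reboots):
cat /sys/kernel/mm/transparent_hugepage/enabled
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled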
Thanks @ajeet1203singh. Using the Optimal Memory Allocator Settings resolved the issue.
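For anyone hitting the same slowdown: the fix boils down to preloading a high-performance memory allocator before launching the benchmark. A minimal sketch, assuming a TCMalloc-style allocator on Ubuntu (the specific package, library, and path below are my assumptions; follow the user guide for the exact allocator ZenDNN recommends):
sudo apt-get install -y google-perftools
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4:$LD_PRELOAD
python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 96 -v --provider zendnn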
Hello @ajeet1203singh and @aj-prime, can you please share some numbers on the expected throughput improvement of ZenDNN 4.2?
I'm also trying to run the transformer benchmark and got a similar result (ZenDNN is slower than the CPU Execution Provider).
Describe the issue
I followed the installation instructions described in section 4 of the README.
Processor Name: AMD EPYC 7V13 64-Core Processor (Azure Cloud)
Performance (QPS): ZenDNN: 34, CPU: 77
To reproduce
CPU: python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 96 -v --provider cpu
ZenDNN: python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 96 -v --provider zendnn
I also tried 64 threads, but it resulted in worse performance.
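For completeness, the 64-thread run was presumably the same command with the thread-count flag changed; my best reconstruction (the exact invocation is an assumption):
python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 64 -v --provider zendnn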
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04.4 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-zendnn:1.17.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Unknown