[GEMM] dummy_memset() is ridiculously slow

atamazov commented 3 years ago

Questions:

Is it possible to avoid hipMemsetAsync() and use SetTensor() or something else instead?
Is it possible to improve performance of hipMemsetAsync()?

The performance loss is so huge that I am assigning the bug label.

This issue originated from https://github.com/ROCmSoftwarePlatform/MIOpen/issues/717#issuecomment-769829395:

I've tested your smallest config on my system with recent MIOpen, all caches enabled, and ROCm 4.0 and found that even in the best case the overhead is ~1500ms which is ridiculous. Further investigation shows that almost all this time is spent in the hipMemsetAsync() HIP runtime call, invoked from MIOpen's dummy_memset(). This happens during execution of GEMM algorithm. With MIOPEN_DEBUG_CONV_GEMM=0, the actual library's overhead is ~8ms.

Most likely this should be addressed to the GEMM algorithm developers and/or to the HIP runtime team.

Logs (binary cache enabled, MIOPEN_FIND_MODE=normal):

Logs at level 6 with timestamps

overhead-01.txt - 1st run, compilations

overhead-02.txt - 2nd run, kernels read from binary cache

overhead-03.txt - 3rd run

Logs at level 6, no timestamps, good for diffing

overhead-notime-01.txt

overhead-notime-02.txt

overhead-notime-03.txt

overhead-notime-02-nogemm.txt - with GEMM disabled

Example MIOpenDriver command:
MIOPEN_FIND_MODE=normal \
MIOPEN_ENABLE_LOGGING_ELAPSED_TIME=1 \
MIOPEN_LOG_LEVEL=6 \
./bin/MIOpenDriver conv -n 1 -c 3 -H 2 -W 2 -k 8 -x 3 -y 3 -p 1 -q 1 -u 2 -v 2 -V 0 -w 2 -t 1 -i 2 -F 1 \
2>&1 | tee ~/mio/overhead-01.txt
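
For anyone who wants a quick A/B check without reading the level 6 logs, here is a minimal sketch that times the same driver command with GEMM enabled and with MIOPEN_DEBUG_CONV_GEMM=0. The helper name `elapsed_ms` is hypothetical (not part of MIOpen), and it relies on GNU `date` for nanosecond timestamps. It measures wall-clock time of the whole process, including startup, so only the difference between the two runs is meaningful; it will not reproduce the ~1500ms and ~8ms figures directly.

```shell
# elapsed_ms: hypothetical helper, not part of MIOpen.
# Runs the given command, discards its output, prints elapsed wall-clock ms.
# Assumes GNU date (the %N nanosecond format).
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Same driver arguments as in the example command above.
DRIVER=./bin/MIOpenDriver
ARGS="conv -n 1 -c 3 -H 2 -W 2 -k 8 -x 3 -y 3 -p 1 -q 1 -u 2 -v 2 -V 0 -w 2 -t 1 -i 2 -F 1"

# Only run the comparison if the driver binary is actually present.
if [ -x "$DRIVER" ]; then
    # $ARGS is intentionally unquoted so it splits into separate arguments.
    echo "GEMM enabled:  $(elapsed_ms "$DRIVER" $ARGS) ms"
    echo "GEMM disabled: $(elapsed_ms env MIOPEN_DEBUG_CONV_GEMM=0 "$DRIVER" $ARGS) ms"
fi
```

Run it a few times after the binary cache is warm, so that kernel compilation from the first run does not skew the comparison.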

atamazov commented 3 years ago

A large loss of time is expected, therefore value_high.

atamazov commented 3 years ago

~It seems like~ #554 resolves this. ~@asroy Am I correct?~

ROCm / MIOpen

[GEMM] dummy_memset() is ridiculously slow #723
