Hi,
In ROCm 2.6, we're seeing a significant startup-related slowdown in TensorFlow. For our TF CI, this adds over an hour to each of our build+test cycles.
Below is what we see when running tf_cnn_benchmarks using the 'trivial' model:
- our benchmark's performance metrics (images/sec) are nearly equivalent
- the wall clock times show a startup performance regression in ROCm 2.6
- CPU profiler data shows extra entries for comgr yaml calls in ROCm 2.6
ROCm 2.5
wall clock (sec): 19.36
total images/sec: 7213.54
ROCm 2.6
wall clock (sec): 56.83
total images/sec: 7238.64
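To put the two runs side by side, here is a small sketch that computes the deltas from the figures above. The dictionary keys and the `compare` helper are just illustrative names, not part of any tool.

```python
# Figures copied from the two runs reported above.
runs = {
    "ROCm 2.5": {"wall_clock_sec": 19.36, "images_per_sec": 7213.54},
    "ROCm 2.6": {"wall_clock_sec": 56.83, "images_per_sec": 7238.64},
}


def compare(old, new):
    """Return (extra seconds, wall clock ratio, throughput change in percent)."""
    extra_sec = new["wall_clock_sec"] - old["wall_clock_sec"]
    wall_ratio = new["wall_clock_sec"] / old["wall_clock_sec"]
    throughput_change_pct = (
        100.0 * (new["images_per_sec"] - old["images_per_sec"]) / old["images_per_sec"]
    )
    return extra_sec, wall_ratio, throughput_change_pct


extra_sec, wall_ratio, throughput_change_pct = compare(runs["ROCm 2.5"], runs["ROCm 2.6"])
print(f"extra wall clock:  {extra_sec:.2f} s")
print(f"wall clock ratio:  {wall_ratio:.2f}x")
print(f"throughput change: {throughput_change_pct:+.2f}%")
```

That works out to roughly 37.5 extra seconds per run (about 2.9x the wall clock time) with steady-state throughput within about 0.35%, which is why this looks like a startup cost rather than a slowdown in the benchmark itself.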
Reverting to the comgr package from ROCm 2.5 had no effect, which suggests the problem may lie in a component that calls comgr rather than in comgr itself.
Please help us determine whether this is a comgr issue, a HIP issue, or something else. We don't have a good picture of which components use comgr.
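In case it helps with reproducing, below is a minimal sketch of one way to capture the wall clock time of a whole run, including startup. It is not necessarily how the numbers above were collected. `time_command` is a hypothetical helper, and the command shown is a harmless placeholder that should be swapped for the actual benchmark invocation.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return (wall_clock_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Placeholder command (an assumption for illustration): a no-op Python process.
# Replace it with the real benchmark command line when reproducing.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"wall clock (sec): {elapsed:.2f} (exit code {code})")
```

Timing the whole process from the outside captures initialization cost that the benchmark's own images/sec figure does not include.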
Many thanks,
Jeff