ROCm / clr

[Issue]: No performance improvement using hipGraph #77

Closed: harrisonsz closed this issue 4 weeks ago

harrisonsz commented 6 months ago

Problem Description

GPU: RX6400 (I could not find this model among the GPU options offered in the form)

I was trying to use hipGraph instead of plain HIP streams to accelerate some computation, but the performance difference between the stream version and the graph version is minor. I tested the same program written for CUDA on an Nvidia GPU and saw a significant improvement there, so I'm confident my program is written correctly. I first ran it on ROCm 5.6.0, then upgraded to 5.7.0, and performance did not change. I wonder which ROCm version, if any, includes optimizations for hipGraph. Also, since I'm using a relatively outdated AMD GPU, the RX6400, I wonder whether hipGraph only has a significant effect on certain models.

Operating System

Ubuntu 22.04.3 LTS(Jammy Jellyfish)

CPU

11th Gen Intel(R) Core(TM) i5-11400

GPU

AMD Radeon VII

ROCm Version

ROCm 5.7.0

ROCm Component

clr, HIP

Steps to Reproduce

I wrote two simple programs to test performance: one uses a stream and the other uses a graph. I uploaded them as .txt files because GitHub doesn't allow .cpp uploads. Rename them to .cpp, then compile and run both programs to compare the output. Attachments: hip_only_stream.txt, hip_using_graph.txt
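For readers who want to reproduce the comparison, one common way to keep a stream-versus-graph measurement fair is to run a few untimed warm-up iterations and then average over many timed ones, keeping one-time costs such as graph instantiation outside the timed loop. The following is a minimal host-side sketch under those assumptions. It is not taken from the attached programs, and `mean_ms` and `gain_percent` are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not from the attached programs).
// Runs `work` `warmup` times untimed, then returns the mean wall-clock
// time per call in milliseconds over `iters` timed calls.
// `work` is assumed to block until the GPU is done (for example by ending
// with a stream synchronize); otherwise only submission cost is measured.
double mean_ms(const std::function<void()>& work, int warmup, int iters) {
    assert(iters > 0);
    for (int i = 0; i < warmup; ++i) {
        work();  // warm-up: absorbs first-launch and lazy-initialization cost
    }
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        work();
    }
    const auto t1 = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = t1 - t0;
    return total.count() / iters;
}

// Relative gain of the graph path over the stream path, in percent.
// Positive: the graph path is faster. Negative: it is slower.
double gain_percent(double stream_ms, double graph_ms) {
    return (stream_ms - graph_ms) / stream_ms * 100.0;
}
```

In use, the stream variant would pass a callable that enqueues its kernels on a stream and synchronizes, and the graph variant would pass one that launches an already-instantiated graph and synchronizes. Calling `gain_percent` on the two means then gives a single figure to compare across ROCm versions or GPUs.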

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

harrisonsz commented 6 months ago

I've also tested on an RX7900XTX; there is still no improvement.

ppanchad-amd commented 2 months ago

Hi @harrisonsz, internal ticket has been created to investigate this issue. Thanks!

schung-amd commented 2 months ago

Hi @harrisonsz, thanks for pointing this out! Unfortunately, this is a known issue. I actually see a performance loss with the graph version of your code on ROCm 6.2 with a 7900XTX! Scaling the problem up to N = 1024 * 1024 * 100, the graph version outperforms the stream version by only 2%.

While I don't think we have any public-facing documentation about this, hipGraph currently does not provide as much of an advantage as CUDA graphs. We're working on improving this, although I am not aware of any definite timelines. I'll reach out to our internal teams to see if they have any additional information.

schung-amd commented 1 month ago

A quick update from the internal team: we have been making a lot of good progress with hipGraph performance, but the focus has been on the MI300, so for now many of the improvements are seen only there and not on Radeon systems like yours or my repro system.

harrisonsz commented 1 month ago

Thank you for your reply. Do you have plans to also improve hipGraph on Radeon systems in the future?

schung-amd commented 1 month ago

I'm checking with the internal team to find out what our plans are on the Radeon front. I'll update here when I have that information.

schung-amd commented 4 weeks ago

We have plans for hipGraph performance improvement in general, but nothing targeting specific architectures, so how much of this improvement will be seen on Radeon systems is unknown at this time. I expect hipGraph performance on Radeon to improve in the future, although not necessarily as fast or as much as on MI cards.