ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

[Feature]: Out of Order Streams #3454

Closed by hdelan 5 months ago

hdelan commented 5 months ago

Suggestion Description

It is my understanding that hipStream_t streams are implemented on top of out-of-order HSA queues. I want out-of-order execution of kernels in HIP, and to get it today I must use multiple hipStream_t streams. Since the in-order behavior of a hipStream_t is emulated on top of out-of-order HSA queues, the HIP runtime does a lot of scheduling that I neither want nor need for my application.

It would be great if HIP offered an out-of-order stream extension that maps closely to the out-of-order HSA queues. With this extension I could avoid a lot of HIP runtime scheduling overhead, especially the overhead of creating many hipStream_t streams.

Operating System

Ubuntu

GPU

MI200

ROCm Component

HIP

cjatin commented 5 months ago

Hey Hugh,

Are you looking for something like this: https://rocm.docs.amd.com/projects/hipfort/en/docs-6.0.0/doxygen/html/interfacehipfort_1_1hipextlaunchkernel.html#:~:text=associated%20synchronization%20rules.-,%5Bin%5D,of%20hipExtAnyOrderLaunch%2C%20signifies%20if%20kernel%20can%20be%20launched%20in%20any%20order.,-Returns

hdelan commented 5 months ago

Hi @cjatin, thanks for the link! That looks exactly like what we are looking for. Is there any documentation on how this works? For example, does recording HIP events before and after a launch still work in the usual way?

cjatin commented 5 months ago

How it works: it only works on Linux, AFAIK. It basically leaves the barrier bit in the HSA packet header unset, which allows the current task to run without waiting for preceding tasks in the same queue to complete.
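To make the barrier bit concrete, here is a minimal C++ sketch of the 16-bit AQL packet header with that bit set or left unset. The constants mirror the values from the HSA runtime header hsa.h, copied locally so the sketch compiles without ROCm. The helper names make_dispatch_header and has_barrier_bit are hypothetical, the fence-scope fields are left out for brevity, and this is not how the HIP runtime actually builds its packets, which this thread does not show.

```cpp
#include <cstdint>

// Local mirrors of values from hsa.h (assumption: copied here so the sketch
// builds without the HSA headers; a real program would include hsa.h).
constexpr unsigned kHeaderTypeShift = 0;          // HSA_PACKET_HEADER_TYPE
constexpr unsigned kHeaderBarrierShift = 8;       // HSA_PACKET_HEADER_BARRIER
constexpr std::uint16_t kKernelDispatchType = 2;  // HSA_PACKET_TYPE_KERNEL_DISPATCH

// Hypothetical helper: build the header for a kernel dispatch packet.
// wait_for_preceding == true sets the barrier bit, so the packet processor
// waits for all earlier packets in the queue before launching this one.
// wait_for_preceding == false leaves the bit unset (any-order launch).
// Acquire/release fence scope fields are omitted in this sketch.
constexpr std::uint16_t make_dispatch_header(bool wait_for_preceding) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(kKernelDispatchType) << kHeaderTypeShift) |
        ((wait_for_preceding ? 1u : 0u) << kHeaderBarrierShift));
}

// Hypothetical helper: report whether the barrier bit is set in a header.
constexpr bool has_barrier_bit(std::uint16_t header) {
    return ((header >> kHeaderBarrierShift) & 1u) != 0;
}
```

The only difference between the two headers is bit 8: an in-order launch corresponds to make_dispatch_header(true), while a launch with the hipExtAnyOrderLaunch flag from the linked page would correspond to make_dispatch_header(false) in this simplified picture.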

Event recording should not be affected, although if you do see any issues, please let us know.

We do not have a test or proper documentation for this feature at the moment. I will create a task internally to follow up on this.

hdelan commented 5 months ago

@cjatin I see the API is a bit different, as it takes start and end events as well. No further info needed. This is exactly what we want. Thanks for the link!