"Yamata No Orochi"[1]
Orochi is a library that loads the HIP and CUDA APIs dynamically, allowing the user to switch APIs at runtime. You therefore don't need to compile a separate implementation for each API: you can compile and maintain a single binary that runs on both AMD and NVIDIA GPUs. Unlike HIP, which selects hipamd or CUDA at compile time, Orochi dynamically loads the corresponding HIP or CUDA shared libraries depending on your platform. In other words, it combines the functionality offered by HIPEW and CUEW into a single library.
graph LR
A(User code using <br/> driver API) --> B(Orochi)
B --> |AMD GPU|C(HIP <br/> amdhip64.dll)
B --> |NVIDIA GPU| D(CUDA <br/> nvcuda.dll)
In order to enable the CUDA backend, you need to add the OROCHI_ENABLE_CUEW define to your project.
For the HIP backend, it's easier: everything needed is embedded in this project, so you don't need to do anything.
This library doesn't require you to link to CUDA or HIP at build time. The benefit is that the runtime works even if only one of the two drivers, either CUDA or HIP, is installed.
To run an application compiled with Orochi, you need to install a driver of your choice with the corresponding .dll/.so files based on the GPU(s) available. Orochi will automatically link with the corresponding shared library at runtime.
Orochi will aim to maintain support for various version combinations, enabling developers to switch between branches depending on their environment.
Combinations currently supported:

| HIP version | CUDA version | Branch | Remarks |
|---|---|---|---|
| 5.7 | 12.2 | release/hip5.7_cuda12.2 | - Tested and validated. - Synchronized with the main branch. - Use this one for compatibility with a wider range of machines. |
| 6.0 | 12.2 | release/hip6.0_cuda12.2 | - Tested and validated. - HIP Windows DLLs not provided yet. - Use this one to take advantage of the latest HIP features and if you are on a machine with a HIP 6 driver. |
If you need a combination that is currently not supported, open an Issue.
APIs have the prefix oro. If you are familiar with the CUDA or HIP driver APIs, you will get used to the Orochi APIs easily.
For example, suppose we have the following HIP code for device and context creation:
#include <hip/hip_runtime.h>
hipInit( 0 );
hipDevice_t device;
hipDeviceGet( &device, 0 );
hipCtx_t ctx;
hipCtxCreate( &ctx, 0, device );
The same code can be rewritten using Orochi as:
#include <Orochi/Orochi.h>
oroInitialize( ORO_API_HIP, 0 );
oroInit( 0 );
oroDevice device;
oroDeviceGet( &device, 0 );
oroCtx ctx;
oroCtxCreate( &ctx, 0, device );
The Orochi version can run on either CUDA or HIP, depending on the API requested from oroInitialize at runtime!
See more in the sample application.
Run premake.
./tools/premake5/win/premake5.exe vs2022
Note: add the option --precompiled to enable precompiled bitcode.
Test is a minimal application.
The test applications run on HIP by default. If you want to run on CUDA, run the app with the argument cuda.
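The argument handling described above could look roughly like the following. This is a sketch under assumptions, not the test applications' actual code: Backend and selectBackend are hypothetical names, and the enum is a stand-in for whatever API flag the application passes to oroInitialize.

```cpp
#include <cstring>

// Hypothetical stand-in for the API flag handed to oroInitialize.
enum class Backend { Hip, Cuda };

// Returns Backend::Cuda when any command-line argument equals "cuda",
// otherwise Backend::Hip, mirroring the HIP-by-default behavior.
Backend selectBackend( int argc, const char* const* argv )
{
    for( int i = 1; i < argc; ++i )
    {
        if( std::strcmp( argv[i], "cuda" ) == 0 ) return Backend::Cuda;
    }
    return Backend::Hip;
}
```

In an application, the result would then be mapped to the corresponding Orochi API flag (ORO_API_HIP in the example earlier, and presumably a CUDA counterpart) before calling oroInitialize.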
The source code for the test applications can be found here.
Feel free to open pull requests.
You can either target a specific release/ branch or the main branch.
If it makes sense, your commit will then be propagated to the different release/ branches by us.
[1] Yamata no Orochi (ヤマタノオロチ, 八岐大蛇) is a legendary eight-headed and eight-tailed Japanese dragon.