YellowOnion opened this issue 8 years ago
Great!
I think what is missing at the moment is an equivalent of the cuda package, which is a bunch of FFI bindings to the low-level CUDA driver to allocate memory, transfer data, launch a kernel, etc. If that doesn't already exist somewhere, that is probably the biggest missing piece. (Actually, I'm not even sure what kind of API AMD uses to control its hardware.) Once we had those two things, getting an accelerate-llvm-gcn backend up and running should be relatively easy.
I've been doing some research and it doesn't look promising.
The LLVM code that is generated seems to be quite low level, e.g. it is aimed at the drivers on Linux (which means it is not portable, and you need to mess around below the Mesa stack), while OpenCL takes another form of binary (possibly partly compatible).
And the only other option I've found is to use the HSA runtime to upload the kernels, but HSA is designed for AMD's APUs, so I'm not sure how useful that is either.
The final option is to find some way to have LLVM emit SPIR (OpenCL's bytecode) so that any OpenCL device can work.
The low-level nature of LLVM is fine; that's all handled by accelerate-llvm. Note that an accelerate-llvm-gcn (or whatever we call it) backend generates code only for the kernel parts that are executed on the GPU, which is exactly what the accelerate-llvm-ptx and accelerate-cuda backends do. The current documentation for LLVM's AMDGPU target is anaemic, but otherwise not surprising compared to the NVPTX documentation. It fits in with how the PTX backend works.
I should probably mention that I expect to use no OpenCL at all. The accelerate-llvm-ptx backend is so named because, despite targeting 'CUDA-capable NVIDIA GPUs', it does not actually generate CUDA code, and I'd expect the same to happen here as well. CUDA is an umbrella term that covers stuff executing on the GPU (which we are interested in) as well as control and coordination carried out by the host (which we are not; that gets taken over by the accelerate runtime). OpenCL is the multi-vendor equivalent, so we're interested in what OpenCL does once you tell it you are targeting an AMD GPU. (Does that help clarify the goal at all?)
HSA actually looks like a promising avenue. This example looks like it is showing how to launch the same "hello world" kernel as at the bottom of the LLVM AMDGPU documentation. That example is all wrapped in C++ (worryingly), but it looks like the actual hsa.h interface is just regular C, which is fine (from a Haskell-FFI perspective).
I'd have to look closer at that example, and maybe a few others, but I think the next step is a Haskell FFI binding to the HSA Runtime API.
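Since hsa.h is plain C, that binding should be fairly mechanical. A minimal sketch of what the first slice might look like (the module and type names here are invented; the foreign names are the real hsa.h entry points, and you would need to link against the vendor's HSA runtime library for this to actually build):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Hypothetical first slice of a binding to the HSA Runtime API.
module HSA.Base where

import Foreign.C.Types (CInt (..))

-- hsa_status_t is a plain C enum; HSA_STATUS_SUCCESS is 0.
type HsaStatus = CInt

-- Bring the runtime up before any other HSA call.
foreign import ccall unsafe "hsa.h hsa_init"
  hsa_init :: IO HsaStatus

-- Tear the runtime down again when we're done.
foreign import ccall unsafe "hsa.h hsa_shut_down"
  hsa_shut_down :: IO HsaStatus
```

From there the binding would grow to cover agent enumeration, memory allocation, queues, and kernel dispatch, much as the cuda package does for the CUDA driver API.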
Yeah, I had found that repository, but most documentation I've found shows them targeting APUs; that page's title even states that it only supports Kaveri & Carrizo APUs.
I'll try to get something working on my non-APU system; I guess that would be the best way to prove myself wrong.
It might be worthwhile shooting an email to the AMD or LLVM mailing lists, or opening a GitHub issue on that repo, asking for advice, clarification, or a pointer to the correct documentation explaining how to use the LLVM AMD target.
I don't have a machine with an AMD card in it at the moment so I can't be much help trying things out, sorry.
Possibly useful links:
On macOS (ROCm is Linux-only), I've been having problems getting amdgcn to work at all.
If I compile an OpenCL kernel with Apple's own openclc, like this:
/System/Library/Frameworks/OpenCL.framework/Libraries/openclc -c -emit-llvm -arch gpu_32 -o <output>.bc <input>.cl
I get a kernel which I can use with clCreateProgramWithBinary.
However, clang-5.0.0 (from nixpkgs) with amdgcn, invoked as follows:
clang -c -cl-std=CL1.2 -arch amdgcn -emit-llvm -Xclang -finclude-default-header -o <output>.bc <input>.cl
produces an error like this from clCreateProgramWithBinary:
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:\nUnknown bitstream version!\n
Looking at a hex dump of the two files, they seem quite similar:
Apple:
00000000 de c0 17 0b 00 00 00 00 14 00 00 00 b0 04 00 00 |................|
00000010 ff ff ff ff 42 43 c0 de 21 0c 00 00 29 01 00 00 |....BC..!...)...|
...
AMDGCN:
00000000 de c0 17 0b 00 00 00 00 14 00 00 00 50 0c 00 00 |............P...|
00000010 ff ff ff ff 42 43 c0 de 35 14 00 00 05 00 00 00 |....BC..5.......|
...
So AFAICT, they appear to be "the same kind of thing". At least I'm not seeing two completely different sets of magic numbers, etc.
Does anyone know if I'm missing something here? Has anyone come across instructions for running AMDGCN-compiled code on a Mac?
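For what it's worth, the headers above can be decoded by hand: both files use LLVM's bitcode wrapper format, a 20-byte little-endian header (magic 0x0B17C0DE, version, offset, size, cputype) followed by the raw bitstream, whose own magic is 'B' 'C' 0xC0 0xDE. A small sanity check on the AMDGCN dump (the byte values below are transcribed from the hexdump above):

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL)
import Data.Word (Word32)

-- First 24 bytes of the AMDGCN dump, transcribed by hand.
hdr :: B.ByteString
hdr = B.pack
  [ 0xde, 0xc0, 0x17, 0x0b   -- wrapper magic, little-endian 0x0B17C0DE
  , 0x00, 0x00, 0x00, 0x00   -- wrapper version
  , 0x14, 0x00, 0x00, 0x00   -- offset of the raw bitstream (0x14 = 20)
  , 0x50, 0x0c, 0x00, 0x00   -- size of the raw bitstream
  , 0xff, 0xff, 0xff, 0xff   -- cputype
  , 0x42, 0x43, 0xc0, 0xde ] -- start of the raw stream: 'B' 'C' 0xC0 0xDE

-- Read a little-endian 32-bit word at a byte offset.
le32 :: B.ByteString -> Int -> Word32
le32 bs off = sum [ fromIntegral (B.index bs (off + i)) `shiftL` (8 * i) | i <- [0 .. 3] ]

main :: IO ()
main = do
  print (le32 hdr 0 == 0x0b17c0de)  -- wrapper magic matches
  print (le32 hdr 8)                -- 20: raw stream starts right after the header
  print (le32 hdr (fromIntegral (le32 hdr 8)) == 0xdec04342)  -- 'B' 'C' 0xC0 0xDE
```

So the wrappers of the two files really are the same; presumably the "Unknown bitstream version!" complaint is about the contents of the bitstream itself, i.e. Apple's runtime expecting the (older) LLVM bitstream encoding that openclc emits.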
If you have an older version of LLVM available, it might be worth trying that?
I know the NVIDIA tools are also based off of LLVM, but typically lag by a few releases.
There's no cross-platform way to load GCN binaries other than OpenCL. (And no alternative of any kind on macOS!)
That's not to say that it'd be generating OpenCL code, but it has to work with the API a little to load the object files.
@typedrat thanks for the info!
I got an AMD GPU again and I got curious about this bug.
These have appeared:
https://github.com/RadeonOpenCompute/clang-ocl/blob/master/clang-ocl.in
https://rocm.github.io/QuickStartOCL.html
This mentions no Windows support, which leaves me out of options: https://github.com/RadeonOpenCompute/clang-ocl/issues/4
Unfortunately all of that is about the HSA stuff, which is the old name for ROCm, and is also Linux-only (and quite a pain to get working properly, from painful recent experience). There's nothing to be done as it stands; AMD simply refuses to provide a workable target on other OSes.
The remaining option would be loading via clCreateProgramWithBinary, as discussed for macOS. This would either be easy or impossible: if the ABI is the same, the repackaging is trivial and there are libraries available that can already do it; if it isn't, the problem is immediately intractable.

@typedrat wow, thanks for the insight!
Well, I just bought a Radeon VII, so let's see what we can do.
This is a bit beyond my expertise, but in case it helps, I thought I would drop a mention of https://github.com/google/clspv.
This project apparently provides LLVM modules for targeting Vulkan compute shaders. I could be entirely wrong, but this sounds like it could be the linchpin of an accelerate-llvm-vulkan, with broad device support.
Sorry if this isn't helpful; I don't know enough to know either way.
@gozzarda oh nice find, thanks!
Hey, I posted on the user group a few weeks back about AMD/OpenCL support, but this was suggested as an enhancement. I would love to attempt to get something working; could someone point me in the right direction on getting started?