ROCm / ROCm-OpenCL-Runtime

ROCm OpenCL Runtime

(Planned) Support for SPIR and/or SPIR-V? #57

Closed: j-stephan closed this issue 4 years ago

j-stephan commented 6 years ago

I am currently evaluating different approaches to / frameworks for GPGPU programming in the context of HPC - namely SYCL, ROCm (HC / C++AMP and HIP) and CUDA. In order to test SYCL (via Codeplay's ComputeCpp) the OpenCL runtime needs to support SPIR or SPIR-V. I've read in #31 that there is currently no SPIR-V support, and from what I've gathered on my fresh ROCm installation there is no support for SPIR either. Are there any plans to include SPIR, SPIR-V or both in the near future?

Edit: The GPU I'm performing the tests on is a Vega 64, if that matters.

gstoner commented 6 years ago

We talked to Codeplay about supporting our native compiler directly, so they would have more control over the output and optimization phase of SYCL. Right now SYCL is only supplied by Codeplay as a binary compiler. The other open-source SYCL project was based on the early Kalmar/C++ compiler, from when the person developing it was at AMD.

https://llvm.org/docs/AMDGPUUsage.html.

One thing we are working on is an update for HIP that leverages the standard Clang frontend instead of HCC, based on the work Google did on CUDA support in Clang.

Greg

j-stephan commented 6 years ago

Hm, so no SPIR(-V) support in the foreseeable future? And I should probably ask Codeplay about their timeline for HCC support?

One thing we are working on is an update for HIP that leverages the standard Clang frontend instead of HCC, based on the work Google did on CUDA support in Clang.

Interesting. Is there an estimate of when the HIP -> clang frontend will be available?

gstoner commented 6 years ago

We are looking at it now that SPIR-V for compute is maturing; SPIR 1.2 was a little less than optimal. We used it on a few projects, and it was easier just to use the LLVM compiler.

MathiasMagnus commented 6 years ago

@zpodlovics Thanks for the cross-reference.

@gstoner I understand the position on SPIR and how difficult it is to generate from modern Clang front-ends (which pretty much the entire world uses). Nonetheless, it is the only standard input to OpenCL 1.2, which SYCL 1.2 builds on. SPIR-V is a much better alternative, but no OpenCL 1.2 runtime is required to accept it as input. Even if the ROCm OpenCL runtime supported ingesting SPIR-V binaries, I still could not take my app compiled to a binary with ComputeCpp and run it on an Intel runtime with confidence that it would work. SPIR-V will be awesome when SYCL 2.2 arrives, or when OpenCL Next arrives, which vendors may be less reluctant to support. SPIR-V as input is only standard as of OpenCL 2.1, which is a fairly important side note when using pre-compiled kernels, as is the case with single-source C++ GPGPU APIs.
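The version distinction above can be checked mechanically. The sketch below is a hypothetical helper, not part of any AMD or Khronos tool: it assumes the device's CL_DEVICE_VERSION string has already been read (for example from clinfo output) and follows the format the OpenCL specification prescribes, "OpenCL <major>.<minor> <vendor-specific information>". It reports whether SPIR-V ingestion is a core requirement for that version.

```python
import re


def parse_cl_version(device_version: str) -> tuple[int, int]:
    """Parse a CL_DEVICE_VERSION string of the form
    'OpenCL <major>.<minor> <vendor-specific information>'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError(f"unexpected CL_DEVICE_VERSION: {device_version!r}")
    return int(match.group(1)), int(match.group(2))


def spirv_is_core(device_version: str) -> bool:
    """Return True if this OpenCL version requires core SPIR-V ingestion
    (clCreateProgramWithIL). That holds for OpenCL 2.1 and 2.2 only:
    earlier versions can offer it solely through an optional extension,
    and OpenCL 3.0 made IL support optional again."""
    return parse_cl_version(device_version) in {(2, 1), (2, 2)}


# Hypothetical version strings; the vendor suffix is free-form.
for version in ("OpenCL 1.2 example", "OpenCL 2.0 example", "OpenCL 2.1 example"):
    print(version, "->", spirv_is_core(version))
```

A device reporting OpenCL 1.2, the level SYCL 1.2 targets, therefore gives no guarantee of SPIR-V input; only an extension query can tell.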

ps. @gstoner: Your profile pic rocks! I am very familiar with the background. :)

j-stephan commented 5 years ago

Mh, any news on this? I hoped for some news regarding SPIR(-V) with ROCm 2.0 but sadly nothing (at least clinfo doesn't report anything).

MathiasMagnus commented 5 years ago

Aye, me too. It's cl_amd_assembly only, no SPIR or SPIR-V it seems. It would be nice to hear how far up or down the backlog the Khronos IRs are.
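What clinfo shows here is the device's CL_DEVICE_EXTENSIONS string, a space-separated list of extension names. A minimal sketch of checking it for the two Khronos IR extensions, assuming the string has already been obtained: cl_khr_spir advertises SPIR 1.2 and cl_khr_il_program advertises SPIR-V ingestion as an extension. The helper name is hypothetical. Splitting on whitespace instead of doing a substring search avoids false matches against longer names that merely contain the token.

```python
def khronos_ir_support(extensions: str) -> dict[str, bool]:
    """Report which Khronos IR extensions appear in a CL_DEVICE_EXTENSIONS
    string (space-separated extension names)."""
    names = set(extensions.split())
    return {
        "SPIR 1.2": "cl_khr_spir" in names,      # SPIR 1.2 via extension
        "SPIR-V": "cl_khr_il_program" in names,  # SPIR-V via extension
    }


# Hypothetical extension string without any Khronos IR extension.
print(khronos_ir_support("cl_khr_fp64 cl_khr_global_int32_base_atomics"))
# {'SPIR 1.2': False, 'SPIR-V': False}
```

On OpenCL 2.1 and later devices, SPIR-V support is reported through CL_DEVICE_IL_VERSION as well, so a missing cl_khr_il_program entry alone is not conclusive there.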

VileLasagna commented 5 years ago

Well, I just thought I'd try SYCL out, as it looks very promising, when I suddenly bump into this thing where my Vega apparently doesn't support it. I won't lie: it is quite disappointing to see this apparent disregard for an open standard from AMD's side.

j-stephan commented 5 years ago

@VileLasagna You could try to use Pocl's OpenCL implementation (which has experimental SPIR-V support) as backend. I haven't tried that myself as it would contradict the purpose of my current work, but it might help you.

VileLasagna commented 5 years ago

@j-stephan Hey, I'll check it out sometime, thanks for the hint. For now I'm giving hipSYCL a go instead. It took me a bit of fiddling about to get it working but it's looking good for now.

Lucretia commented 4 years ago

Any movement on this for SPIR-V?

Degerz commented 4 years ago

How about no? AMD sees OpenCL as a joke in itself that needs to be deprecated, so why not use CUDA/HIP instead?

Khronos Group and their standards can't be trusted anymore to be sane ...

Lucretia commented 4 years ago

Because they should implement it as well to be conformant. If they think CL is a joke, why bother implementing it at all? @Degerz Do you work at AMD? How do you know this?

Degerz commented 4 years ago

I wouldn't know why AMD bothered as I don't work for them but all I know is that judging from this repository's activity, AMD's ROCm OpenCL implementation is dead ...

I imagine AMD only wanted to gauge how much community support there actually is for OpenCL, to see if the community would accept the burden of maintaining their new OpenCL stack, but that turned out not to pass in the end ...

If Apple couldn't beat Nvidia into submission then not even Intel will be able to either. I think it is time for everyone to start refocusing their efforts on CUDA/HIP or other viable alternatives with similar programming models because OpenCL has no future ...

jeffhammond commented 4 years ago

@Degerz I encourage you to find another place to rant about things that have nothing to do with this thread. Your comments are interfering with useful activities.

Lucretia commented 4 years ago

Would be nice to get an actual response from @amd people about this, let's try @fxkamd

Degerz commented 4 years ago

@jeffhammond How about the vocal minority stops asking about OpenCL? It's not like anything useful is going to happen ...

People need to deal with the fact and come to terms with it: this repository is dead ...

j-stephan commented 4 years ago

This issue is about the absence of SPIR(-V). I'd be thankful if you'd open a separate issue for reasoning about AMD's overall OpenCL strategy and other facts people need to deal with.

Degerz commented 4 years ago

If it's about SPIR-V then they have no intention to add it. :sunglasses:

bensander commented 4 years ago

Hi, this is Ben Sander - I work at AMD. We are committed to supporting OpenCL, along with HIP and OpenMP. These have different benefits in terms of portability, programmability, and familiarity - users can choose based on the needs of their project. For intermediate language, we are currently focusing on direct-to-ISA compilation without an intervening IR - it's just LLVM IR to GCN ISA. This has some benefits (a single compiler, easy to expose all hardware capabilities), and the downside (limited portability) is an appropriate tradeoff for the HPC and open-source markets we currently focus on. Future work could include SPIR-V support if we address other markets, but it is not currently in the plans.

jeffhammond commented 4 years ago

SPIR-V gets you support for industry-supported SYCL compilers, including Codeplay ComputeCpp.

camaclean commented 4 years ago

SYCL for HPC is my primary reason for wanting SPIR-V support. With hardware diversification expected as Moore's Law comes to an end, an open standard like SYCL is quite appealing. I see HIP as a shim to get existing CUDA applications to work on AMD hardware, but if I were to write a new application or add accelerator offloading to an existing application, I would want it to be in a language, or at least a framework, capable of adapting to a post-Moore future and a proliferation of diverse accelerators. It's not just a matter of SYCL, either. OpenMP-capable compilers could potentially emit SPIR-V. C++ compilers could emit SPIR-V for C++ parallel algorithms library offload. There's a bunch of things I can see independent compiler developers doing with SPIR-V if hardware vendors would only implement it.

Lucretia commented 4 years ago

There's a few of us looking at how to adapt Ada to emit SPIR-V too. I'm on AMD hw for open source reasons and not to have to put up with nvidia, but I'm getting a bit sick of AMD's half arsed attempts at OpenCL.

Degerz commented 4 years ago

@Lucretia HIP/ROCm is open source since the code is there for everyone to see, but that doesn't mean that AMD's standards or its technology have to be cross-vendor as well ...

It is also in AMD's interests to generate GCN ISA and not SPIR-V, since that is the abstraction that most closely matches their hardware, so burdening AMD by asking them to support SPIR-V is not acceptable to the compiler team because it just introduces more complexity for them ...

SPIR/SPIR-V support is rubbish for sacrificing maximum performance and abstraction in the name of 'portability', and AMD shouldn't have any of it because then they'd just be inviting more competition from Intel ...

If Intel wants to push SPIR-V then they can do it all by themselves, but they shouldn't go around pressuring the other vendors to open up, seeing as they are possibly doing this in bad faith to gain leverage in a market in which they have no stake. Intel's move is most certainly politically motivated and potentially nefarious as well ...

How's that for 'portability' if only one vendor supports the standard in the end? It's pretty clear that neither AMD nor Nvidia want anyone benefiting from their work, so cooperation between parties with different interests will start to break down ...

Lucretia commented 4 years ago

So, the only way OpenCL 2.1/2 will happen is if the people who want it, either:

1) Extend the ROCm sources to implement the missing functionality.
2) Port Intel's implementation as a replacement for Mesa's shockingly bad OpenCL 1.1 (for AMD), the AMDGPU-Pro one, and this one.

camaclean commented 4 years ago

HPC benefits heavily from repurposing efforts in industry. The whole GPGPU concept owes its existence to graphics cards and repurposing a technology that would have been cost prohibitive if the only reason to design them was scientific computing. While we have the ability to recompile our applications to target our specific hardware, this is a very niche way of running software and library development will be limited to that niche.

Imagine a library that can be compiled to use either SPIR-V, GCN, or PTX. In HPC we would obviously compile it for whatever is best for our hardware. However, the library might not even exist if SPIR-V wasn't an option, because the original developer required binary portability. SPIR-V could enable the development of things that we won't have the resources to do on our own.

Degerz commented 4 years ago

@Lucretia Precisely, and I dare the community to do AMD's work for them if they want an OpenCL/SPIR-V stack so much ...

@camaclean

this is a very niche way of running software and library development will be limited to that niche

This statement isn't reflected in reality given CUDA's iron fist dominance ...

However, the library might not even exist if SPIR-V wasn't an option because the original developer required binary portability.

Binary portability is overrated! If it truly mattered that much, the GPGPU industry would've already converged on a programming model, but in the real world it is divergent programming models like CUDA that are thriving, and to a much lesser extent this applies to HIP as well, since AMD are seeing far more success with the effort poured into it than into OpenCL ...

A unified industry standard only works when everyone participates in good faith, as we see with a graphics API such as Vulkan, but it doesn't work when factionalism interferes with establishing a successful cartel for a standard such as OpenCL: one player acts in bad faith (Nvidia) and another doesn't really care (AMD), which leaves only a single player determined enough to carry it (Intel) ...

It was at that point that AMD realized that OpenCL was a dead end, because it's not what the majority of the GPGPU community wanted and there was no cooperating with Nvidia to produce a successful industry-wide standard, so it made no sense to keep investing in a standard that virtually no one wanted ...

camaclean commented 4 years ago

@Degerz

I was referring more to software overall, not just HPC applications. As computers in general get more heterogeneous as Moore's law ends (like how the iPhone has a whole bunch of specialized silicon), programming in general could see a lot more accelerator offloading. If commercial software developers can tell std::for_each to offload to an accelerator, even the integrated graphics of a desktop, then the number of applications capable of using an accelerator could increase due to the lower barrier to entry (even if they don't get as close to peak performance as an experienced GPU developer). Right now GPU applications are written with the intent of writing something for the GPU, but if even suboptimal acceleration could be added later by simply adding a flag to for_each, how many more applications would use it?

GPU programming models have been divergent due to politics, too. It is quite unfortunate that competition in the industry has played out this way. I see OpenCL as dead, too. Part of the problem was that when OpenCL was designed, nobody had experience with how best to program a GPU, or even whether people would be writing for GPUs directly or using libraries. SYCL moves closer to CUDA, being single-source C++.

Degerz commented 4 years ago

@camaclean

Are you implying that heterogeneous programming models should compel the industry to adopt portable solutions? If so, I would argue the exact opposite, since lower-level access is more valuable in the case of specialized accelerators, and Moore's Law ending only makes an even stronger case for different APIs with specialized programming models for targeting the different accelerators out there to maximize performance ...

These three reasons alone make SYCL/SPIR-V an inappropriate solution for the GPGPU industry:

  1. Introducing more abstractions to tool maintainers such as AMD/Nvidia is not useful to them since it creates friction.
  2. Performance is valued in this sector, so going lower level is more valuable as well, which usually necessitates different programming models.
  3. Portability isn't much of a concern here, seeing as CUDA/HIP have tons of portability problems even across hardware designs from their own vendor (deprecating warp-synchronous programming, hardware targets, other features).

GPU programming models aren't just divergent because of politics; hardware design plays a role as well. OpenCL's model was doomed from the start by taking a graphics-like approach to developing the API. I'm convinced that there's absolutely no way to have a powerful programming model like CUDA and be portable as well. Even Intel has started adding vendor-specific extensions to make SYCL somewhat palatable, but at that point you lose the argument in favour of portability because implementation- and vendor-specific details now matter.

Not even the likes of Apple (who were the original authors of OpenCL) will attempt BOTH the single-source model AND portability, because even they know that different hardware details get in the way of doing this in a high-performance manner. It would be a massive ego boost for them to figure out a portable CUDA alternative, because then they'd have a massive simplification of their programming model and their developer ecosystem would become FAR more productive for everything ... (this is why just about every graphics API out there is separate-source, to ensure maximum portability)

SYCL looks like this halfway compromise and seeing as how AMD are struggling to develop libraries for HIP, I don't think they're all that interested in dealing with the potential horror of hitting a SPIR-V compiler bug ...

camaclean commented 4 years ago

@Degerz

The advantage of something like SYCL is that you could have a mykernel<nvidia>, mykernel<amd>, mykernel<xilinx>, etc. with those architecture-specific optimizations where needed. Maybe there just needs to be a way of compiling all of these into one binary rather than something generic like SPIR-V, but it does seem like it would be good to have a generic fallback. SPIR-V isn't strictly necessary for having this sort of programming model but I look at how LLVM has been used as this common platform for building front-ends and back-ends without having to write an entire compiler from scratch and contemplate if a standard IR could do the same for accelerator programming. I don't think that divergence and convergence are mutually exclusive as circumstances can vary, too. Allowing offloading from additional languages could happen at the same time as languages capable of targeting many hardware vendors develop.
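The per-vendor specialization with a generic fallback described above can be illustrated without any particular toolchain. This is a hypothetical Python analogue of the mykernel<vendor> idea, not SYCL code, and every name in it is made up for illustration: tuned variants are registered per vendor, and any vendor without one gets the portable version, the role a generic SPIR-V build would play.

```python
from typing import Callable

Kernel = Callable[[list[float]], list[float]]


def scale_generic(xs: list[float]) -> list[float]:
    """Portable fallback: the version every vendor could run."""
    return [2.0 * x for x in xs]


def scale_amd(xs: list[float]) -> list[float]:
    """Stand-in for a hypothetical AMD-tuned variant; same result."""
    return [x + x for x in xs]


# Hypothetical registry of vendor-specific specializations.
SPECIALIZED: dict[str, Kernel] = {"amd": scale_amd}


def select_kernel(vendor: str) -> Kernel:
    """Prefer a vendor-specific variant, else fall back to the generic one."""
    return SPECIALIZED.get(vendor, scale_generic)


print(select_kernel("amd").__name__)     # scale_amd
print(select_kernel("xilinx").__name__)  # scale_generic
```

The design point is that specialization stays at the granularity of individual kernels, while launching and selection code is shared across vendors.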

For accelerators like video decoding offload, yeah, those require specific APIs. I'm more considering data flow architectures and other non-von Neumann devices that are different from a CPU or GPU but aren't fixed-function.

I'd like to see hardware-specific optimization at the lowest level of granularity. If I need two vendor-specific versions of calculate written differently, that's fine, but I should be able to target any vendor without having to use an entirely different API just to do the kernel launching. That could happen with vendors agreeing to a language standard or an IR standard (with 3rd parties developing the language). I don't really see 3rd-party compiler developers targeting tons of vendor back-ends. I don't really care if it's a common language or a common IR that's the solution, but vendors have been so insular when it comes to languages that I just want something.

There is also a decent amount of OpenACC/OpenMP usage in HPC where people really aren't using the low level features. They may need to adjust parameters for different devices but it's not bare metal programming. Directive based programming can't see the lower level optimizations possible but it's often considered good enough. The answer to if accelerator offloading is working for us isn't so much a question of how close to peak performance are we getting but if it can beat a CPU.

Degerz commented 4 years ago

@camaclean

That could happen with vendors agreeing to a language standard or an IR standard

This will never happen. Nvidia especially doesn't like compromising when it comes to their compute stack advantage and AMD is heading to a similar path as well. The compute community will never see any compromises compared to the graphics community ...

After 3 different OpenCL stacks (ORCA/PAL/ROCm) and two different standards (OpenCL/HSA), it was miraculous that AMD didn't call it quits much earlier. AMD made the right call IMO: no one else in the HSA Foundation cared about their vision, so they knew it was futile to even try making an industry standard, which is why they forked their work on HSA into ROCm/HIP ...

Intel looks just as idealistic as AMD was before the latter felt the despair of reality, so I wonder just how soon the former will break too ...

MathiasMagnus commented 4 years ago

Hi @bensander, thank you for letting us know of the current priorities. It helps the end users prioritize their future work and plan ahead. Subtracting the frustration factor from some of the comments, there is a lot of truth to many things mentioned.

IMHO, HIP is a way for AMD to ride CUDA's success. Nothing more, nothing less. Standards exist for a reason and we need them. I am grateful for AMD making dual-boot rad again.

Whenever I want to run SYCL code, I have to boot a ~4 year old linux distro (Ubuntu 16.04) with a 2+ year old driver (AMDGPU-PRO 17.40). This is what "commitment to supporting OpenCL" feels like to end-users without SPIR.

(NOTE: I don't have the luxury to revert to Radeon Software 17.5 on my daily Windows devbox, because the games I play simply crash. I can wipe my Windows clean of AMD drivers twice on a daily basis or dual-boot.)

It is true that binary portability is not the highest priority in HPC, but I do agree with @camaclean that many technologies would simply not exist without portable IRs. My main motivation is also SYCL, and SYCL in particular is steering towards "making OpenCL optional" (read: ditches OpenCL as a mandatory back-end) for the very reason that AMD doesn't implement SPIR/SPIR-V support. Yes, OpenCL as a target platform didn't quite reach the adoption we hoped it would, but it is mighty useful infrastructure. So much extra work is required if standard IRs are not a priority.

Imagine not having LLVM? Every front-end would need to be able to generate code for every back-end. Today HPC struggles with the same thing! Every front-end has to generate code for every vendor, because vendors fail to agree upon a common IR. To target Intel platforms, I need a proper conforming SYCL compiler (ComputeCpp). To target AMD HW, I need HipSYCL (still not production quality, Linux-only). To target Nvidia platforms, I need non-standard extensions (PTX back-end of ComputeCpp) or SyclGTX... it's a nightmare!
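The LLVM analogy is a counting argument. With N front-ends and M vendor back-ends and no shared IR, every pair needs its own code path, N * M in total; with a common IR, each front-end emits the IR once and each vendor consumes it once, N + M. A tiny sketch of the arithmetic, using as an illustration three front-ends (SYCL, OpenMP, C++ parallel algorithms, all mentioned in this thread) and three GPU vendors; the function name is hypothetical:

```python
def code_paths(frontends: int, backends: int, shared_ir: bool) -> int:
    """Number of front-end/back-end translators to build and maintain.

    Without a shared IR every front-end must target every back-end
    (frontends * backends); with one, each side only talks to the IR
    (frontends + backends)."""
    return frontends + backends if shared_ir else frontends * backends


print(code_paths(3, 3, shared_ir=False))  # 9
print(code_paths(3, 3, shared_ir=True))   # 6
```

The gap widens as either side grows: 5 front-ends and 5 back-ends need 25 paths without a common IR versus 10 with one.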

All because AMD doesn't have SPIR.

If it did, we could go back to the good old days, when SYCL ran on Intel & AMD, and the Codeplay folks went the extra mile to reach 80% of the community by baking the non-standard PTX back-end. Everyone would be happy. We could still point fingers at just one company ignoring standards, and the rest would be the good guys. Things would work, AMD & Intel could finally join forces to build the ecosystem, and it would not all rest on AMD's shoulders like in the clMath days.

Sadly, all this remains a utopia; we'll need 4-5-6 projects to be able to compile our code on all platforms (wasted human lives on duplicate efforts). SyclDX12, SyclMetal, clvk/clspv, SyclGTX, HipSYCL...

Degerz commented 4 years ago

The ROCm OpenCL driver has been deprecated as of ROCm 2.10 and will not be maintained anymore in the future ...

I guess this also means that they finally killed the idea of submitting the implementation to Khronos for conformance testing ...

Lucretia commented 4 years ago

The ROCm OpenCL driver has been deprecated as of ROCm 2.10 and will not be maintained anymore in the future ...

Where does it say that from AMD?

Degerz commented 4 years ago

Where does it say that from AMD?

Right here in the official ROCm repository ...

I recommend everyone move on to HIP or refactor your project to use Vulkan compute shaders because I doubt AMD's PAL OpenCL drivers will last much longer ... (I give it less than 3 years before AMD officially drops OpenCL support in every conceivable way)

AlexeySachkov commented 4 years ago

Where does it say that from AMD?

Right here in the official ROCm repository ...

I might be wrong, but if I understand correctly, ROCm-OpenCL-Driver is not an OpenCL device compiler or OpenCL runtime - this is just an adapter library to connect the runtime and compiler. Briefly looking at ROCm-CompilerSupport (stated as a replacement) it seems to do the same.

Note: ROCm-OpenCL-Runtime is not deprecated and the latest release is tagged 4 days ago

emankov commented 4 years ago

As the author of ROCm-OpenCL-Driver, I can say that it is the only component of AMD's OpenCL stack (it is actually an AMD OpenCL Compiler Driver, as its documentation states) that is going to be deprecated. The reason for deprecating that particular component is that its functionality was rewritten in a new COMgr component, which has its own API and is used not only for driving OpenCL compilations.