ROCm / HIP-CPU

An implementation of HIP that works on CPUs, across OSes.
MIT License

Wtf is this for? #1

Closed ghost closed 3 years ago

ghost commented 3 years ago

I don't understand why tf this exists. What's the point? Are we (non-AMD employees) supposed to carefully target HIP-CPU so that our code works on GPUs? Or what? What's the benefit for the user?

lanwatch commented 3 years ago

I have some HIP/CUDA-only code that I would like to easily run on the CPU, even if just for debugging. I find this interesting and will give it a go.

I am thankful to AMD and @AlexVlx for putting this out.

MathiasMagnus commented 3 years ago

I would suggest enabling GitHub Discussions for this project to keep such carefully crafted feedback from polluting the issues section.

@procedural As has been said, it's mighty useful to run your kernels through the host compiler's debugger, or generally make use of your "idle" CPU resources while the GPU does work.

(One nifty thing this makes possible is turning on MSVC's structured exception handling inside kernels to catch those nasty NaN errors: have a language exception raised on the very first floating-point exception, be notified of the very first NaN, and trigger a breakpoint. Hope this answers your question: "Wtf is this for?") [screenshot attached]
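
For the curious, a minimal sketch of that kind of setup (not the exact code from the screenshot) might look like this on Windows/MSVC, compiled with /EHa:

```cpp
// Minimal sketch of the idea (not the exact code from the screenshot): unmask
// the "invalid operation" floating-point exception on MSVC so the very first
// NaN-producing operation raises a hardware exception.  Windows/MSVC only;
// compile with /EHa so that catch(...) can see structured (SEH) exceptions.
#include <cstdio>
#include <float.h>   // _clearfp, _controlfp_s, _EM_INVALID, _MCW_EM

int main() {
    unsigned int control = 0;
    _clearfp();                                  // discard stale FP status flags
    _controlfp_s(&control, 0, 0);                // read the current control word
    _controlfp_s(&control, control & ~_EM_INVALID, _MCW_EM);  // unmask invalid-op

    try {
        volatile double zero = 0.0;
        volatile double bad  = zero / zero;      // 0/0 -> invalid operation -> trap
        std::printf("%f\n", bad);                // never reached
    } catch (...) {                              // with /EHa this catches the SEH exception
        std::printf("caught the very first NaN-producing operation\n");
    }
    return 0;
}
```

Run it under the Visual Studio debugger and it breaks at the faulting operation instead of letting the NaN propagate silently through the rest of the kernel.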

jlgreathouse commented 3 years ago

Hi @procedural,

Thank you for the question. This is actually a really important thing to ask, though I agree with @MathiasMagnus that we should probably have a GitHub discussions section so that our issue tracker doesn't become a discussion forum.

AMD has a few reasons for creating HIP-CPU. Depending on your goals as a developer, these may help answer your question, "what's the benefit for the user?"

  1. HIP-CPU is a path for developers to write SIMT code, in a well-known and commonly used syntax, that runs on CPUs.
  2. HIP-CPU is a way for HIP developers -- who are perhaps GPU developers -- to write, run, and ship code on their system of choice, regardless of hardware availability or driver support.
  3. HIP-CPU allows HIP developers -- who are perhaps GPU developers -- to use the range of software development and debug tools that are available for CPU applications.

Let's go a bit deeper into each of these, and why you as a CPU or GPU developer may care about each one:

1. HIP-CPU is SIMT Development for CPUs

Your initial question asks whether non-AMD employees should target HIP-CPU code so that it works on GPUs. HIP-CPU does not just exist to help code run on GPUs. In fact, one of our major reasons for releasing this was to show that SIMT programming models also work for CPUs, and that they can be translated into standard C++ SPMD parallelism constructs.

Why might you care about this? There are lots of parallelism mechanisms for CPUs, from low-level things like pthreads and C++ std::thread to pragma-driven mechanisms like OpenMP and explicit parallelization frameworks like MPI.

All of these methods have their tradeoffs: some are easier to use, while others may offer more control and performance. A developer's choice of parallel language depends on a variety of tradeoffs, including things like familiarity of syntax and availability of libraries and tools. We think that HIP brings something new to the table for CPU programmers in this domain.

First off: HIP's syntax is very close to the syntax of common existing GPU programming languages. This means that there is a non-negligible number of developers who are intimately familiar with the HIP syntax. Those developers were previously stuck programming only for GPUs, so any skills they gained programming GPUs were harder to transfer to the rest of their applications. HIP-CPU lets you use the exact same syntax and APIs to write high-performance code that will run on your CPU.
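
To make the "exact same syntax and APIs" point concrete, here is an illustrative sketch (not a vetted HIP-CPU sample) of a plain HIP saxpy -- ordinary HIP source of exactly the kind HIP-CPU is meant to build for the CPU:

```cpp
// Illustrative sketch, not a vetted HIP-CPU sample: a plain HIP saxpy using
// only standard HIP API calls.  The point is that the source is ordinary HIP;
// HIP-CPU's header-only runtime is meant to build this same code for the CPU.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;   // SIMT-style indexing
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx = nullptr, *dy = nullptr;
    hipMalloc(reinterpret_cast<void**>(&dx), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&dy), n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    hipLaunchKernelGGL(saxpy, dim3(blocks), dim3(threads), 0, 0,
                       2.0f, dx, dy, n);
    hipDeviceSynchronize();

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f\n", hy[0]);           // expect 4.0

    hipFree(dx);
    hipFree(dy);
    return 0;
}
```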

Secondly: because HIP-CPU uses the HIP syntax, it already has a number of libraries and applications that may now work on the CPU and run quite well. For example, we have run many available open source HIP applications through HIP-CPU in the course of developing it. We found that many (but not all) of these applications performed quite well because they were automatically able to take advantage of all of the CPU cores available on the system. This means two things: 1) using HIP-CPU may gain you access to already written high-performance parallel codes, and 2) HIP-CPU may allow some high-performance codes that were originally written for GPUs to also perform well on CPU-only systems.

Essentially, because HIP-CPU is a header-only library built on top of standard C++ parallelism constructs, it is both a clean way for HIP-GPU programmers to write high-performance CPU code in a language they are comfortable with and a mechanism to run existing HIP code performantly on CPUs.
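
As an analogy only (this is not HIP-CPU's actual implementation), the same SIMT-style saxpy can be expressed directly with the C++17 standard parallel algorithms that HIP-CPU builds on:

```cpp
// Analogy only -- not HIP-CPU's actual implementation.  The same saxpy "grid"
// can be expressed with the C++17 standard parallel algorithms, which is the
// layer HIP-CPU builds on; each index plays the role of one work-item.
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    std::vector<int> idx(x.size());
    std::iota(idx.begin(), idx.end(), 0);          // 0, 1, ..., n-1 ("thread ids")
    std::for_each(std::execution::par_unseq,       // run the "work-items" in parallel
                  idx.begin(), idx.end(),
                  [&](int i) { y[i] = a * x[i] + y[i]; });
}
```

(On libstdc++ the parallel execution policies are typically backed by TBB, so such a build links against it; MSVC's standard library ships its own backend.)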

2. Developing HIP Code on and for Your Platform of Choice

I described above how some developers may want to use HIP-CPU as a path for accelerating their code on CPUs because it offers familiar SIMT syntax and compatibility with codes originally written for GPUs. But another use of HIP-CPU is to help developers write HIP code on and for their platform of choice.

In the GPU world, HIP runs on AMD GPUs and Nvidia GPUs. Support for HIP on AMD GPUs is currently limited to cards supported by ROCm, which itself only runs on x86-64 Linux at this time. As such, there are a limited number of hardware and software platforms where someone can develop (and run) HIP today. AMD GPUs running on Windows do not currently support HIP, nor do any GPUs running on macOS. And what if your system doesn't have a HIP-capable GPU at all (e.g. you are running on a headless server that only has the BMC-based VGA controller)?

HIP-CPU is a mechanism that allows developers to write and test HIP code without any of these limitations. Because its only requirement is a C++17-compliant compiler toolchain, you can write, run, and test HIP code on any system with such a toolchain. C++17 parallel code can run on all manner of operating systems (Linux, Mac, Windows, BSD, etc.) and on all manner of CPU architectures (x86, Arm, RISC-V, Power, etc.). Since HIP-CPU is built on top of C++17 parallel code, it should also run in all of these places. AMD has not tested every permutation of these configurations, but we are happy to receive community feedback and patches to help harden HIP-CPU to run in as many environments as possible.

This is beneficial to developers for two reasons: 1) they can develop HIP code for their GPUs on their own system, regardless of the hardware and software on that system; 2) the HIP code they develop can portably compile and run on their users' target systems, regardless of the available hardware and software.

The first point makes sense for making the development environment as comfortable and low-impedance as possible. But I think the second point is also important. As I described above in the section about SIMT programming for CPUs, the fact that HIP-CPU is not constrained to any particular software or hardware setup means that code written for HIP-CPU -- just like well-written C++ code -- is portable. This means that it is also a path to get high-performance parallel algorithms running on the CPUs of a wide variety of systems. HIP-CPU is not just a platform for writing code that will eventually run on a GPU!

3. Using Your Favorite CPU Software Development Tools

If you're a GPU developer, you are probably aware that GPU software development tools are not quite as powerful as those on CPUs. CPU software development tools -- debuggers, profilers, static and dynamic analysis tools, IDEs -- have had decades to gain features and work through bugs. GPU tools have had less time, and so they are not quite as powerful. In AMD's ROCm software stack, we have profilers and debuggers, but you may not be able to use your favorite tooling to help develop HIP or GPU kernels. For example, we currently do not support AddressSanitizer on our AMD GPUs, nor can you run something like rr.
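
As a hypothetical illustration of that point: an off-by-one write inside a kernel, which a GPU would likely let pass silently, is exactly the kind of bug the host toolchain can catch when the kernel runs as CPU code. A sketch, assuming a HIP-CPU build compiled with -fsanitize=address:

```cpp
// Hypothetical sketch: a kernel with a classic off-by-one write.  Built as a
// HIP-CPU program with -fsanitize=address (clang++ or g++), the host
// AddressSanitizer should report a heap-buffer-overflow at the faulting store,
// assuming hipMalloc resolves to an instrumented host allocation.
#include <hip/hip_runtime.h>

__global__ void fill(float* out, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= n)               // BUG: should be i < n; i == n writes one past the end
        out[i] = 1.0f;
}

int main() {
    const int n = 1000;       // grid below covers indices 0..1023, so i == n occurs
    float* d = nullptr;
    hipMalloc(reinterpret_cast<void**>(&d), n * sizeof(float));
    hipLaunchKernelGGL(fill, dim3(4), dim3(256), 0, 0, d, n);
    hipDeviceSynchronize();
    hipFree(d);
    return 0;
}
```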

Even if you're developing HIP to run on a GPU, the development tools available on the CPU may be a reason you also run your code within HIP-CPU. As @MathiasMagnus demonstrated above, HIP-CPU may give you access to development tools that help you track down hard-to-find problems in your code. We definitely hadn't thought of MSVC's structured exceptions when we were building HIP-CPU, so I consider this a big win for "HIP-CPU is normal C++17 parallel code". Because we aren't doing anything tricky in HIP-CPU, standard CPU development tools should work out of the box. :)

AlexVlx commented 3 years ago

Thank you for the interesting inquiry @procedural, and thank you @MathiasMagnus and @jlgreathouse for clarifying. I would say that is probably sufficient clarification for the original question, so I'm going to close this - feel free to follow up on the same topic via GitHub Discussions, which will be enabled Soon™.