antiagainst / antiagainst.github.io

Generated website for my personal blog
https://lei.chat/

GPGPU, ML Inference, and Vulkan Compute | Lei.Chat() #8

utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

GPGPU, ML Inference, and Vulkan Compute | Lei.Chat()

Vulkan (compute) has the potential to be the next-generation GPGPU standard for various GPUs to support various domains; one immediately compelling application is machine learning inference for resource-constrained scenarios like mobile/edge devices and gaming. This blog post explains the technical and business aspects behind it and discusses the challenges and current status.

https://www.lei.chat/posts/gpgpu-ml-inference-and-vulkan-compute/

antiagainst commented 2 years ago

Hello there! Thanks for reading! If you'd like to leave a comment, you can do it directly here (which will require authorizing utterances), or you can comment directly on the backing GitHub issue.

qianjiaqiang commented 2 years ago

Unlike CUDA/OpenCL, it's hard to find books that introduce compute shaders, other than the GLSL language spec.

antiagainst commented 2 years ago

@qianjiaqiang: That is very true. It's largely an ecosystem issue. Using Vulkan for compute/ML is just getting started, so there aren't many dedicated resources; CUDA/OpenCL, on the other hand, have existed for more than a decade, so there is lots of great knowledge sharing. Existing usage of Vulkan compute still mostly comes from rendering, where we program with compute shaders. But compute shaders aren't the primary thing to focus on when discussing rendering. (Although it has become more and more popular to perform more rendering tasks in pure compute.)

For ML though, the programming model is more domain specific and comes from AI frameworks. So in a sense it matters less, as normal users won't program with CUDA/GLSL/etc. at all. Framework/compiler developers might care for sure, and for them reading specs isn't that bad. :) Also, for Vulkan, GLSL is no longer the only choice of frontend programming language; HLSL nowadays basically has first-party support. Anything that can compile to SPIR-V will actually do. For example, IREE directly compiles TensorFlow models into SPIR-V.

monkeyking commented 2 years ago

What exactly does "predictable" mean in:

Both OpenGL and OpenCL feature a thick driver stack (including full compilers for kernels/shaders and implicit runtime status tracking and validation) and do not have strong conformance test suites. That results in different bugs or inconsistent performance among different devices, and unpredictable performance on the same device.

monkeyking commented 2 years ago

I think I get what you mean; I will test it.

antiagainst commented 2 years ago

@monkeyking: I meant how much performance (e.g., latency) variation we are expecting for similar workloads, particularly due to driver stack overhead. It's quite important for real-time and resource-constrained scenarios.
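To make "predictability" concrete, here is a minimal sketch of how one might quantify run-to-run latency variation for a fixed workload. The workload below is a hypothetical CPU stand-in purely for illustration; in practice you would time the actual GPU submission/dispatch path you care about.

```python
import statistics
import time

def run_workload():
    # Hypothetical stand-in for a real GPU dispatch; any fixed
    # deterministic computation works for demonstrating the measurement.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

def measure_latencies(runs=50):
    # Time the same workload repeatedly; the spread of these numbers
    # (not just the mean) is what "predictable performance" is about.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_workload()
        latencies.append((time.perf_counter() - start) * 1e3)  # milliseconds
    return latencies

latencies = measure_latencies()
mean = statistics.mean(latencies)
stdev = statistics.stdev(latencies)
worst = max(latencies)
print(f"mean={mean:.3f}ms stdev={stdev:.3f}ms worst={worst:.3f}ms")
```

A low standard deviation and a worst case close to the mean indicate predictable behavior; a thick driver stack doing implicit tracking or recompilation tends to show up as outliers in the tail.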

"Novel Methodologies for Predictable CPU-To-GPU Command Offloading" is a nice paper that offers great analysis on this front. It compares submission and execution predictability among CUDA, OpenCL, and Vulkan. It also dives into the driver to explain the reasons. You might want to check it out. :)

monkeyking commented 2 years ago

nice paper!