hgkamath opened 11 months ago
The SPIR-V we generate is in principle compatible with any valid SPIR-V compute environment. For example, we could also use it to target OpenCL implementations that support SPIR-V (and unified shared memory) if there is a use case.
Vulkan uses a different SPIR-V dialect, namely the SPIR-V shader model. It is not directly compatible with the compute model that OpenCL uses. Generating shader SPIR-V is in principle possible to some extent (see e.g. the Sylkan project) but non-trivial and some SYCL functionality may not work.
Additionally, Kepler-era GPUs are so old that they don't support many core SYCL features well, such as unified shared memory. Even on Linux with the CUDA backend, there are already limitations.
This way CUDA drivers don't need to be present on either OS, and one does not have to code in CUDA/nvcc.
We already do not require nvcc.
In it, it seems like the below path is documented as only for intel-GPU: clang-based flow -> input SYCL code -> clang-sycl-pass-experimental -> SPIR-V -> output binary
Our production SPIR-V/Intel support does not go through any clang experimental SYCL passes, but through our own generic single-pass compiler (--opensycl-targets=generic).
Firstly, thank you for your reply and to developers who contribute to AdaptiveCpp.
Below is some collected info regarding Kepler and unified shared memory.
This LLVM bug [1] seems to imply that unified shared memory is possible and already exists for Kepler (sm_30, sm_35, sm_37), and also for Fermi (sm_20).
There is a chance that this SPIR-V solution might work on Linux.
For the Windows driver situation, the chances seem similar.
It would also be acceptable, if possible, to use a restricted form of coding that avoids program patterns whose SPIR-V output cannot be executed.
On Windows, vulkaninfo (full text file attached in [7]) reports the following. The 2 GB GPU memory heap does have VK_MEMORY_HEAP_DEVICE_LOCAL_BIT (a check suggested in [5]):
```
maxComputeSharedMemorySize = 0xc000
:
:
VkPhysicalDeviceMemoryProperties:
=================================
memoryHeapCount = 2
    memoryHeaps[0]:
        size  = 2107179008 (0x7d990000) (1.96 GiB)
        flags:
            VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size  = 8554295296 (0x1fde03000) (7.97 GiB)
        flags:
            None
memoryTypeCount = 11
    memoryTypes[0]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[1]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[2]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[3]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[4]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[5]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[6]:
        heapIndex     = 1
        propertyFlags = 0x0:
    memoryTypes[7]:
        heapIndex     = 0
        propertyFlags = 0x1:
            VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    memoryTypes[8]:
        heapIndex     = 0
        propertyFlags = 0x1:
            VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    memoryTypes[9]:
        heapIndex     = 1
        propertyFlags = 0x6:
            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
            VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    memoryTypes[10]:
        heapIndex     = 1
        propertyFlags = 0xe:
            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
            VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
            VK_MEMORY_PROPERTY_HOST_CACHED_BIT
```
NVIDIA: OpenCL 1.2 CUDA 10.1.131, INTEL: OpenCL 1.2
This LLVM bug [1] seems to imply that unified shared memory is possible and already exists for Kepler (sm_30, sm_35, sm_37), and also for Fermi (sm_20).
No. The bug report says that these have unified virtual addressing (UVA) which is a different thing. The hardware lacks page-faulting support, so there will not be any fine-grained automatic memory migration. While it is possible to "emulate" unified memory by migrating entire allocations (this is also how USM is implemented on Windows) as mentioned in the bug, this will be inefficient and not a solution for practical programs. There are other limitations in Kepler-era hardware, such as a more limited support for atomic operations.
On Vulkan, the situation is even worse because for a long time Vulkan only had opaque pointers, which completely breaks sharing any data structure that contains pointers between host and device. This has only changed in Vulkan 1.4. So unless these Vulkan implementations move to newer Vulkan versions, it is not a realistic target.
And there are other requirements for Vulkan SPIR-V, such as structured control flow, which additionally restricts the constructs that can be expressed and can be limiting for SYCL.
Again, Vulkan shader SPIR-V is not the same thing as SPIR-V for compute environments like OpenCL or Level Zero. SPIR-V actually defines two different execution models: The shader model and the kernel model. Vulkan only supports the shader model, not the kernel model that we need.
From what you tell me, it seems the SYCL-C++-SPIR-V route is very likely closed to me, despite being a good portable idea, and I should not pin all my hopes on it.
So, I'll look at some alternative routes.
Let me know if you can think of anything else.
This feature request could still be of use to the AdaptiveCpp SYCL project, to ensure that the AdaptiveCpp SPIR-V route works for later-generation NVIDIA GPUs once the nouveau/nvkm driver lands and subsequently catches up to the minimum required Vulkan version.
As you say, vulkan-compute kernels are different from vulkan-shader-compute
Almost. Even compute shaders in Vulkan use the shader model to my knowledge. The SPIR-V kernel model is not supported by Vulkan, even for Vulkan compute shaders. You need OpenCL for that.
The route SYCL->SPIR-V->rusticl is much more realistic than Vulkan and could potentially be supported in near to mid-term future.
investigate other routes: the Futhark-OpenCL-C route, the OpenCL-C-to-SPIR-V route [9]
There are already SYCL implementations that support OpenCL, so no need to look elsewhere if this solves your problem. As I've said we could add an OpenCL backend in the near future if there is a use case.
AMD HIP [10], but ensuring the resulting binary can work with Intel or NVIDIA GPUs (even if via CUDA) without recompilation, e.g. HIPSPV [11], cpc/hipcl [12], CHIPSPV/chipStar [13]
HIP has exactly the same problems as SYCL, and more. As you say, creating a portable binary is not something it was built to do. HIP's NVIDIA support also goes through CUDA.
starpu-runtime/starpu [14]
To my knowledge, StarPU is a runtime system for automatic work distribution. It's not a compiler, so I don't see how it would solve your issue.
There are already SYCL implementations that support OpenCL, so no need to look elsewhere if this solves your problem.
If you're talking about IntelLLVM/DPC++, here's the (almost) current state of things: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9061. A lot of features are not supported due to Rusticl limitations, and you have to apply some hacks because IntelLLVM generates not fully standard-compliant SPIR-V, but in the end, you can get it to run some examples.
@al42and Yes, I was referring to DPC++. Thanks for the pointer! I had assumed it was farther along than that. USM problems are a killer for us, because we rely heavily on it (even for buffers, which use USM device allocations; shared allocations are not so critical).
(And I always get very sad when people assume that SYCL == DPC++ as in this post :/ )
Describe the motivation for the feature request AdaptiveCpp SYCL should be able to adaptively compile a feature subset of SYCL C++ to any given Vulkan version. It is desirable to have a graceful reduction of C++ features by Vulkan version. One could also introduce less performant emulation of missing features, such as shared memory. Thus AdaptiveCpp would be able to target lower Vulkan versions such as 1.1 and 1.0.
This will allow AdaptiveCpp SYCL to be used on a much broader range of hardware that may lack the most recent tech specs.
I ask this because, AFAICT, the following two merges will happen in the near future for the linux-kernel and mesa projects respectively.
https://lore.kernel.org/dri-devel/20230720001443.2380-1-dakr@redhat.com/T/#m88d67c231b63b7340b22dc2a4d1102eed840df00
I could be wrong.
I read
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/compilation.md
In it, it seems like the below path is documented as only for intel-GPU: clang-based flow -> input SYCL code -> clang-sycl-pass-experimental -> SPIR-V -> output binary
Is it that the SPIR-V pathway is specified only for Intel, as Intel is a newcomer to the GPU space without custom middleware, and that SPIR-V is the base case for any GPU without special AMD/NVIDIA middleware?
Am I correct in predicting that it should work for any accelerator that provides a Vulkan/SPIR-V target?
Or is it the case that code/features need to be added to make this work in AdaptiveCpp?
Describe the solution you'd like Make open-sycl work on a Vulkan/SPIR-V target.
This way CUDA drivers don't need to be present on either OS, one does not have to code in CUDA/nvcc, and the resulting binary would work on a GPU from any vendor.
If applicable, describe alternatives you've considered NA
Additional context On Win10, NVIDIA has declared end of support, the last NVIDIA driver version being 425.31. So the final Vulkan version on Windows for the Kepler mobile GT 740M is 1.1.97, as per vulkaninfo output. On Linux, when nvkm lands, the claimed final Vulkan version for the Kepler-era cards may also be limited to 1.2, though later GPUs will have higher versions supported [1]
Ref
https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/92#note_1545458
https://www.phoronix.com/news/NVK-Merge-Request-Mesa
Please let me know what you think, and fill in the gaps in my understanding.