alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Texture/image support #1253

Open bernhardmgruber opened 3 years ago

bernhardmgruber commented 3 years ago

Alpaka currently lacks support for the texture/image capabilities of certain backends. This concerns the CUDA backend and the SYCL backend, which is under development. Texture/image support was also requested in: https://github.com/alpaka-group/alpaka/issues/1065 The discussion also came up during the prototyping of kernel-side accessors to buffers: https://github.com/alpaka-group/alpaka/issues/38 and https://github.com/alpaka-group/alpaka/pull/1249

Since backend support for this feature is scarce, we have two options to implement such a facility:

  1. emulation on backends without texture/image support, e.g. via a wrapper on alpaka::Buf
  2. do not provide the feature and fail to compile

While option 1 is certainly doable, given that currently only CUDA supports this feature, we might run into a situation where the feature performs suboptimally on non-CUDA backends, because we might not pick the right emulation approach for everyone. E.g. is Z-order storage really the best memory layout? How about unusual texture formats (see: https://sycl.readthedocs.io/en/latest/iface/image.html#sycl-image-channel-order)? Bilinear/trilinear interpolation on access? Edge behavior? Normalized texture coordinates? There is a lot we could get wrong, or at least do poorly.
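To make the Z-order question concrete, here is a minimal sketch of how an emulation could map a 2D texel coordinate to a Z-order (Morton) index by interleaving the bits of x and y. The names `part1By1` and `mortonIndex2D` are hypothetical and not part of alpaka, and the sketch assumes both coordinates fit into 16 bits and that the image is padded to a square power-of-two extent so that the index range has no gaps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: spreads the lower 16 bits of v so that a zero bit
// sits between each pair of neighbouring bits (abcd -> 0a0b0c0d).
inline std::uint32_t part1By1(std::uint32_t v)
{
    v &= 0x0000ffffu;
    v = (v | (v << 8)) & 0x00ff00ffu;
    v = (v | (v << 4)) & 0x0f0f0f0fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Hypothetical helper: Z-order index with the bits of x on the even
// positions and the bits of y on the odd positions.
inline std::uint32_t mortonIndex2D(std::uint32_t x, std::uint32_t y)
{
    assert(x < 65536u && y < 65536u); // assumption: 16 bits per coordinate
    return part1By1(x) | (part1By1(y) << 1);
}
```

With this layout the four texels of a 2x2 block land on consecutive indices, e.g. (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3. Whether that locality actually pays off on a given backend is exactly the open question.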

Option 2 is safe from our perspective, but locks users into CUDA (and later SYCL) when they use the feature. So as it stands now they could just use CUDA directly.

We could also mix the options and just provide a very limited texture/image support that we are confident we can emulate.

What is the strategy to go forward wrt. texture/image support?

bernhardmgruber commented 3 years ago

So while HIP did not mention texture support in their documentation, the functionality seems to be there: https://github.com/ROCm-Developer-Tools/HIP/blob/main/include/hip/hcc_detail/texture_functions.h

sbastrakov commented 3 years ago

I agree with your assessment. I do not think textures are that widely used in computational applications nowadays: GPUs have had caches for a long time now (the lack of which was one of the reasons to use textures for computations in the early CUDA days), and texture operations like interpolation have limited accuracy. However, emulation, while I think not that difficult to get working if there are no performance requirements, would still require continuous maintenance.

bernhardmgruber commented 3 years ago

We opened a GSoC position for this feature: https://www.casus.science/news-events/events/google-summer-of-code-2021/#anchor-6

bussmann commented 3 years ago

Dear all, I firmly believe this is a side quest. I think there is more important stuff to do.

PrometheusPi commented 3 years ago

While this might be a less important task for the overall goal of alpaka, ISAAC would definitely benefit from that.

psychocoderHPC commented 3 years ago

> While this might be a less important task for the overall goal of alpaka, ISAAC would definitely benefit from that.

To give it a little bit more context: In ISAAC we can have the case that we visualize multiple data sources with different resolutions within the same kernel. Accessing the data in a texture-like way, with normalized indices and automatic interpolation, simplifies the ray casting kernel.

Maybe we can propagate work at some point from ISAAC back into alpaka.

FelixTUD commented 3 years ago

ISAAC would greatly benefit from textures. The addressing is not really a problem, as it can easily be emulated with minimal overhead. The bigger problems, which proper native texture support would solve, are:

  1. Caching: currently the data for 3D buffers is stored in a normal array and is therefore cached linearly along the array. This results in many cache misses, because accesses most frequently hit neighbouring voxels, which are likely far from one another in memory in at least 2 of the 3 dimensions and therefore not cached. Textures would solve this, as they cache locally in the dimensions of the buffer.
  2. Interpolation: currently the trilinear interpolation is emulated with 8 buffer reads on neighbouring voxels, which are most likely not cached due to problem 1 and therefore have a very high performance cost. With texture support the interpolation would be done automatically on access and much more cheaply.
  3. Buffer boundary handling: currently every read of a 3D buffer needs to be boundary checked, and different behaviours such as texture repeat, clamp and constant color are emulated when a boundary is reached. A native texture implementation would do this much more efficiently.
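For readers unfamiliar with the emulation described in points 2 and 3, the following is a minimal sketch of a software trilinear lookup with per-read boundary handling. It is not ISAAC's actual code: `Volume`, `AddressMode`, `resolve`, `fetch` and `sampleTrilinear` are hypothetical names, the data is assumed to be a dense array with x as the fastest dimension, and coordinates are assumed to be in voxel units with integer coordinates lying exactly on voxels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical boundary modes, mirroring texture clamp, repeat and constant color.
enum class AddressMode { Clamp, Repeat, Border };

struct Volume
{
    int nx, ny, nz;          // extents in voxels
    std::vector<float> data; // dense storage, x is the fastest dimension
    AddressMode mode;
    float border;            // value returned outside the volume in Border mode
};

// Maps index i into [0, n) according to the mode.
// Returns false if the read has to yield the border value instead.
inline bool resolve(int& i, int n, AddressMode mode)
{
    if(i >= 0 && i < n)
        return true;
    switch(mode)
    {
    case AddressMode::Clamp:
        i = i < 0 ? 0 : n - 1;
        return true;
    case AddressMode::Repeat:
        i = ((i % n) + n) % n;
        return true;
    case AddressMode::Border:
        return false;
    }
    return false;
}

// A single boundary-checked voxel read.
inline float fetch(Volume const& v, int x, int y, int z)
{
    assert(v.data.size() == static_cast<std::size_t>(v.nx) * v.ny * v.nz);
    if(!resolve(x, v.nx, v.mode) || !resolve(y, v.ny, v.mode) || !resolve(z, v.nz, v.mode))
        return v.border;
    return v.data[(static_cast<std::size_t>(z) * v.ny + y) * v.nx + x];
}

// Trilinear interpolation built from 8 boundary-checked reads.
inline float sampleTrilinear(Volume const& v, float x, float y, float z)
{
    float const fx = std::floor(x), fy = std::floor(y), fz = std::floor(z);
    int const x0 = static_cast<int>(fx), y0 = static_cast<int>(fy), z0 = static_cast<int>(fz);
    float const tx = x - fx, ty = y - fy, tz = z - fz;
    auto lerp = [](float a, float b, float t) { return a + t * (b - a); };

    float const c00 = lerp(fetch(v, x0, y0, z0), fetch(v, x0 + 1, y0, z0), tx);
    float const c10 = lerp(fetch(v, x0, y0 + 1, z0), fetch(v, x0 + 1, y0 + 1, z0), tx);
    float const c01 = lerp(fetch(v, x0, y0, z0 + 1), fetch(v, x0 + 1, y0, z0 + 1), tx);
    float const c11 = lerp(fetch(v, x0, y0 + 1, z0 + 1), fetch(v, x0 + 1, y0 + 1, z0 + 1), tx);
    return lerp(lerp(c00, c10, ty), lerp(c01, c11, ty), tz);
}
```

Four of the eight reads differ in z and therefore sit a full `nx * ny` slice apart in memory, which illustrates the caching concern from point 1, and every read pays for the branchy index resolution from point 3.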

bussmann commented 3 years ago

How long would a texture implementation in alpaka take? Can we test the performance gain by trying it in a CUDA-only branch for ISAAC?

FelixTUD commented 3 years ago

Right now I'm trying to integrate native CUDA textures in ISAAC, so that I can hopefully include some performance numbers in my master's thesis. And as @psychocoderHPC said, maybe we can propagate some of the work to alpaka, as I need to implement a software emulation for all non-CUDA-capable architectures anyway.

bernhardmgruber commented 3 years ago

Here is how I envisioned the design of an image accessor:

    using Image = cudaTextureObject_t; // we likely need an Image type

    template<typename TElem, typename TBufferIdx, typename TAccessModes>
    struct Accessor<Image, TElem, TBufferIdx, 2, TAccessModes> {
        // Vec subscript to be compatible with buffer accessor
        ALPAKA_FN_HOST_ACC auto operator[](Vec<DimInt<2>, TBufferIdx> i) const -> TElem { 
            return (*this)(i[0], i[1]);
        }

        // integral call operator to be compatible with buffer accessor, does texel fetch
        ALPAKA_FN_HOST_ACC auto operator()(TBufferIdx y, TBufferIdx x) const -> TElem { 
            return tex1Dfetch<TElem>(texObj, y * rowPitchInValues + x);
        }

        // floating-point call operator for interpolated access
        ALPAKA_FN_HOST_ACC auto operator()(float y, float x) const -> TElem { 
            return tex2D<TElem>(texObj, x, y);
        }

        Image texObj;
        TBufferIdx rowPitchInValues; // for texel fetch
        Vec<DimInt<2>, TBufferIdx> extents; // compatibility with buffer accessor
    };

TAccessModes probably just allows alpaka::ReadOnly.
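
As a rough illustration of what option 1 (emulation) could look like for this interface, here is a sketch of a read-only accessor with the same two call operators, backed by plain memory instead of a texture object. `EmulatedImageAccessor2D` is a hypothetical name and not an alpaka type. The sketch assumes clamp-to-edge addressing, unnormalized coordinates, an element type that supports float arithmetic, and `int` indices (chosen so that the integral and floating-point overloads stay unambiguous). The half-texel shift is meant to mirror how CUDA's linear filtering places texel centres and should be verified against the CUDA documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical software fallback for a 2D read-only image accessor.
template<typename TElem>
struct EmulatedImageAccessor2D
{
    // Integral call operator: plain texel fetch, no filtering.
    auto operator()(int y, int x) const -> TElem
    {
        assert(y >= 0 && y < height && x >= 0 && x < width);
        return data[y * rowPitchInValues + x];
    }

    // Floating-point call operator: bilinear interpolation with clamp-to-edge.
    auto operator()(float y, float x) const -> TElem
    {
        // Assumption: texel centres sit at +0.5, so shift before splitting
        // into integer texel index and fractional weight.
        float const xb = x - 0.5f, yb = y - 0.5f;
        float const fx = std::floor(xb), fy = std::floor(yb);
        float const ax = xb - fx, ay = yb - fy;

        int const x0 = clampIdx(static_cast<int>(fx), width);
        int const x1 = clampIdx(static_cast<int>(fx) + 1, width);
        int const y0 = clampIdx(static_cast<int>(fy), height);
        int const y1 = clampIdx(static_cast<int>(fy) + 1, height);

        auto const top = (1.0f - ax) * (*this)(y0, x0) + ax * (*this)(y0, x1);
        auto const bottom = (1.0f - ax) * (*this)(y1, x0) + ax * (*this)(y1, x1);
        return static_cast<TElem>((1.0f - ay) * top + ay * bottom);
    }

    // Clamps an index into [0, n).
    static int clampIdx(int i, int n)
    {
        return std::min(std::max(i, 0), n - 1);
    }

    TElem const* data;    // read-only view, matching a ReadOnly access mode
    int rowPitchInValues; // row pitch in elements
    int width, height;    // extents in texels
};
```

For a 2x2 image holding 0, 1, 2, 3 in row-major order, `acc(1, 0)` fetches the texel in row 1, column 0, while `acc(1.0f, 1.0f)` samples the point between all four texel centres and averages them. Channel formats, normalized coordinates and other addressing modes are deliberately left out, in the spirit of a very limited feature set that can be emulated with confidence.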