KhronosGroup / OpenCL-Docs

OpenCL API, OpenCL C, Extensions, SPIR-V Environment Specs, Ref page, and C++ for OpenCL doc sources.

About 2D2D async_copy on RGB images and other stencil codes #573

Open hominhquan opened 3 years ago

hominhquan commented 3 years ago

Imagine we read an RGB image with three uchar values per pixel (uchar[3]). All pixels of the image are packed into a cl_mem buffer. We then want to use async_work_group_copy_2D2D() to optimize memory transfers between __global and __local memory.

The point is:

I'm wondering whether the new async DMA spec could be improved to make these scientific stencil applications easier to code and optimize. For example, instead of calculating the start address of each sub-image, the developer would always pass the original buffer pointer and the position index (i, j) of the sub-block to be copied, reasoning in terms of pixels rather than bytes or gentype elements. The async API, given an extra parameter such as num_gentype_per_pixel, would then jump to the correct address and copy the right amount of underlying data.
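To make the idea concrete, here is a minimal host-side sketch in plain C of the semantics being asked for. It is not OpenCL C and not the signature of any existing or proposed built-in: the names `block2d_desc`, `pixel_byte_offset` and `copy_2d2d_pixels` are hypothetical, and the sketch assumes that (i, j) means the row and column of the sub-block's top-left pixel and that the destination tile is densely packed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of a pixel-addressed 2D block.
 * None of these names come from the OpenCL specification. */
typedef struct {
    size_t gentype_size;          /* bytes per gentype element (1 for uchar) */
    size_t num_gentype_per_pixel; /* elements per pixel (3 for packed RGB)   */
    size_t image_width;           /* full image width, in pixels             */
    size_t block_width;           /* sub-block width, in pixels              */
    size_t block_height;          /* sub-block height, in pixels             */
} block2d_desc;

/* Byte offset of the pixel at row i, column j in the packed source image.
 * This is the calculation the developer currently has to do by hand. */
static size_t pixel_byte_offset(const block2d_desc *d, size_t i, size_t j)
{
    size_t bytes_per_pixel = d->gentype_size * d->num_gentype_per_pixel;
    return (i * d->image_width + j) * bytes_per_pixel;
}

/* Reference model of the requested behaviour: the caller passes the
 * original buffer pointer plus (i, j) in pixels, and the helper derives
 * the addresses and line lengths itself. A real implementation would be
 * asynchronous and would target __local memory. */
static void copy_2d2d_pixels(void *dst, const void *src,
                             const block2d_desc *d, size_t i, size_t j)
{
    size_t bytes_per_pixel = d->gentype_size * d->num_gentype_per_pixel;
    size_t line_bytes = d->block_width * bytes_per_pixel;

    /* The block must not run past the right edge of the image. */
    assert(j + d->block_width <= d->image_width);

    for (size_t r = 0; r < d->block_height; ++r) {
        const unsigned char *s =
            (const unsigned char *)src + pixel_byte_offset(d, i + r, j);
        unsigned char *t = (unsigned char *)dst + r * line_bytes;
        memcpy(t, s, line_bytes); /* one line of the sub-block */
    }
}
```

With a 4-pixel-wide RGB image, for instance, a 2x2 block at (1, 2) would be requested as `copy_2d2d_pixels(tile, image, &d, 1, 2)` with `d = {1, 3, 4, 2, 2}`, and the caller never touches a byte offset.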

Below is a generic 2D2D copy and the parameters I think it needs:

[Figure: copy_2D2D]

[1] https://www.researchgate.net/profile/Muhammad-Abdul-Basit/publication/287166894_Lattice_Boltzmann_method_and_its_applications_to_fluid_flow_problems/links/5c3699c892851c22a368bf94/Lattice-Boltzmann-method-and-its-applications-to-fluid-flow-problems.pdf
[2] https://www.sciencedirect.com/science/article/pii/S0898122111001064

alycm commented 3 years ago

Thank you for your feedback!

This is being discussed on internal issue 32.