KhronosGroup / OpenCL-TTL

Tensor Tiling Library

Apache License 2.0


Sorry for the slow response; I've been on vacation.

We only use the built-in async_work_group_copy (the 3D3D variant) when compiling for OpenCL. The C version uses a hand-written async_work_group_copy instead, and we do something similar for OpenCL builds that do not support async_work_group_copy.
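To make "hand-written async_work_group_copy" concrete, here is a minimal sketch of what a plain-C 3D-to-3D copy can look like. It is not TTL's actual implementation: the function name `host_copy_3d3d`, its parameter order, and the choice of byte pitches are all assumptions made for illustration. The copy is synchronous, so the "async" contract is trivially met because the data is already in place when the call returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C stand-in for a 3D-to-3D work-group copy.
 * Not TTL's real code; names and parameters are illustrative assumptions.
 * All pitches and widths are in bytes. The copy completes before returning,
 * so there is nothing left to wait on afterwards. */
static void host_copy_3d3d(unsigned char *dst, size_t dst_row_pitch, size_t dst_plane_pitch,
                           const unsigned char *src, size_t src_row_pitch, size_t src_plane_pitch,
                           size_t width_bytes, size_t height, size_t depth)
{
    assert(dst != NULL && src != NULL);
    for (size_t z = 0; z < depth; ++z) {
        for (size_t y = 0; y < height; ++y) {
            /* Each row is contiguous, so it can be moved with one memcpy. */
            memcpy(dst + z * dst_plane_pitch + y * dst_row_pitch,
                   src + z * src_plane_pitch + y * src_row_pitch,
                   width_bytes);
        }
    }
}
```

Copying row by row is what lets source and destination use different pitches, for example a tightly packed tile in local memory versus a padded region in a larger global tensor.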

See how TTL_COPY_3D is defined.

Are you talking about not using async_work_group_copy in the OpenCL environment? If so, I guess we need to provide some way of redirecting it.

Maybe

#ifndef HostLocalTransfer
#define HostLocalTransfer async_work_group_copy3D3D
#endif

Something like this?
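A minimal sketch of how that redirection could be used, assuming the application defines `HostLocalTransfer` before the library's default is seen. `my_custom_transfer` and `library_fetch_tile` are hypothetical names, and the three-argument signature is a simplification: the real OpenCL 3D copy built-in takes many more parameters and returns an event, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-supplied transfer routine (not part of TTL).
 * It counts calls so the redirection is observable, then does a plain copy. */
static int custom_transfer_calls = 0;

static void my_custom_transfer(void *dst, const void *src, size_t bytes)
{
    assert(dst != NULL && src != NULL);
    ++custom_transfer_calls;
    memcpy(dst, src, bytes);
}

/* 1. Application side: choose the hook before the library default appears. */
#define HostLocalTransfer my_custom_transfer

/* 2. Library side (in a real build this would sit in the library header):
 *    supply the default only if the application has not chosen one.
 *    Because of step 1, this default is never expanded here. */
#ifndef HostLocalTransfer
#define HostLocalTransfer async_work_group_copy3D3D
#endif

/* 3. Library code always calls through the hook, never the built-in directly. */
static void library_fetch_tile(void *local, const void *global, size_t bytes)
{
    HostLocalTransfer(local, global, bytes);
}
```

The appeal of the `#ifndef` guard is that builds which say nothing keep today's behaviour, while a target such as a mobile GPU driver without a usable built-in can swap in its own routine without patching the library.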

On the second question: what sort of optimizations do you have in mind? We want TTL to keep supporting a broad church of targets, but we would be happy to try adding anything that helps.


Whether it can support the mobile GPU well? #5
