alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Implement device functions to simplify writing kernel code #2337

Closed fwyzard closed 3 months ago

fwyzard commented 3 months ago

Implement device functions to simplify writing kernel code:

Implement tests for the most common functions.

fwyzard commented 3 months ago

let's see if the 23rd time's a charm...

SimeonEhrig commented 3 months ago

I checked your code locally. The problem is the AccCpuThreads backend. On our workstation with a 32-core Epyc and a Clang 17 debug build, the test already runs for 100 s. I think the thread sanitizer annotations also add a big overhead, and the CI runners are weaker than our workstation.

I suggest disabling the tests with the std::thread backend or, if it makes sense to test the thread backend, decreasing the problem size.

fwyzard commented 3 months ago

> I suggest disabling the tests with the std::thread backend or, if it makes sense to test the thread backend, decreasing the problem size.

I agree... I've tried reducing the block size in all tests to 32 threads (or elements) per block.
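
To illustrate why the block size and problem size matter for a std::thread-based backend, here is a minimal plain C++ sketch. It does not use alpaka: `blocksFor` and `runWithThreads` are hypothetical names, and the assumption that every thread of a block maps to one freshly started OS thread is a simplification for illustration, not a description of how AccCpuThreads is implemented.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: number of blocks needed to cover `elements`
// when each block handles `blockSize` elements (ceiling division).
constexpr std::size_t blocksFor(std::size_t elements, std::size_t blockSize)
{
    return (elements + blockSize - 1) / blockSize;
}

// Hypothetical stand-in for a thread-per-block-thread backend (assumption):
// blocks run one after another, and each block starts `blockSize` OS threads.
// Every thread increments at most one element. Returns the number of OS
// threads that were started in total.
inline std::size_t runWithThreads(std::vector<int>& data, std::size_t blockSize)
{
    std::size_t started = 0;
    std::size_t const blocks = blocksFor(data.size(), blockSize);
    for(std::size_t block = 0; block < blocks; ++block)
    {
        std::vector<std::thread> threads;
        threads.reserve(blockSize);
        for(std::size_t t = 0; t < blockSize; ++t)
        {
            std::size_t const i = block * blockSize + t;
            threads.emplace_back(
                [&data, i]
                {
                    // Guard the partially filled last block.
                    if(i < data.size())
                        data[i] += 1;
                });
            ++started;
        }
        // Wait for the whole block before moving on to the next one.
        for(auto& th : threads)
            th.join();
    }
    return started;
}
```

In this sketch the number of OS threads started is the number of blocks times the block size, so for a fixed block size it grows with the problem size. Shrinking either one reduces the thread start-up and join overhead that a sanitizer-instrumented debug build would amplify.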

fwyzard commented 3 months ago

Closed in favour of https://github.com/alpaka-group/alpaka/pull/2369, which includes further fixes and cleanup.