Closed fwyzard closed 3 months ago
let's see if the 23rd time's a charm...
I checked your code locally. The problem is the AccCpuThreads backend. On our workstation with a 32-core Epyc and a Clang 17 debug build, the test already runs for 100 s. I think the thread sanitizer annotations also add a big overhead, and the CI runners are weaker than our workstation.
I suggest disabling the tests with the std::thread backend or, if it makes sense to test the thread backend, decreasing the problem size.
I agree... I've tried reducing the block size in all tests to 32 threads (or elements) per block.
Closed in favour of https://github.com/alpaka-group/alpaka/pull/2369 that includes further fixes and clean up.
Implement device functions to simplify writing kernel code:
- oncePerGrid(acc);
- oncePerBlock(acc);
- uniformElements(acc, size) and uniformElements(acc, begin, end);
- uniformGroups(acc, size) and uniformGroupElements(acc, group, size);
- independentGroups(acc, groups) and independentGroupElements(acc, group_size).

Implement tests for the most common functions.
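
For readers unfamiliar with the pattern, the following is a minimal host-only sketch of a grid-stride loop, which is the kind of loop a helper like uniformElements(acc, size) is presumably meant to hide. It does not use alpaka and is not the implementation from this PR: `FakeThread`, `stridedIndices` and `visitCounts` are hypothetical names introduced only for illustration, and the assumption that each thread starts at its global index and advances by the grid size is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread information that a real
// accelerator object (`acc`) would provide; not part of alpaka.
struct FakeThread {
    std::size_t globalIndex;  // index of this thread within the grid
    std::size_t gridSize;     // total number of threads in the grid
};

// Indices that one thread visits in a grid-stride loop over [0, size):
// start at the thread's global index and advance by the grid size.
std::vector<std::size_t> stridedIndices(FakeThread t, std::size_t size) {
    assert(t.gridSize > 0 && t.globalIndex < t.gridSize);
    std::vector<std::size_t> out;
    for (std::size_t i = t.globalIndex; i < size; i += t.gridSize) {
        out.push_back(i);
    }
    return out;
}

// Count how often each element is visited when every thread of the
// grid runs the loop above (threads are emulated sequentially here).
std::vector<int> visitCounts(std::size_t gridSize, std::size_t size) {
    std::vector<int> counts(size, 0);
    for (std::size_t tid = 0; tid < gridSize; ++tid) {
        for (std::size_t i : stridedIndices({tid, gridSize}, size)) {
            ++counts[i];
        }
    }
    return counts;
}
```

With a grid of 4 threads and 10 elements, thread 1 would visit elements 1, 5 and 9, and every element is visited exactly once across the grid, regardless of whether the size is a multiple of the grid size. A test for such a helper could check exactly this property, and a smaller grid or problem size keeps the check cheap on slow backends.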