EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.
https://enzyme.mit.edu

CUDA: Zero shmem_shadow from multiple lanes #876

Open vchuravy opened 1 year ago

vchuravy commented 1 year ago

When we zero out the shadow for the shared memory (shmem), we currently do it on each lane. Instead, we should split the work across multiple threads.
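One way to picture the proposed split is a block-stride loop, in which each thread clears only its own share of the shadow buffer. The sketch below is a host-side C++ model of that indexing, not Enzyme's actual code: `zero_shadow_slice` and `zero_shadow_block` are hypothetical names, `tid` stands in for `threadIdx.x`, and `nthreads` stands in for `blockDim.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the share of the zeroing done by a single thread.
// `tid` models threadIdx.x and `nthreads` models blockDim.x (assumptions).
void zero_shadow_slice(float* shadow, std::size_t n,
                       std::size_t tid, std::size_t nthreads) {
    assert(nthreads > 0 && tid < nthreads);
    // Block-stride loop: thread `tid` clears elements tid, tid + nthreads, ...
    for (std::size_t i = tid; i < n; i += nthreads) {
        shadow[i] = 0.0f;
    }
}

// Host-side stand-in for one thread block; the "threads" run serially here.
// On a GPU, a __syncthreads() barrier would follow before the shadow is read.
std::vector<float> zero_shadow_block(std::size_t n, std::size_t nthreads) {
    std::vector<float> shadow(n, 1.0f);  // start nonzero so missed elements show up
    for (std::size_t tid = 0; tid < nthreads; ++tid) {
        zero_shadow_slice(shadow.data(), n, tid, nthreads);
    }
    return shadow;
}
```

With this layout each element is written by exactly one thread, so the per-thread cost drops from `n` stores to roughly `n / nthreads`. Consecutive threads also touch consecutive elements, which is usually the friendlier access pattern for shared memory.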

wsmoses commented 1 year ago

Seems like more of an issue for https://github.com/EnzymeAD/Enzyme but sure