Open rafaqz opened 10 months ago
It seems DynamicCheck is only half the problem. Removing it helps a lot, but something else is also blocking the compiler from constant-propagating size information from the type (it's like a sized array) through the KernelAbstractions kernel, information that it can see in the single-threaded version.
I will have to fix it to find out what the problem is, so I will probably submit a PR at some point.
Given that `DynamicCheck()` breaks SIMD, fixing this can be an order of magnitude faster for some inexpensive tasks. I'll write up a better MWE, but this is the scale of it: a single-threaded game of life in DynamicGrids.jl (basically summing a 3x3 window over Bool) is 2x faster than an 8-core KernelAbstractions.jl sim, pretty much just because of `DynamicCheck()`: