Open baryluk opened 9 months ago
That is expected behaviour. The tuner simply runs one specific kernel, and certain kernels have constraints that also depend on the tuner parameters. That's why those cases are skipped.
Furthermore, it is probably not a good idea to tune for such tiny input sizes, because the main thing you'll measure is kernel launch overhead and similar fixed costs. It is probably best to start at 64x64 or even higher.
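To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not part of CLBlast or its tuners. The helper names and both default figures (1000 GFLOP/s sustained throughput, 10 microseconds of launch overhead) are assumptions picked for illustration, so substitute values measured on your own device.

```python
def gemm_flops(m, n, k):
    """Floating-point operations in an m x k by k x n matrix multiply."""
    return 2 * m * n * k


def overhead_fraction(size, device_gflops=1000.0, launch_overhead_us=10.0):
    """Share of measured time spent on launch overhead for a square GEMM.

    Both default figures are illustrative assumptions, not measurements.
    """
    # Idealised compute time in microseconds at the assumed throughput.
    compute_us = gemm_flops(size, size, size) / (device_gflops * 1e9) * 1e6
    return launch_overhead_us / (launch_overhead_us + compute_us)


for size in (8, 16, 64, 256, 1024):
    print(f"{size:5d}: overhead share = {overhead_fraction(size):.3f}")
```

Under these assumed figures the overhead share stays close to 1 for the smallest sizes and only becomes negligible for large matrices, so timings taken at tiny sizes say very little about the kernel itself.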
CLBlast-1.6.2-linux-x86_64
Example of sizes that do fail: