beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

NVIDIA 2D Thread Scheduler Fixed #421

Closed: jjfumero closed this pull request 2 months ago.

jjfumero commented 2 months ago

Description

This patch provides a new Thread Scheduler for NVIDIA GPUs.

Problem description

The problem is that, with recent NVIDIA drivers (e.g., 550.76), the thread block (work-group) size is set to 32x32 for 2D kernels, and this block size appears to be rejected as illegal only by these newer drivers. This patch provides a custom NVIDIA scheduler to fix this. With this patch, the canonical matrix multiplication runs ~300 GFLOPS faster than with the default scheduler on my RTX 3070 GPU.
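
To illustrate the constraint described above, here is a hypothetical Java sketch. It is not the scheduler added in this PR. It picks a 2D local size that divides the global size and whose total thread count stays within a maximum work-group size. The names `chooseLocalSize`, `maxPerDim` and `maxGroupSize` are illustrative assumptions. In OpenCL, such a per-kernel limit could come from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`, which may be lower than 1024 for a given kernel.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not TornadoVM's actual NVIDIA scheduler): choose a 2D
 * local work-group size that divides the global size and whose total thread
 * count (lx * ly) stays within an assumed device/kernel limit.
 */
public class LocalWorkSize2D {

    /** Largest power of two <= limit that divides n (at least 1). */
    static int largestPow2Divisor(int n, int limit) {
        int size = 1;
        while (size * 2 <= limit && n % (size * 2) == 0) {
            size *= 2;
        }
        return size;
    }

    /**
     * @param globalX      global size in dimension X
     * @param globalY      global size in dimension Y
     * @param maxPerDim    assumed per-dimension cap (e.g. 32)
     * @param maxGroupSize assumed limit on lx * ly (e.g. the kernel's max work-group size)
     */
    static int[] chooseLocalSize(int globalX, int globalY, int maxPerDim, int maxGroupSize) {
        // Fill dimension X first, then give Y whatever budget remains.
        int lx = largestPow2Divisor(globalX, Math.min(maxPerDim, maxGroupSize));
        int ly = largestPow2Divisor(globalY, Math.min(maxPerDim, maxGroupSize / lx));
        return new int[] {lx, ly};
    }

    public static void main(String[] args) {
        // A 32x32 block (1024 threads) is kept when the limit is 1024 ...
        System.out.println(Arrays.toString(chooseLocalSize(1024, 1024, 32, 1024)));
        // ... but shrinks to 32x16 when the kernel's limit is 512 threads.
        System.out.println(Arrays.toString(chooseLocalSize(1024, 1024, 32, 512)));
    }
}
```

The design choice here is simply to never exceed the queried limit rather than hard-coding 32x32; how the actual patch chooses its block sizes may differ.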

Backend/s tested

Mark the backends affected by this PR.

OS tested

Mark the OS where this PR is tested.

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

How to test the new patch?

make BACKEND=opencl
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 512
tornado --threadInfo --jvm="-Ds0.t0.device=0:0" -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D 1024
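
As a rough aid for relating kernel times from runs such as the two above to the GFLOPS figure in the description, the following Java sketch converts a measured time into GFLOPS using the standard 2*N^3 operation count for an N x N matrix multiplication. The class name and the timing value in `main` are hypothetical placeholders, not measurements from this PR, and this is not claimed to be how the TornadoVM example reports performance.

```java
/** Hypothetical helper: converts a measured time for an N x N matrix multiplication into GFLOPS. */
public class MatMulGflops {

    /** 2 * N^3 floating-point operations: one multiply and one add per inner-loop step. */
    static double gflops(int n, double seconds) {
        // Use double arithmetic to avoid int overflow for large n.
        double flops = 2.0 * n * n * n;
        return flops / seconds / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder timing: a 1024x1024 multiplication finishing in 1 ms.
        System.out.printf("%.1f GFLOPS%n", gflops(1024, 1e-3));
    }
}
```
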

jjfumero commented 2 months ago

Related pull request: https://github.com/beehive-lab/TornadoVM/pull/356

stratika commented 2 months ago

I guess for older driver versions, we will not see any difference, right? I tried it with 525.147.05.