Guard Nvidia GPU scheduler for minor version to prevent incorrect thread scheduling

mikepapadim commented 1 month ago

Problem description

It seems that the new thread deployment strategy was introduced after the nvidia driver 550.67. So, not checking also for the minor version causes tornado-runtime to use the speciliazed nvidia scheduler and fails with:

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -54

Mark the backends affected by this PR.

[x] OpenCL
[ ] PTX
[ ] SPIRV

OS tested

Mark the OS where this PR is tested.

[x] Linux
[x] OSx
[x] Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

[ ] Yes
[x] No

How to test the new patch?

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.loops.TestParallelDimensions"

jjfumero commented 1 month ago

What is the pass-rate for this PR?

jjfumero commented 1 month ago

I think the scheduler fix also need to be applied for the PTX backend.

stratika commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?

tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"

mikepapadim commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?

tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"

these are failing in develop as well. Now they are passing for me.

stratika commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?
tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
these are failing in develop as well, not related to the scheduler

I noticed that they fail in the same call, but with different error code, and thought to ask. [TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5

mikepapadim commented 1 month ago

What is the pass-rate for this PR?

{'[PASS]': 643, '[FAILED]': 7, '[UNSUPPORTED]': 56} Coverage [PASS/(PASS+FAIL)]: 98.92% Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 91.08%

from which 2 are the virtual dev

stratika commented 2 weeks ago

550.67

With driver 550.67 there is currently one test failing:

tornado -ea --threadInfo --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09"
WARNING: Using incubator modules: jdk.incubator.vector
Task info: smmmmmm.t0
    Backend           : OPENCL
    Device            : Quadro GP100 CL_DEVICE_TYPE_GPU (available)
    Dims              : 2
    Global work offset: [0, 0]
    Global work size  : [256, 256]
    Local  work size  : [32, 32, 1]
    Number of workgroups  : [8, 8]

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on Quadro GP100 (Device 0).

Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09
    Running test: testMatrix09               ................  [FAILED] 
        \_[REASON] expected:<254.85081> but was:<0.0>

I suggest to patch it, by using the GridScheduler with 16, 16, 1 as local work size. What do you think? The alternative would be to change the OCLGenericGPUScheduler to use 16, 16, 1 as default local work size. @mikepapadim @jjfumero any comments?

stratika commented 2 weeks ago

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.loops.TestParallelDimensions"

I provided a patch that automatically adapts the local grid dimensions when we use the NVIDIA scheduler (after 550.67 dimension). The problem was that the local size was 16*16*8=2048 which is greater than the supported max size which is: 1024.

You can test with:

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"

jjfumero commented 2 weeks ago

Awesome, all tests are passing with OpenCL (RTX 3070 and NVIDIA 550.76)

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
WARNING: Using incubator modules: jdk.incubator.vector
Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes
    Running test: testMatrixRowInt01         ................  [PASS] 
    Running test: testMatrixRowInt02         ................  [PASS] 
    Running test: testMatrixRowFloat01       ................  [PASS] 
    Running test: testMatrixRowFloat02       ................  [PASS] 
    Running test: testMatrixRowFloat4        ................  [PASS] 
    Running test: testMatrixRowDouble01      ................  [PASS] 
    Running test: testMatrixRowDouble02      ................  [PASS] 
    Running test: testMatrix00               ................  [PASS] 
    Running test: testMatrix01               ................  [PASS] 
    Running test: testMatrix02               ................  [PASS] 
    Running test: testMatrix03               ................  [PASS] 
    Running test: testMatrix04               ................  [PASS] 
    Running test: testMatrix05               ................  [PASS] 
    Running test: testMatrix06               ................  [PASS] 
    Running test: testMatrix07               ................  [PASS] 
    Running test: testMatrix08               ................  [PASS] 
    Running test: testMatrix09               ................  [PASS] 
    Running test: testMatrix10               ................  [PASS] 
    Running test: testMatrix11               ................  [PASS] 
    Running test: testMatrix12               ................  [PASS] 
    Running test: testMatrix13               ................  [PASS] 
    Running test: testMatrix14               ................  [PASS] 
    Running test: testMatrix15               ................  [PASS] 
    Running test: testMatrix16               ................  [PASS] 
    Running test: testMatrix17               ................  [PASS] 
    Running test: testMatrix18               ................  [PASS] 
    Running test: testMatrix19               ................  [PASS] 
    Running test: testMatrix20               ................  [PASS] 
    Running test: testMatrix21               ................  [PASS] 
    Running test: testMatrix22               ................  [PASS] 
    Running test: testMatrix23               ................  [PASS] 
Test ran: 31, Failed: 0, Unsupported: 0

jjfumero commented 2 weeks ago

For sanity check, it would be good to test the new patch also for OSx and Windows.

stratika commented 2 weeks ago

For sanity check, it would be good to test the new patch also for OSx and Windows.

yes, I tested on OSx. But it does not have an NVIDIA GPU so it is not effective. Nonetheless, this PR does not break the default scheduler.

stratika commented 2 weeks ago

Comments have been addressed. I also tested on Windows 11 with NVIDIA GPU driver 551.61.

beehive-lab / TornadoVM