beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0
1.15k stars 109 forks source link

Guard Nvidia GPU scheduler for minor version to prevent incorrect thread scheduling #446

Closed mikepapadim closed 2 weeks ago

mikepapadim commented 1 month ago

Problem description

It seems that the new thread deployment strategy was introduced after the nvidia driver 550.67. So, not checking also for the minor version causes tornado-runtime to use the speciliazed nvidia scheduler and fails with:

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -54

Mark the backends affected by this PR.

OS tested

Mark the OS where this PR is tested.

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

How to test the new patch?

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.loops.TestParallelDimensions"

jjfumero commented 1 month ago

What is the pass-rate for this PR?

jjfumero commented 1 month ago

I think the scheduler fix also need to be applied for the PTX backend.

stratika commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?

tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
mikepapadim commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?

tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"

these are failing in develop as well. Now they are passing for me.

stratika commented 1 month ago

not sure, if relevant. Should these tests also pass with the fix?

tornado  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True"  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"

these are failing in develop as well, not related to the scheduler

I noticed that they fail in the same call, but with different error code, and thought to ask. [TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5

mikepapadim commented 1 month ago

What is the pass-rate for this PR?

{'[PASS]': 643, '[FAILED]': 7, '[UNSUPPORTED]': 56} Coverage [PASS/(PASS+FAIL)]: 98.92% Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 91.08%

from which 2 are the virtual dev

stratika commented 2 weeks ago

550.67

With driver 550.67 there is currently one test failing:

tornado -ea --threadInfo --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09"
WARNING: Using incubator modules: jdk.incubator.vector
Task info: smmmmmm.t0
    Backend           : OPENCL
    Device            : Quadro GP100 CL_DEVICE_TYPE_GPU (available)
    Dims              : 2
    Global work offset: [0, 0]
    Global work size  : [256, 256]
    Local  work size  : [32, 32, 1]
    Number of workgroups  : [8, 8]

[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on Quadro GP100 (Device 0).

Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09
    Running test: testMatrix09               ................  [FAILED] 
        \_[REASON] expected:<254.85081> but was:<0.0>

I suggest to patch it, by using the GridScheduler with 16, 16, 1 as local work size. What do you think? The alternative would be to change the OCLGenericGPUScheduler to use 16, 16, 1 as default local work size. @mikepapadim @jjfumero any comments?

stratika commented 2 weeks ago
tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.loops.TestParallelDimensions"

I provided a patch that automatically adapts the local grid dimensions when we use the NVIDIA scheduler (after 550.67 dimension). The problem was that the local size was 16*16*8=2048 which is greater than the supported max size which is: 1024.

You can test with:

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
jjfumero commented 2 weeks ago

Awesome, all tests are passing with OpenCL (RTX 3070 and NVIDIA 550.76)

tornado -ea  --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
WARNING: Using incubator modules: jdk.incubator.vector
Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes
    Running test: testMatrixRowInt01         ................  [PASS] 
    Running test: testMatrixRowInt02         ................  [PASS] 
    Running test: testMatrixRowFloat01       ................  [PASS] 
    Running test: testMatrixRowFloat02       ................  [PASS] 
    Running test: testMatrixRowFloat4        ................  [PASS] 
    Running test: testMatrixRowDouble01      ................  [PASS] 
    Running test: testMatrixRowDouble02      ................  [PASS] 
    Running test: testMatrix00               ................  [PASS] 
    Running test: testMatrix01               ................  [PASS] 
    Running test: testMatrix02               ................  [PASS] 
    Running test: testMatrix03               ................  [PASS] 
    Running test: testMatrix04               ................  [PASS] 
    Running test: testMatrix05               ................  [PASS] 
    Running test: testMatrix06               ................  [PASS] 
    Running test: testMatrix07               ................  [PASS] 
    Running test: testMatrix08               ................  [PASS] 
    Running test: testMatrix09               ................  [PASS] 
    Running test: testMatrix10               ................  [PASS] 
    Running test: testMatrix11               ................  [PASS] 
    Running test: testMatrix12               ................  [PASS] 
    Running test: testMatrix13               ................  [PASS] 
    Running test: testMatrix14               ................  [PASS] 
    Running test: testMatrix15               ................  [PASS] 
    Running test: testMatrix16               ................  [PASS] 
    Running test: testMatrix17               ................  [PASS] 
    Running test: testMatrix18               ................  [PASS] 
    Running test: testMatrix19               ................  [PASS] 
    Running test: testMatrix20               ................  [PASS] 
    Running test: testMatrix21               ................  [PASS] 
    Running test: testMatrix22               ................  [PASS] 
    Running test: testMatrix23               ................  [PASS] 
Test ran: 31, Failed: 0, Unsupported: 0
jjfumero commented 2 weeks ago

For sanity check, it would be good to test the new patch also for OSx and Windows.

stratika commented 2 weeks ago

For sanity check, it would be good to test the new patch also for OSx and Windows.

yes, I tested on OSx. But it does not have an NVIDIA GPU so it is not effective. Nonetheless, this PR does not break the default scheduler.

stratika commented 2 weeks ago

Comments have been addressed. I also tested on Windows 11 with NVIDIA GPU driver 551.61.