Closed mikepapadim closed 2 weeks ago
What is the pass-rate for this PR?
I think the scheduler fix also need to be applied for the PTX backend.
not sure, if relevant. Should these tests also pass with the fix?
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True" -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
not sure, if relevant. Should these tests also pass with the fix?
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True" -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
these are failing in develop as well. Now they are passing for me.
not sure, if relevant. Should these tests also pass with the fix?
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True" -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
these are failing in develop as well, not related to the scheduler
I noticed that they fail in the same call, but with different error code, and thought to ask.
[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
What is the pass-rate for this PR?
{'[PASS]': 643, '[FAILED]': 7, '[UNSUPPORTED]': 56} Coverage [PASS/(PASS+FAIL)]: 98.92% Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 91.08%
from which 2 are the virtual dev
550.67
With driver 550.67
there is currently one test failing:
tornado -ea --threadInfo --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True " -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09"
WARNING: Using incubator modules: jdk.incubator.vector
Task info: smmmmmm.t0
Backend : OPENCL
Device : Quadro GP100 CL_DEVICE_TYPE_GPU (available)
Dims : 2
Global work offset: [0, 0]
Global work size : [256, 256]
Local work size : [32, 32, 1]
Number of workgroups : [8, 8]
[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on Quadro GP100 (Device 0).
Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes#testMatrix09
Running test: testMatrix09 ................ [FAILED]
\_[REASON] expected:<254.85081> but was:<0.0>
I suggest to patch it, by using the GridScheduler with 16, 16, 1
as local work size. What do you think? The alternative would be to change the OCLGenericGPUScheduler to use 16, 16, 1
as default local work size. @mikepapadim @jjfumero any comments?
tornado -ea --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True " -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.loops.TestParallelDimensions"
I provided a patch that automatically adapts the local grid dimensions when we use the NVIDIA scheduler (after 550.67
dimension). The problem was that the local size was 16*16*8=2048
which is greater than the supported max size which is: 1024
.
You can test with:
tornado -ea --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True " -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
Awesome, all tests are passing with OpenCL (RTX 3070 and NVIDIA 550.76)
tornado -ea --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True " -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes"
WARNING: Using incubator modules: jdk.incubator.vector
Test: class uk.ac.manchester.tornado.unittests.matrices.TestMatrixTypes
Running test: testMatrixRowInt01 ................ [PASS]
Running test: testMatrixRowInt02 ................ [PASS]
Running test: testMatrixRowFloat01 ................ [PASS]
Running test: testMatrixRowFloat02 ................ [PASS]
Running test: testMatrixRowFloat4 ................ [PASS]
Running test: testMatrixRowDouble01 ................ [PASS]
Running test: testMatrixRowDouble02 ................ [PASS]
Running test: testMatrix00 ................ [PASS]
Running test: testMatrix01 ................ [PASS]
Running test: testMatrix02 ................ [PASS]
Running test: testMatrix03 ................ [PASS]
Running test: testMatrix04 ................ [PASS]
Running test: testMatrix05 ................ [PASS]
Running test: testMatrix06 ................ [PASS]
Running test: testMatrix07 ................ [PASS]
Running test: testMatrix08 ................ [PASS]
Running test: testMatrix09 ................ [PASS]
Running test: testMatrix10 ................ [PASS]
Running test: testMatrix11 ................ [PASS]
Running test: testMatrix12 ................ [PASS]
Running test: testMatrix13 ................ [PASS]
Running test: testMatrix14 ................ [PASS]
Running test: testMatrix15 ................ [PASS]
Running test: testMatrix16 ................ [PASS]
Running test: testMatrix17 ................ [PASS]
Running test: testMatrix18 ................ [PASS]
Running test: testMatrix19 ................ [PASS]
Running test: testMatrix20 ................ [PASS]
Running test: testMatrix21 ................ [PASS]
Running test: testMatrix22 ................ [PASS]
Running test: testMatrix23 ................ [PASS]
Test ran: 31, Failed: 0, Unsupported: 0
For sanity check, it would be good to test the new patch also for OSx and Windows.
For sanity check, it would be good to test the new patch also for OSx and Windows.
yes, I tested on OSx. But it does not have an NVIDIA GPU so it is not effective. Nonetheless, this PR does not break the default scheduler.
Comments have been addressed. I also tested on Windows 11 with NVIDIA GPU driver 551.61
.
Problem description
It seems that the new thread deployment strategy was introduced after the nvidia driver 550.67. So, not checking also for the minor version causes tornado-runtime to use the speciliazed nvidia scheduler and fails with:
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?