SPIR-V Support for the OpenCL Runtime in TornadoVM

Description

Finally I got a bit of time to work on this. This PR extends the OpenCL runtime in TornadoVM so that it can dispatch SPIR-V kernels. The SPIR-V code generator emits SPIR-V, and that code is then dispatched through the OpenCL runtime.

Note that, until now, SPIR-V code could only be dispatched through the Intel Level Zero runtime. Now we can choose which runtime to use. The default is OpenCL, but this can be changed.
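A minimal sketch of how a runtime choice with an OpenCL default could be modelled. It assumes the choice comes from a Java system property. The property name `tornado.spirv.runtime`, the class `SpirvRuntimeSelector` and the enum `SpirvRuntime` are hypothetical and for illustration only; they are not TornadoVM's actual option or API.

```java
import java.util.Locale;

// Hypothetical sketch: selecting which runtime dispatches SPIR-V kernels.
// The property name and the types below are illustrative, not TornadoVM's real API.
public class SpirvRuntimeSelector {

    enum SpirvRuntime { OPENCL, LEVEL_ZERO }

    // Hypothetical property name (assumption).
    static final String PROPERTY = "tornado.spirv.runtime";

    // Maps a user-supplied value to a runtime; a missing or empty value falls back to OpenCL.
    static SpirvRuntime parse(String value) {
        if (value == null || value.isBlank()) {
            return SpirvRuntime.OPENCL;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "opencl":
                return SpirvRuntime.OPENCL;
            case "levelzero":
            case "level_zero":
                return SpirvRuntime.LEVEL_ZERO;
            default:
                throw new IllegalArgumentException("Unknown SPIR-V runtime: " + value);
        }
    }

    // Reads the (hypothetical) property from the JVM's system properties.
    static SpirvRuntime fromSystemProperties() {
        return parse(System.getProperty(PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("SPIR-V kernels dispatched via: " + fromSystemProperties());
    }
}
```

Running it without the property prints `OPENCL`. Running it with `-Dtornado.spirv.runtime=levelzero` (again, a hypothetical name) selects `LEVEL_ZERO`. Keeping OpenCL as the fallback for a missing value mirrors the default described above.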

Note that the SPIR-V backend now requires the OpenCL port, because it reuses as much of the existing OpenCL runtime as possible.

Problem description

n/a.

Backend/s tested

Mark the backends affected by this PR.

[X] OpenCL
[ ] PTX
[X] SPIRV

OS tested

Mark the OS where this PR is tested.

[X] Linux
[ ] OSx
[ ] Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

[ ] Yes
[X] No

How to test the new patch?

$ make BACKEND=spirv
$ tornado --devices

Number of Tornado drivers: 2
Driver: SPIR-V
  Total number of SPIR-V devices  : 2
  Tornado device=0:0  (DEFAULT)
    SPIRV -- SPIRV OCL - Intel(R) UHD Graphics 770      <<<< NEW RUNTIME FOR THE SPIR-V BACKEND
        Global Memory Size: 24.9 GB
        Local Memory Size: 64.0 KB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [512]
        Max WorkGroup Configuration: [512, 512, 512]
        Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
    SPIRV -- SPIRV LevelZero - Intel(R) UHD Graphics 770
        Global Memory Size: 24.9 GB
        Local Memory Size: 64.0 KB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [512]
        Max WorkGroup Configuration: [512, 512, 512]
        Device OpenCL C version:  (LEVEL ZERO) 1.3

Driver: OpenCL
  Total number of OpenCL devices  : 3
  Tornado device=1:0
    OPENCL --  [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
        Global Memory Size: 7.8 GB
        Local Memory Size: 48.0 KB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [1024]
        Max WorkGroup Configuration: [1024, 1024, 64]
        Device OpenCL C version: OpenCL C 1.2

  Tornado device=1:1
    OPENCL --  [Intel(R) OpenCL HD Graphics] -- Intel(R) UHD Graphics 770
        Global Memory Size: 24.9 GB
        Local Memory Size: 64.0 KB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [512]
        Max WorkGroup Configuration: [512, 512, 512]
        Device OpenCL C version: OpenCL C 1.2

  Tornado device=1:2
    OPENCL --  [Portable Computing Language] -- cpu-haswell-12th Gen Intel(R) Core(TM) i7-12700K
        Global Memory Size: 29.1 GB
        Local Memory Size: 1.3 MB
        Workgroup Dimensions: 3
        Total Number of Block Threads: [4096]
        Max WorkGroup Configuration: [4096, 4096, 4096]
        Device OpenCL C version: OpenCL C 1.2 PoCL
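In the listing above, each device is printed as `driver:device`. With the SPIR-V driver at index 0, `0:0` is the SPIR-V device dispatched through the new OpenCL runtime and `0:1` is the one dispatched through Level Zero. The sketch below only illustrates that indexing over a hand-copied table of the output above. The class name `TornadoDeviceId` is hypothetical, and the code does not query TornadoVM.

```java
import java.util.List;

// Hypothetical helper: splits a "driver:device" identifier, as printed by
// `tornado --devices`, and looks it up in a table copied from the output above.
public class TornadoDeviceId {

    record Id(int driver, int device) {}

    static Id parse(String text) {
        String[] parts = text.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected driver:device, got " + text);
        }
        return new Id(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Devices copied from the listing, indexed first by driver, then by device.
    static final List<List<String>> DEVICES = List.of(
        List.of("SPIRV OCL - Intel(R) UHD Graphics 770",
                "SPIRV LevelZero - Intel(R) UHD Graphics 770"),
        List.of("[NVIDIA CUDA] -- NVIDIA GeForce RTX 3070",
                "[Intel(R) OpenCL HD Graphics] -- Intel(R) UHD Graphics 770",
                "[Portable Computing Language] -- cpu-haswell-12th Gen Intel(R) Core(TM) i7-12700K"));

    // Returns the device description for an identifier such as "0:1".
    static String describe(String text) {
        Id id = parse(text);
        return DEVICES.get(id.driver()).get(id.device());
    }

    public static void main(String[] args) {
        System.out.println(describe("0:0")); // SPIRV OCL - Intel(R) UHD Graphics 770
    }
}
```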

$ make tests

==================================================
              Unit tests report 
==================================================

{'[PASS]': 551, '[FAILED]': 12, '[UNSUPPORTED]': 133}
Coverage [PASS/(PASS+FAIL)]: 97.87%
Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 79.17%

==================================================

Total Time(s): 218.74939393997192

This is the same pass rate as with the Level Zero port.
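The two coverage figures in the report are simple ratios over the PASS, FAILED and UNSUPPORTED counts. The small Java sketch below reproduces them from the counts shown. The class and method names are hypothetical; the actual report is produced by TornadoVM's test script.

```java
import java.util.Locale;

// Hypothetical helper reproducing the two coverage figures of the unit-test report.
public class CoverageReport {

    // PASS / (PASS + FAIL), as a percentage: unsupported tests are ignored.
    static double passRateExcludingUnsupported(int pass, int failed) {
        return 100.0 * pass / (pass + failed);
    }

    // PASS / (PASS + FAIL + UNSUPPORTED), as a percentage: unsupported tests count against coverage.
    static double passRateIncludingUnsupported(int pass, int failed, int unsupported) {
        return 100.0 * pass / (pass + failed + unsupported);
    }

    public static void main(String[] args) {
        int pass = 551, failed = 12, unsupported = 133;
        // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
        System.out.printf(Locale.ROOT, "Coverage [PASS/(PASS+FAIL)]: %.2f%%%n",
                passRateExcludingUnsupported(pass, failed));
        System.out.printf(Locale.ROOT, "Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: %.2f%%%n",
                passRateIncludingUnsupported(pass, failed, unsupported));
    }
}
```

With 551 passing, 12 failing and 133 unsupported tests, this prints 97.87% (551/563) and 79.17% (551/696). The second figure is lower only because the 133 unsupported tests enter the denominator.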
