beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

Support vectors of float-16 values #372

Closed mairooni closed 4 months ago

mairooni commented 4 months ago

Description

This PR provides support for vectors containing half-float values.
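As background for readers, a half-float is a 16-bit floating-point value. The sketch below decodes one by hand, assuming the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). The class and method names are hypothetical and are not part of the TornadoVM API; this is only an illustration of the number format, not of how the PR implements it.

```java
// Hypothetical helper, not TornadoVM code: decodes a 16-bit half-float
// bit pattern into a Java float, assuming the IEEE 754 binary16 layout.
final class HalfFloatDecodeSketch {

    static float halfToFloat(short half) {
        int bits = half & 0xFFFF;          // treat the short as unsigned
        int sign = (bits >> 15) & 0x1;     // 1 sign bit
        int exponent = (bits >> 10) & 0x1F; // 5 exponent bits, bias 15
        int mantissa = bits & 0x3FF;       // 10 mantissa bits

        float magnitude;
        if (exponent == 0) {
            // Zero or subnormal: mantissa * 2^-24
            magnitude = Math.scalb((float) mantissa, -24);
        } else if (exponent == 0x1F) {
            // All exponent bits set: infinity or NaN
            magnitude = (mantissa == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal value: (1024 + mantissa) * 2^(exponent - 15 - 10)
            magnitude = Math.scalb((float) (1024 + mantissa), exponent - 25);
        }
        return (sign == 0) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(halfToFloat((short) 0x3C00)); // 1.0
        System.out.println(halfToFloat((short) 0x4800)); // 8.0
        System.out.println(halfToFloat((short) 0x7BFF)); // 65504.0, the largest finite half
    }
}
```

The limited range and precision (65504 is the largest finite value) are worth keeping in mind when choosing expected values for half-float tests.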

How to test the new patch?

Run tornado-test -V uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats

jjfumero commented 4 months ago

Some testing:

A) OpenCL on the Intel HD Graphics:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Task info: s0.t0
    Backend           : OPENCL
    Device            : Intel(R) UHD Graphics 770 CL_DEVICE_TYPE_GPU (available)
    Dims              : 1
    Global work offset: [0]
    Global work size  : [16]
    Local  work size  : [16, 1, 1]
    Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
    Running test: vectorPhiTest              ................  [PASS] 
    Running test: testSimpleDotProductHalf2  ................  [PASS] 
    Running test: testSimpleDotProductHalf3  ................  [PASS] 
    Running test: testSimpleDotProductHalf4  ................  [PASS] 
    Running test: testSimpleDotProductHalf8  ................  [PASS] 
    Running test: testSimpleDotProductHalf16 ................  [PASS] 
    Running test: testSimpleVectorAddition   ................  [PASS] 
    Running test: testVectorHalf2            ................  [PASS] 
    Running test: testVectorHalf3            ................  [PASS] 
    Running test: testVectorFloat3toString   ................  [PASS] 
    Running test: testVectorHalf4            ................  [PASS] 
    Running test: testVectorHalf16           ................  [PASS] 
    Running test: testVectorHalf8            ................  [PASS] 
    Running test: testVectorHalf8_Storage    ................  [PASS] 
    Running test: testDotProduct             ................  [PASS] 
    Running test: privateVectorHalf2         ................  [PASS] 
    Running test: privateVectorHalf4         ................  [PASS] 
    Running test: privateVectorHalf8         ................  [PASS] 
    Running test: testVectorHalf4_Unary      ................  [PASS] 
    Running test: testInternalSetMethod01    ................  [PASS] 
    Running test: testInternalSetMethod02    ................  [PASS] 
    Running test: testInternalSetMethod03    ................  [PASS] 
    Running test: testInternalSetMethod04    ................  [PASS] 
    Running test: testAllocationIssue        ................  [PASS] 

B) SPIR-V Backend:

Task info: s0.t0
    Backend           : SPIRV
    Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
    Dims              : 1
    Global work offset: [0]
    Global work size  : [16]
    Local  work size  : [16, 1, 1]
    Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
    Running test: vectorPhiTest              ................  [FAILED] 
        \_[REASON] expected:<8.0> but was:<1.0>
    Running test: testSimpleDotProductHalf2  ................  [PASS] 
    Running test: testSimpleDotProductHalf3  ................  [PASS] 
    Running test: testSimpleDotProductHalf4  ................  [PASS] 
    Running test: testSimpleDotProductHalf8  ................  [PASS] 
    Running test: testSimpleDotProductHalf16 ................  [PASS] 
    Running test: testSimpleVectorAddition   ................  [FAILED] 
        \_[REASON] expected:<4.0> but was:<1.0>
    Running test: testVectorHalf2            ................  [FAILED] 
        \_[REASON] expected:<16.0> but was:<1.0>
    Running test: testVectorHalf3            ................  [FAILED] 
        \_[REASON] expected:<8.0> but was:<1.0>
    Running test: testVectorFloat3toString   ................  [PASS] 
    Running test: testVectorHalf4            ................  [FAILED] 
        \_[REASON] expected:<8.0> but was:<1.0>
    Running test: testVectorHalf16           ................  [FAILED] 
        \_[REASON] expected:<16.0> but was:<1.0>
    Running test: testVectorHalf8            ................  [FAILED] 
        \_[REASON] expected:<8.0> but was:<1.0>
    Running test: testVectorHalf8_Storage    ................  [PASS] 
    Running test: testDotProduct             ................  [PASS] 
    Running test: privateVectorHalf2         ................  [FAILED] 
        \_[REASON] expected:<120.0> but was:<1.0>
    Running test: privateVectorHalf4         ................  [FAILED] 
        \_[REASON] expected:<120.0> but was:<1.0>
    Running test: privateVectorHalf8         ................  [FAILED] 
        \_[REASON] expected:<120.0> but was:<1.0>
    Running test: testVectorHalf4_Unary      ................  [PASS] 
    Running test: testInternalSetMethod01    ................  [PASS] 
    Running test: testInternalSetMethod02    ................  [PASS] 
    Running test: testInternalSetMethod03    ................  [PASS] 
    Running test: testInternalSetMethod04    ................  [PASS] 
    Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 10, Unsupported: 0

C) For the PTX backend:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
    Running test: vectorPhiTest              ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleDotProductHalf2  ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleDotProductHalf3  ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleDotProductHalf4  ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleDotProductHalf8  ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleDotProductHalf16 ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testSimpleVectorAddition   ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf2            ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf3            ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorFloat3toString   ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf4            ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf16           ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf8            ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf8_Storage    ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testDotProduct             ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: privateVectorHalf2         ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: privateVectorHalf4         ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: privateVectorHalf8         ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testVectorHalf4_Unary      ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testInternalSetMethod01    ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testInternalSetMethod02    ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testInternalSetMethod03    ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testInternalSetMethod04    ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
    Running test: testAllocationIssue        ................  [FAILED] 
        \_[REASON] Index 1 out of bounds for length 1
Test ran: 24, Failed: 24, Unsupported: 0

Commit point: #24c971a95
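To compare runs like the ones above across backends or machines, the failing test names can be pulled out of the log text. This is a minimal sketch, assuming the `Running test: <name> ... [PASS]` / `[FAILED]` line format shown in these logs; the class and method names are hypothetical and this is not a TornadoVM utility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the names of failed tests from tornado-test output.
final class TornadoTestLogTally {

    // Matches lines such as "Running test: vectorPhiTest   ................  [FAILED]".
    // Any other status tag is ignored by this sketch.
    private static final Pattern RESULT_LINE =
            Pattern.compile("Running test:\\s+(\\S+)\\s+\\.+\\s+\\[(PASS|FAILED)\\]");

    static List<String> failedTests(String log) {
        List<String> failed = new ArrayList<>();
        Matcher m = RESULT_LINE.matcher(log);
        while (m.find()) {
            if ("FAILED".equals(m.group(2))) {
                failed.add(m.group(1));
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "    Running test: vectorPhiTest              ................  [FAILED] ",
                "        \\_[REASON] expected:<8.0> but was:<1.0>",
                "    Running test: testSimpleDotProductHalf2  ................  [PASS] ");
        System.out.println(failedTests(log)); // [vectorPhiTest]
    }
}
```

Running this over two logs and diffing the resulting lists gives a quick view of which tests behave differently between devices.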

jjfumero commented 4 months ago

Let's work on it together. We can start with the SPIR-V Backend.

mairooni commented 4 months ago

I cannot reproduce these errors for some reason. These are the test results for the SPIR-V backend on my machine:

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [2]
        Local  work size  : [2, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0-MAP
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t1-REDUCE
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

mairooni commented 4 months ago

For PTX

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [2]
        Blocks dimensions : [2, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0-MAP
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t1-REDUCE
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

jjfumero commented 4 months ago

OK, let me check with an older CPU. I noticed that some of the tests do not pass when using > Intel 12th gen HD Graphics.

jjfumero commented 4 months ago

My mistake. The PTX tests are passing. The command I used was wrong. Let me work on the SPIR-V backend and see what I can spot.

jjfumero commented 4 months ago

It still fails with an older CPU. I am using Intel compute runtime 23.35.27191.9. I will try to update to a newer version and check again.

jjfumero commented 4 months ago

This did the trick for SPIR-V Half2 vectors:

diff --git a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
index 761e060ce..01e2c8ae5 100644
--- a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
+++ b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
@@ -13,7 +13,7 @@
  *
  * This code is distributed in the hope that it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
  * version 2 for more details (a copy is included in the LICENSE file that
  * accompanied this code).
  *
@@ -95,6 +95,8 @@ public class VectorAddNode extends BinaryNode implements LIRLowerable, VectorOp

         if (kind.getElementKind().isFloatingPoint()) {
             binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
+        } else if (kind.isHalf()) {
+            binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
         }

Let's replicate this change for all vector types. It might turn out to be a driver issue after all, fixed in newer versions of the Intel compute runtime.
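The effect of the patch can be sketched in isolation. The code below is a simplified model, not the backend code: `ElementKind` and `BinaryOp` are hypothetical stand-ins for the SPIR-V backend types, and two points are assumptions inferred from the diff rather than confirmed by it: that the half kind is not reported as floating point by the existing check, and that the operation defaults to an integer add when no branch matches.

```java
// Simplified, hypothetical model of the operation selection touched by the diff.
final class VectorAddOpSelection {

    enum ElementKind {
        INT(false),
        FLOAT(true),
        DOUBLE(true),
        HALF(false); // assumption: HALF is not classified as floating point

        private final boolean floatingPoint;

        ElementKind(boolean floatingPoint) {
            this.floatingPoint = floatingPoint;
        }

        boolean isFloatingPoint() {
            return floatingPoint;
        }

        boolean isHalf() {
            return this == HALF;
        }
    }

    enum BinaryOp { ADD_INTEGER, ADD_FLOAT }

    // Selection before the patch: HALF falls through to the assumed integer default.
    static BinaryOp selectBefore(ElementKind kind) {
        BinaryOp op = BinaryOp.ADD_INTEGER; // assumed default
        if (kind.isFloatingPoint()) {
            op = BinaryOp.ADD_FLOAT;
        }
        return op;
    }

    // Selection after the patch: HALF is routed to the float add explicitly.
    static BinaryOp selectAfter(ElementKind kind) {
        BinaryOp op = BinaryOp.ADD_INTEGER; // assumed default
        if (kind.isFloatingPoint()) {
            op = BinaryOp.ADD_FLOAT;
        } else if (kind.isHalf()) {
            op = BinaryOp.ADD_FLOAT;
        }
        return op;
    }

    public static void main(String[] args) {
        for (ElementKind kind : ElementKind.values()) {
            System.out.println(kind + ": before=" + selectBefore(kind) + ", after=" + selectAfter(kind));
        }
    }
}
```

Under these assumptions only the `HALF` row changes between the two selections, which is why the same extra branch would need to be repeated in the other vector nodes.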

jjfumero commented 4 months ago

Cool, now it passes all new tests regarding FP16 with SPIR-V:

Task info: s0.t0
    Backend           : SPIRV
    Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
    Dims              : 1
    Global work offset: [0]
    Global work size  : [16]
    Local  work size  : [16, 1, 1]
    Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
    Running test: vectorPhiTest              ................  [PASS] 
    Running test: testSimpleDotProductHalf2  ................  [PASS] 
    Running test: testSimpleDotProductHalf3  ................  [PASS] 
    Running test: testSimpleDotProductHalf4  ................  [PASS] 
    Running test: testSimpleDotProductHalf8  ................  [PASS] 
    Running test: testSimpleDotProductHalf16 ................  [PASS] 
    Running test: testSimpleVectorAddition   ................  [PASS] 
    Running test: testVectorHalf2            ................  [PASS] 
    Running test: testVectorHalf3            ................  [PASS] 
    Running test: testVectorFloat3toString   ................  [PASS] 
    Running test: testVectorHalf4            ................  [PASS] 
    Running test: testVectorHalf16           ................  [PASS] 
    Running test: testVectorHalf8            ................  [PASS] 
    Running test: testVectorHalf8_Storage    ................  [PASS] 
    Running test: testDotProduct             ................  [PASS] 
    Running test: privateVectorHalf2         ................  [PASS] 
    Running test: privateVectorHalf4         ................  [PASS] 
    Running test: privateVectorHalf8         ................  [PASS] 
    Running test: testVectorHalf4_Unary      ................  [PASS] 
    Running test: testInternalSetMethod01    ................  [PASS] 
    Running test: testInternalSetMethod02    ................  [PASS] 
    Running test: testInternalSetMethod03    ................  [PASS] 
    Running test: testInternalSetMethod04    ................  [PASS] 
    Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0