beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

Introduce Tensor API, Tensor Utilities, and compatibility with ONNX RT #369

Closed mikepapadim closed 4 months ago

mikepapadim commented 5 months ago

Description

This pull request introduces a new Tensor API (v0.1), providing a comprehensive set of classes and utilities for working with tensors of various data types, along with compatibility with the ONNX Runtime (ONNX RT) Java bindings.

The primary changes include:

Also, this PR adds a test to demonstrate compatibility with the ONNX RT Java bindings:

public void testOnnxCompatibility() throws OrtException {
    Shape shape = new Shape(1, 3, 224, 224);
    TensorFloat32 tornadoTensor = new TensorFloat32(shape);

    tornadoTensor.init(2f);

    OnnxTensor outputTensor = null;

    try (OrtEnvironment env = OrtEnvironment.getEnvironment()) {
        // Load the MobileNet V2 ONNX model
        OrtSession session = env.createSession(MODEL_PATH, new OrtSession.SessionOptions());

        // Wrap the TornadoVM tensor's float buffer as an ONNX Runtime input tensor
        OnnxTensor inputTensor = OnnxTensor.createTensor(env, tornadoTensor.getFloatBuffer(), shape.dimensions());

        Map<String, OnnxTensor> inputMap = new HashMap<>();
        inputMap.put(INPUT_TENSOR_NAME, inputTensor);

        // Run the model inference
        try (OrtSession.Result outputMap = session.run(inputMap)) {
            Optional<OnnxValue> optionalOutputTensor = outputMap.get(OUTPUT_TENSOR_NAME);
            if (optionalOutputTensor.isEmpty()) {
                throw new IllegalArgumentException("Output tensor not found in model output.");
            }
            outputTensor = (OnnxTensor) optionalOutputTensor.get();
        }
    } finally {
        Assert.assertNotNull(outputTensor);
    }
}

Overview of the Tensor API Architecture: TensorInt64
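To give a flavour of how the new tensor types are meant to be driven from the TornadoVM side, here is a minimal sketch in the spirit of the TestTensorTypes tests run later in this thread. Shape, TensorFloat32, and init() are taken from the test above; the element accessors get/set, the class name TensorAddSketch, and the tensor package path are assumptions for illustration, not confirmed API.

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.tensors.Shape;         // package path assumed
import uk.ac.manchester.tornado.api.types.tensors.TensorFloat32; // package path assumed

public class TensorAddSketch {

    // Element-wise addition over two tensors of the same shape.
    public static void add(TensorFloat32 a, TensorFloat32 b, TensorFloat32 c, int numElements) {
        for (@Parallel int i = 0; i < numElements; i++) {
            c.set(i, a.get(i) + b.get(i)); // get/set are assumed accessors
        }
    }

    public static void main(String[] args) {
        Shape shape = new Shape(1, 3, 224, 224);
        int numElements = 1 * 3 * 224 * 224;

        TensorFloat32 a = new TensorFloat32(shape);
        TensorFloat32 b = new TensorFloat32(shape);
        TensorFloat32 c = new TensorFloat32(shape);
        a.init(1f);
        b.init(2f);

        // Standard TornadoVM task-graph wiring: copy inputs in, run the kernel, copy the result back.
        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                .task("t0", TensorAddSketch::add, a, b, c, numElements)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, c);

        ImmutableTaskGraph itg = taskGraph.snapshot();
        TornadoExecutionPlan plan = new TornadoExecutionPlan(itg);
        plan.execute();
    }
}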

Notes: This adds a dependency on the ONNX Runtime in the tornado-unittests package. As an alternative, we could move the compatibility tests into a separate repo or submodule.

Backend/s tested

Mark the backends affected by this PR.

OS tested

Mark the OS where this PR is tested.

How to test the new patch?

Provide instructions about how to test the new patch.


make jdk21 backends=opencl

# Test how to use Tornado Tensor Types with ONNX RT
tornado-test -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorAPIWithOnnx

# Test Tensor Types for all the supported DTypes
tornado-test -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes

hannibalhuang commented 4 months ago

Quick question: is there a document describing the use cases that ONNX RT support in TornadoVM hopes to achieve?

mikepapadim commented 4 months ago

Quick question: is there a document describing the use cases that ONNX RT support in TornadoVM hopes to achieve?

Hello @hannibalhuang. We plan to use it for use cases where data pre-processing is required, such as models used for image classification.
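As a rough illustration of that kind of pre-processing, a normalisation step could be written as an ordinary TornadoVM kernel that fills one of the new tensor types before its buffer is handed to ONNX RT, as in the test in the PR description. FloatArray and @Parallel are existing TornadoVM API; the set accessor on TensorFloat32, the class and method names, and the tensor package path are assumptions.

import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;
import uk.ac.manchester.tornado.api.types.tensors.TensorFloat32; // package path assumed

public class PreprocessSketch {
    // Hypothetical pre-processing kernel: scale raw pixel values to [0, 1] and
    // apply mean/std normalisation into the tensor that feeds the ONNX model.
    public static void normalize(FloatArray pixels, TensorFloat32 input, float mean, float std) {
        for (@Parallel int i = 0; i < pixels.getSize(); i++) {
            input.set(i, (pixels.get(i) / 255.0f - mean) / std); // set(int, float) is an assumed accessor
        }
    }
}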

mikepapadim commented 4 months ago

This is now ready for another iteration.

The following PR comments were addressed:

1) I refactored the tensor classes to inherit from a Tensor class, which is only permitted in TornadoNativeArray (see the sketch below)
2) Renamed the FP tensors
3) ONNX models are now downloaded on the fly
4) Added Javadocs to the Shape class methods
5) Added factory methods for initialization
6) Added methods to return tensor segments as ByteBuffers
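A rough sketch of what the hierarchy in point 1 might look like, together with the kind of accessor mentioned in point 6. The permits clauses, method names, and the simplified TensorFloat32 body are hypothetical, included only to illustrate the sealed-class arrangement being described:

import java.nio.ByteBuffer;

// Only Tensor (and the other off-heap array types, elided here) may extend the
// sealed TornadoNativeArray, and the concrete tensor types extend Tensor.
abstract sealed class TornadoNativeArray permits Tensor {
    public abstract long getNumBytesOfSegment();
}

abstract sealed class Tensor extends TornadoNativeArray permits TensorFloat32 {
    // Point 6: expose the tensor's segment as a ByteBuffer (hypothetical method name).
    public abstract ByteBuffer getSegmentAsByteBuffer();
}

final class TensorFloat32 extends Tensor {
    private final ByteBuffer buffer;

    TensorFloat32(int numElements) { // point 5: a constructor/factory path for initialisation
        this.buffer = ByteBuffer.allocateDirect(numElements * Float.BYTES);
    }

    @Override
    public long getNumBytesOfSegment() {
        return buffer.capacity();
    }

    @Override
    public ByteBuffer getSegmentAsByteBuffer() {
        return buffer.duplicate();
    }
}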

jjfumero commented 4 months ago

Is this PR ready for a second review?

mikepapadim commented 4 months ago

Is this PR ready for a second review?

Yes

stratika commented 4 months ago

I have a problem when running the following tests with OpenCL:

tornado-test -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes"
WARNING: Using incubator modules: jdk.incubator.vector
[ERROR] clEnqueueReadBuffer, code = -5 n[TornadoVM-OCL-JNI] ERROR : clEnqueueReadBuffer -> Returned: [JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> -5CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on NVIDIA RTX A2000 8GB Laptop GPU (Device 0).

[TornadoVM-OCL-JNI] ERROR : clFlush -> Returned: -9999
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> Unknown error executing clFlush on NVIDIA RTX A2000 8GB Laptop GPU (Device 0).

[TornadoVM-OCL-JNI] ERROR : clFinish -> Returned: -36
[TornadoVM-OCL-JNI] ERROR : clEnqueueWriteBuffer -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA RTX A2000 8GB Laptop GPU (Device 0).

Do they work for you?

jjfumero commented 4 months ago

No. Same error in my configuration (OpenCL)

tornado-test -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes 
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes"
WARNING: Using incubator modules: jdk.incubator.vector
[ERROR] clEnqueueReadBuffer, code = -5 n[TornadoVM-OCL-JNI] ERROR : clEnqueueReadBuffer -> Returned: [JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> -5CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on NVIDIA GeForce RTX 3070 (Device 0).

[TornadoVM-OCL-JNI] ERROR : clFlush -> Returned: -9999
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> Unknown error executing clFlush on NVIDIA GeForce RTX 3070 (Device 0).

[TornadoVM-OCL-JNI] ERROR : clFinish -> Returned: -36
[TornadoVM-OCL-JNI] ERROR : clEnqueueWriteBuffer -> Returned: [JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA GeForce RTX 3070 (Device 0).

jjfumero commented 4 months ago

This is related to offsets and sizes when moving data from the host to the device.
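For a sense of the sizes involved (illustration only, not a diagnosis of the actual bug fixed in this PR): a mismatch between element counts, byte counts, or offsets on a tensor of this shape would make the copy run past the device allocation, which OpenCL drivers commonly report as CL_OUT_OF_RESOURCES (-5).

public class TransferSizeCheck {
    public static void main(String[] args) {
        // For the Shape(1, 3, 224, 224) FP32 tensor used in the tests:
        long numElements = 1L * 3 * 224 * 224;     // 150_528 elements
        long numBytes = numElements * Float.BYTES; // 602_112 bytes
        System.out.println(numElements + " elements -> " + numBytes + " bytes");
    }
}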

mikepapadim commented 4 months ago

If I run the tests individually, they pass:

tornado-test --printKernel -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes#testTensorInt64Add

mikepapadim commented 4 months ago

Now it is fixed.

jjfumero commented 4 months ago

SPIR-V backend:

tornado-test -V --fast uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes
tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=True "  -m  tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner  --params "uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes"
WARNING: Using incubator modules: jdk.incubator.vector
Test: class uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes
    Running test: testHelloTensorAPI         ................  [PASS] 
    Running test: testTensorFloat16Add       ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<5.0>
    Running test: testTensorFloat32Add       ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<3000.0>
    Running test: testTensorFloat64Add       ................  [FP64 UNSUPPORTED FOR CURRENT DEVICE] 
    Running test: testTensorInt16Add         ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<300.0>
    Running test: testTensorInt32Add         ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<300.0>
    Running test: testTensorInt64Add         ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<3000.0>
    Running test: testTensorByte             ................  [FAILED] 
        \_[REASON] expected:<0.0> but was:<44.0>

I am using the Intel compute runtime: 24.09.28717.12

jjfumero commented 4 months ago

Pending:

mikepapadim commented 4 months ago

Pending: fix for FP16

mikepapadim commented 4 months ago

Pending: fix for FP16

This is fixed now.