beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

Cannot read the array length because "this.code" is null #236

Closed bitstuffing closed 1 year ago

bitstuffing commented 1 year ago

Describe the bug

Multiplying a Complex[][] matrix by a double[][] matrix works fine on the CPU, but running the same code on the GPU throws a NullPointerException ("this.code" is null).

How To Reproduce

The jar file was generated from a Maven test project with a main class. All dependencies are bundled with maven-assembly-plugin, and the project is compiled with maven-compiler-plugin targeting Java 16.

The dependencies are declared as the documentation describes:

        <dependency>
            <groupId>tornado</groupId>
            <artifactId>tornado-api</artifactId>
            <version>0.15</version>
        </dependency>
        <dependency>
            <groupId>tornado</groupId>
            <artifactId>tornado-matrices</artifactId>
            <version>0.15</version>
        </dependency>

Command to reproduce:

$ tornado --debug --jvm="-Dtornado.fullDebug=true" -jar target/tornadovm-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar

It produces this output:

tornado --debug --jvm="-Dtornado.fullDebug=true" -jar target/tornadovm-test-0.0.1-SNAPSHOT-jar-with-dependencies.jar  --debug
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
CPU execution done!
Finished at: 14 ms.
testing on GPU:
Loading DRIVER: uk.ac.manchester.tornado.drivers.opencl.OCLTornadoDriverProvider@64d2d351
java.lang.NullPointerException: Cannot read the array length because "this.code" is null
        at jdk.internal.vm.compiler/org.graalvm.compiler.bytecode.BytecodeStream.setBCI(BytecodeStream.java:209)
        at jdk.internal.vm.compiler/org.graalvm.compiler.bytecode.BytecodeStream.<init>(BytecodeStream.java:47)
        at jdk.internal.vm.compiler/org.graalvm.compiler.java.BytecodeParser.<init>(BytecodeParser.java:951)
        at jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase$Instance.createBytecodeParser(GraphBuilderPhase.java:102)
        at jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase$Instance.run(GraphBuilderPhase.java:97)
        at jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase.run(GraphBuilderPhase.java:63)
        at jdk.internal.vm.compiler/org.graalvm.compiler.java.GraphBuilderPhase.run(GraphBuilderPhase.java:43)
        at jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:446)
        at jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:334)
        at jdk.internal.vm.compiler/org.graalvm.compiler.phases.PhaseSuite.run(PhaseSuite.java:390)
        at jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:446)
        at jdk.internal.vm.compiler/org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:334)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:220)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:186)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:176)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
java.util.concurrent.ExecutionException: uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null)
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.lookup(TornadoSketcher.java:154)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.lambda$buildSketch$2(TornadoSketcher.java:243)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:241)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:186)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:176)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:254)
        ... 6 more
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:254)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:186)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:176)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
java.util.concurrent.ExecutionException: uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: multiply(Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null))
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.lookup(TornadoSketcher.java:154)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.addInner(TornadoTaskGraph.java:573)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.addInner(TornadoTaskGraph.java:1913)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.addTask(TornadoTaskGraph.java:2012)
        at tornado.api@0.15.1-dev/uk.ac.manchester.tornado.api.TaskGraph.task(TaskGraph.java:167)
        at com.MainApplication.main(MainApplication.java:87)
Caused by: uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: multiply(Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null))
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:254)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:186)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:176)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to build sketch for method: multiply(Unable to build sketch for method: fillInStackTrace(Cannot read the array length because "this.code" is null))
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher.buildSketch(TornadoSketcher.java:254)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:186)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.sketcher.TornadoSketcher$TornadoSketcherCallable.call(TornadoSketcher.java:176)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 256
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.graph.TornadoGraph.getNode(TornadoGraph.java:50)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.graph.TornadoGraphBuilder.buildGraph(TornadoGraphBuilder.java:231)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.compile(TornadoTaskGraph.java:628)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.compileToTornadoVMBytecode(TornadoTaskGraph.java:697)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.scheduleInner(TornadoTaskGraph.java:793)
        at tornado.runtime@0.15.1-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph.schedule(TornadoTaskGraph.java:1203)
        at tornado.api@0.15.1-dev/uk.ac.manchester.tornado.api.TaskGraph.execute(TaskGraph.java:782)
        at tornado.api@0.15.1-dev/uk.ac.manchester.tornado.api.ImmutableTaskGraph.execute(ImmutableTaskGraph.java:73)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
        at tornado.api@0.15.1-dev/uk.ac.manchester.tornado.api.TornadoExecutionPlan$TornadoExecutor.execute(TornadoExecutionPlan.java:307)
        at tornado.api@0.15.1-dev/uk.ac.manchester.tornado.api.TornadoExecutionPlan.execute(TornadoExecutionPlan.java:126)
        at com.MainApplication.main(MainApplication.java:94)
cleanup: programs  ..........0.000001621 s
cleanup: queues    ..........0.000100366 s
cleanup: context   ..........0.000014057 s
cleanup: total     ..........0.000116044 s

The main class has the following code directly in its main method:

    long start = System.currentTimeMillis();
    //multiply
    Complex[][] result = multiply(initial, ans_ref); //3x16763
    long end = System.currentTimeMillis();

    int x=0;
    for(Complex[] resultArray:result) {
        int y=0;
        for(Complex resultElement:resultArray) { //-0.09584801112731389 != -0.09584801112731392
            if(resultElement.im() - expectedResult[x][y].im()>1e-15) {
                System.out.println("Fail"); 
                break;
            }
            y++;
        }
        x++;
    }
    System.out.println("CPU execution done!");
    System.out.println("Finished at: "+((end-start))+" ms.");

    System.out.println("testing on GPU:");
    TaskGraph taskGraph = new TaskGraph("s0") //
        .transferToDevice(DataTransferMode.FIRST_EXECUTION, initial, ans_ref) //
        .task("t0", MainApplication::multiply, initial, ans_ref) //
        .transferToHost(DataTransferMode.EVERY_EXECUTION, result);

    ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
    TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph);

    start = System.currentTimeMillis();
    TornadoExecutionResult results = executionPlan.execute();
    end = System.currentTimeMillis();

    System.out.println("GPU execution done!");
    System.out.println("Finished at: "+((end-start))+" ms.");

and the multiply method, annotated with @Parallel:

public static Complex[][] multiply(double[][] matrix1, Complex[][] matrix2) {
        int rows1 = matrix1.length;
        int cols1 = matrix1[0].length;
        int rows2 = matrix2.length;
        int cols2 = matrix2[0].length;
        if (cols1 != rows2) {
            throw new IllegalArgumentException("Sizes are not valid, you're NOT able to be multiplied");
        }
        Complex[][] result = new Complex[rows1][cols2];
        for (@Parallel int i = 0; i < rows1; i++) {
            for (@Parallel int j = 0; j < cols2; j++) {
                double real = 0.0;
                double imagin = 0.0;
                for (@Parallel int k = 0; k < rows2; k++) {
                    double real1 = matrix1[i][k];
                    double real2 = matrix2[k][j].re();
                    double imagin2 = matrix2[k][j].im();
                    real += real1 * real2;
                    imagin += real1 * imagin2;
                }
                result[i][j] = new Complex(real, imagin);
            }
        }
        return result;
    }

Expected behavior

Accelerated results, with Y around 1 or 0 ms, similar to this sample output:

CPU execution done!
Finished at: 14 ms.
testing on GPU: 
Finished at: Y ms.

Computing system setup (please complete the following information):

Additional context

The default MatrixMultiplication2D test runs well on the GPU, so TornadoVM works fine with OpenCL.

Backends installed:


jjfumero commented 1 year ago

Hi @bitstuffing. Thank you for the feedback. I see a couple of things:

1) 2D and 3D arrays are not supported.

TornadoVM provides data structures to perform 2D and 3D operations:

See for example Matrix2DFloat: https://github.com/beehive-lab/TornadoVM/blob/master/tornado-api/src/main/java/uk/ac/manchester/tornado/api/collections/types/Matrix2DFloat.java

Examples: https://github.com/beehive-lab/TornadoVM/blob/master/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java

You can use these data structures in your kernels.

2) Kernels in TornadoVM should return void.

The reason is that the Java method is compiled to parallel OpenCL, PTX, or SPIR-V kernels. A kernel is the code executed by each native thread (e.g., an OpenCL work-item). If a kernel returned objects, TornadoVM would also have to keep track of which thread produced which output. To avoid this, TornadoVM requires the programmer to pass the output objects as parameters as well.

public static void multiply(Matrix2DFloat matrix1, Matrix2DFloat matrix2, Matrix2DFloat complex) { 
    ... 
}

Additionally, dynamic object allocation is not supported in TornadoVM. Only certain types are allowed. See the examples module in TornadoVM.

https://github.com/beehive-lab/TornadoVM/tree/master/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples
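
To make the points above concrete, here is a minimal sketch in plain Java (no TornadoVM dependency, so it runs on any JVM). It assumes flat, row-major arrays in place of 2D arrays, with the complex operand split into real and imaginary planes. The class name, method signature, and layout are hypothetical and only illustrate the shape of a kernel that returns void, allocates nothing, and writes into outputs passed as parameters.

```java
import java.util.Arrays;

// Hypothetical sketch: names and data layout are illustrative, not TornadoVM API.
final class ComplexMatMulSketch {

    /**
     * Multiplies a real (rows x inner) matrix by a complex (inner x cols) matrix.
     * All matrices are flat, row-major arrays. The complex operand and the result
     * are split into separate real and imaginary planes. The caller preallocates
     * the outputs, so the method returns void and allocates no objects.
     */
    static void multiply(double[] a, double[] bRe, double[] bIm,
                         double[] outRe, double[] outIm,
                         int rows, int inner, int cols) {
        // In a TornadoVM kernel, the i and j loops would carry @Parallel.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double sumRe = 0.0;
                double sumIm = 0.0;
                // The k loop accumulates into sumRe/sumIm, so it stays sequential.
                for (int k = 0; k < inner; k++) {
                    double x = a[i * inner + k];
                    sumRe += x * bRe[k * cols + j];
                    sumIm += x * bIm[k * cols + j];
                }
                outRe[i * cols + j] = sumRe;
                outIm[i * cols + j] = sumIm;
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};      // 2x2 real matrix
        double[] bRe = {1, 0, 0, 1};    // real plane of a 2x2 complex matrix
        double[] bIm = {0, 1, 1, 0};    // imaginary plane
        double[] outRe = new double[4];
        double[] outIm = new double[4];
        multiply(a, bRe, bIm, outRe, outIm, 2, 2, 2);
        System.out.println(Arrays.toString(outRe));
        System.out.println(Arrays.toString(outIm));
    }
}
```

The same loop structure should carry over to the Matrix2D types mentioned above, with get and set calls replacing the index arithmetic.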

bitstuffing commented 1 year ago

Thank you so much for your guidance. I was able to convert the code, and it now runs successfully.

[image]

So the bug was mine, sorry. Thanks for the support.

But as you can see, it is not optimal: I want to multiply a double[][] by a Complex[][], and decomposing each Complex matrix into two Matrix2DFloat objects hurts performance.

Do you have any suggestions for handling Complex numbers that would avoid this?

Thanks in advance for your work, it's a great project.

Edit (my test code function):

public static void multiply(Matrix2DFloat matrix1, Matrix2DFloat matrix2_real, Matrix2DFloat matrix2_imag, Matrix2DFloat complex_real, Matrix2DFloat complex_imag) {
        int N = matrix1.getNumRows();
        int M = matrix1.getNumColumns();
        int K = matrix2_real.getNumColumns();
        for (@Parallel int i = 0; i < N; i++) {
            for (@Parallel int j = 0; j < K; j++) {
                float sum_real = 0;
                float sum_imag = 0;
                for (@Parallel int k = 0; k < M; k++) {
                    float a_real = matrix1.get(i, k);
                    float b_real = matrix2_real.get(k, j);
                    float a_imag = 0;
                    float b_imag = matrix2_imag.get(k, j);
                    sum_real += a_real * b_real - a_imag * b_imag;
                    sum_imag += a_real * b_imag + a_imag * b_real;
                }
                complex_real.set(i, j, sum_real);
                complex_imag.set(i, j, sum_imag);
            }
        }
    }

jjfumero commented 1 year ago

As things stand, I understand there is a performance penalty for marshalling objects from your Complex type into TornadoVM types. I think it makes sense for us to support Complex values directly, so developers won't have to do this marshalling.
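
For readers unfamiliar with the marshalling step being discussed, the following is a rough sketch of what it involves on the host side. The Complex class here is a hypothetical stand-in that assumes only the re() and im() accessors seen in the reporter's code, and the helper names are illustrative. The copy loops in both directions are the overhead in question.

```java
// Hypothetical sketch of host-side marshalling; not part of TornadoVM.
final class ComplexMarshalSketch {

    /** Stand-in for the reporter's Complex type; only re() and im() are assumed. */
    static final class Complex {
        private final double re;
        private final double im;

        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        double re() { return re; }
        double im() { return im; }
    }

    /** Copies one component of a rectangular Complex[][] into a flat row-major array. */
    static double[] flatten(Complex[][] m, boolean imaginary) {
        int rows = m.length;
        int cols = m[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                flat[i * cols + j] = imaginary ? m[i][j].im() : m[i][j].re();
            }
        }
        return flat;
    }

    /** Rebuilds a Complex[][] from the two flat planes once the kernel has finished. */
    static Complex[][] unflatten(double[] re, double[] im, int rows, int cols) {
        Complex[][] m = new Complex[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                m[i][j] = new Complex(re[i * cols + j], im[i * cols + j]);
            }
        }
        return m;
    }
}
```

Each call walks every element once, so a multiplication pays for two flatten passes over the complex input and one unflatten pass over the result, in addition to the kernel itself.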

We will discuss this internally with our team and let you know. We are currently designing new types so that we can consider this for future versions.

jjfumero commented 1 year ago

Does the new version you provided work? Did you encounter new issues?

bitstuffing commented 1 year ago

Does the new version you provided work? Did you encounter new issues?

Yes, the latest code works, with the times I mentioned. I can share my test values if you would like them. The matrices are not square; the input is:

    matrix1                 // 3x4 double[][]
    matrix2                 // 4x16763 Complex[][]
    expected_complex_result // 3x16763 Complex[][]

jjfumero commented 1 year ago

I am closing this issue. Feel free to open new issues for new feedback or new problems.