beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0

TornadoVM stuck when handling the exceptions #240

Open TaisZ opened 1 year ago

TaisZ commented 1 year ago

Describe the bug

When running the following code, TornadoVM neither throws an exception nor terminates: the program hangs.

How To Reproduce

package uk.ac.manchester.tornado.examples.Broken;

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

import java.util.Arrays;

/**
 * Execution does not terminate when line 18 (the System.out.println call) is present. Deadlock?
 */
public class TryCatch {
    public static void divided(int[] a, int[] b, float[] c) {
        try {
            for (@Parallel int i = 0; i < c.length; i++) {
                System.out.println(b[i]);
                c[i] = (float) a[i] / b[i];
            }
        } catch (Exception e) {
            System.out.println("Got an error!");
        }

    }

    public static void main(String[] args) {

        final int numElements = 8;
        int[] a = new int[numElements];
        int[] b = new int[numElements];
        float[] c = new float[numElements];

        Arrays.fill(a, 5);
        Arrays.fill(b, 0);
        Arrays.fill(c, 0);

        TaskGraph taskGraph = new TaskGraph("s0") //
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b) //
                .task("t0", TryCatch::divided, a, b, c) //
                .transferToHost(DataTransferMode.EVERY_EXECUTION, c);

        ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
        TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);
        executor.execute();
    }
}

Expected behavior

GPUs have no mechanism for handling exceptions, for example on a division by 0. TornadoVM should therefore either run the code on the CPU or report an error message.
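
As a side note on the division itself, here is a minimal plain-Java sketch (no TornadoVM involved) of how the JVM evaluates the expression from the reproducer. In `(float) a[i] / b[i]` the cast applies only to `a[i]`, so the division is floating-point and yields Infinity instead of throwing; only integer division raises an ArithmeticException. The class and method names below are invented for illustration.

```java
// Sketch: how plain Java evaluates the two kinds of division by zero.
final class DivisionByZeroDemo {

    // Same expression shape as in the reproducer: the cast binds to a,
    // b is promoted to float, and the division is floating-point.
    static float floatDivide(int a, int b) {
        return (float) a / b;
    }

    // Integer division, shown only for contrast.
    static int intDivide(int a, int b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Floating-point division by zero does not throw.
        System.out.println(floatDivide(5, 0)); // prints Infinity

        // Integer division by zero throws on the JVM.
        try {
            intDivide(5, 0);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException caught");
        }
    }
}
```

If this reading is right, the catch block in the reproducer would not fire for this input even on the CPU, which is consistent with the float division (`div.full.f32`) in the PTX posted later in this thread.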


TaisZ commented 1 year ago

When I remove the line System.out.println(b[i]); TornadoVM terminates the program properly, but produces no output or error message.

stratika commented 1 year ago

Hi @TaisZ, I think your example is not supported by default. A method that is offloaded for hardware acceleration (e.g. divided) cannot contain traps or exceptions, since hardware accelerators do not support these operations. See paragraph 4 in this document.
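
One possible way to respect this restriction, sketched here in plain Java under the assumption that input checks can move to the host: validate the data before offloading and keep the kernel body free of try/catch and printing. `HostSideChecks` and `requireNoZeroDivisor` are hypothetical names, and the `@Parallel` annotation and the TornadoVM execution plan are left out so that the sketch runs on a plain JVM.

```java
import java.util.Arrays;

final class HostSideChecks {

    // Hypothetical helper: validate on the host, where exceptions work,
    // before handing the arrays to an accelerator.
    static void requireNoZeroDivisor(int[] b) {
        for (int i = 0; i < b.length; i++) {
            if (b[i] == 0) {
                throw new IllegalArgumentException("divisor is zero at index " + i);
            }
        }
    }

    // Kernel body without try/catch and without System.out.println.
    // In TornadoVM the loop variable would carry @Parallel as in the
    // reproducer; the annotation is omitted here on purpose.
    static void divided(int[] a, int[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = (float) a[i] / b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int[] b = new int[8];
        float[] c = new float[8];
        Arrays.fill(a, 5);
        Arrays.fill(b, 0);

        try {
            requireNoZeroDivisor(b);
            divided(a, b, c); // stands in for the offloaded execution
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected on the host: " + e.getMessage());
        }
    }
}
```

The trade-off is an extra pass over the input on the host, in exchange for a kernel that contains only arithmetic and memory accesses.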

jjfumero commented 1 year ago

This type of code is not supported in TornadoVM. However, TornadoVM should throw an exception instead of hanging. The problem seems to be that execution gets stuck, probably during the initial phase of the TornadoVM JIT compiler.

@TaisZ, can you run with the --debug and --fullDebug options and report the output you get? These dump a trace of the internals of the JIT compiler and the runtime system.
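
While the hang is being investigated, a reproducer can be kept from blocking forever with a generic JVM watchdog. The sketch below is plain Java and not a TornadoVM feature: `HangWatchdog` and `finishesWithin` are hypothetical names, and the sleeping task only imitates a stuck `executor.execute()` call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HangWatchdog {

    // Runs the task on a daemon thread. Returns true if it completed within
    // the limit and false if it was still running when the limit expired.
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watchdog-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> result = pool.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: interrupts the worker
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed or was interrupted", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call that never returns.
        Runnable stuck = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("finished in time: " + finishesWithin(stuck, 200));
    }
}
```

Interruption is only a request, so a thread blocked inside native code may ignore it; the daemon flag is what lets the JVM exit in that case.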

TaisZ commented 1 year ago

@jjfumero The output:

$ tornado --jvm="-Dtornado.fullDebug=True"  --debug --threadInfo --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch

WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557

TaisZ commented 1 year ago

@jjfumero However, when I remove the line System.out.println(b[i]); TornadoVM terminates the program properly, without throwing an exception.

$ tornado --jvm="-Dtornado.fullDebug=True"  --debug --threadInfo --printKernel -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557
.version 7.6 
.target sm_86 
.address_size 64 

.visible .entry s0_t0_divided_int8_int8_float8(.param .u64 .ptr .global .align 8 kernel_context, .param .u64 .ptr .global .align 8 a, .param .u64 .ptr .global .align 8 b, .param .u64 .ptr .global .align 8 c) {
        .reg .pred rpb<2>;
        .reg .s64 rsd<4>;
        .reg .f32 rfi<4>;
        .reg .u32 rui<5>;
        .reg .s32 rsi<7>;
        .reg .u64 rud<9>;

BLOCK_0:
        ld.param.u64    rud0, [kernel_context];
        ld.param.u64    rud1, [a];
        ld.param.u64    rud2, [b];
        ld.param.u64    rud3, [c];
        mov.u32 rui0, %nctaid.x;
        mov.u32 rui1, %ntid.x;
        mul.wide.u32    rud4, rui0, rui1;
        cvt.s32.u64     rsi0, rud4;
        mov.u32 rui2, %tid.x;
        mov.u32 rui3, %ctaid.x;
        mad.lo.s32      rsi1, rui3, rui1, rui2;

BLOCK_1:
        mov.s32 rsi2, rsi1;
LOOP_COND_1:
        setp.lt.s32     rpb0, rsi2, 8;
        @!rpb0 bra      BLOCK_3;

BLOCK_2:
        cvt.s64.s32     rsd0, rsi2;
        shl.b64 rsd1, rsd0, 2;
        add.s64 rsd2, rsd1, 24;
        add.u64 rud5, rud1, rsd2;
        ld.global.s32   rsi3, [rud5];
        add.u64 rud6, rud2, rsd2;
        ld.global.s32   rsi4, [rud6];
        add.u64 rud7, rud3, rsd2;
        cvt.rn.f32.s32  rfi0, rsi3;
        cvt.rn.f32.s32  rfi1, rsi4;
        div.full.f32    rfi2, rfi0, rfi1;
        st.global.f32   [rud7], rfi2;
        add.s32 rsi5, rsi0, rsi2;
        mov.s32 rsi2, rsi5;
        bra.uni LOOP_COND_1;

BLOCK_3:
        ret;
}
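
For readers less familiar with PTX, the control flow of the kernel above can be approximated in plain Java. This is a reading of the dump, not TornadoVM code: each simulated thread starts at its global index and advances by the total number of threads until it passes the loop bound. The names are invented, and the 24-byte offset added to every address (presumably an array header, which is an assumption) is not modelled.

```java
import java.util.Arrays;

final class GridStrideSketch {

    // Mirrors BLOCK_0 to BLOCK_3 for a single simulated GPU thread.
    static void runThread(int blockId, int threadId, int blockSize, int gridSize,
                          int[] a, int[] b, float[] c) {
        int stride = gridSize * blockSize;       // %nctaid.x * %ntid.x
        int i = blockId * blockSize + threadId;  // %ctaid.x * %ntid.x + %tid.x
        while (i < c.length) {                   // setp.lt.s32 against the bound (8 in the dump)
            // Both operands are converted to float before dividing,
            // as in cvt.rn.f32.s32 followed by div.full.f32.
            c[i] = (float) a[i] / (float) b[i];
            i += stride;                         // add.s32 with the stride
        }
    }

    // Runs every simulated thread sequentially and returns the result array.
    static float[] launch(int gridSize, int blockSize, int[] a, int[] b) {
        float[] c = new float[a.length];
        for (int block = 0; block < gridSize; block++) {
            for (int thread = 0; thread < blockSize; thread++) {
                runThread(block, thread, blockSize, gridSize, a, b, c);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int[] b = new int[8];
        Arrays.fill(a, 5);
        Arrays.fill(b, 2);
        // One block of eight threads over eight elements.
        System.out.println(Arrays.toString(launch(1, 8, a, b)));
    }
}
```

With one block of eight threads and eight elements, each simulated thread executes the loop body exactly once.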

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3080 GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

jjfumero commented 1 year ago

@TaisZ, can you use the latest version from the master branch? We fixed the full debug mode in a recent commit.

TaisZ commented 1 year ago

Yes, all runs are based on the latest commit.

$ tornado --fullDebug --debug -m tornado.examples/uk.ac.manchester.tornado.examples.Broken.TryCatch

WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
Loading DRIVER: uk.ac.manchester.tornado.drivers.ptx.PTXTornadoDriverProvider@57f23557