beehive-lab / TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
https://www.tornadovm.org
Apache License 2.0
1.15k stars 109 forks source link

Unsigned values dysfunction #457

Open Benco11-developement opened 2 weeks ago

Benco11-developement commented 2 weeks ago

Describe the bug

Unsigning bytes somehow doesn't work like it should normally.

How To Reproduce

Here is a test case where we make a 24-bit unsigned int from 3 bytes :

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.exceptions.TornadoExecutionPlanException;
import uk.ac.manchester.tornado.api.types.arrays.ByteArray;
import uk.ac.manchester.tornado.api.types.arrays.IntArray;

public class Main {

    public static void main(String[] args) throws TornadoExecutionPlanException {
        int byte1 = Integer.parseInt("10011100", 2); // Store the byte as an int because the parse function expects a signed input but the binary number provided can't fit into a signed byte
        int byte2 = Integer.parseInt("00100111", 2);
        int byte3 = Integer.parseInt("11001001", 2);

        ByteArray a = new ByteArray(3);
        a.set(0, (byte) byte3);
        a.set(1, (byte) byte2);
        a.set(2, (byte) byte1);

        IntArray theoreticalResult = new IntArray(4); // Executed normally
        IntArray actualResult = new IntArray(4); // Executed with Tornado
        theoreticalResult.set(0, 0);
        actualResult.set(0, 0);

        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, actualResult)
                .task("t0", Main::unsignedByte, a, actualResult)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, actualResult);

        ImmutableTaskGraph immutableTaskGraph = graph.snapshot();
        try(TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
            executionPlan.execute();
            unsignedByte(a, theoreticalResult);
        }

        System.out.println("Expected : " + Integer.toBinaryString(theoreticalResult.get(0)) + "\nFound : " + Integer.toBinaryString(actualResult.get(0)));
        for(int i = 1; i < 4; i++) {
            System.out.println("Expected byte " + i + " : " + Integer.toBinaryString(theoreticalResult.get(4-i)) + "\nFound : " + Integer.toBinaryString(actualResult.get(4-i)));
        }
    }

    public static void unsignedByte(ByteArray a, IntArray result) {
        for(@Parallel int i = 0; i < 3; i++) {
            result.set(0, result.get(0) | ((a.get(i) & 0xFF) << (8*i)));
            result.set(i+1, a.get(i) & 0xFF);
        }
    }
}

Result :

Expected : 100111000010011111001001
Found : 11111111100111000000000000000000
Expected byte 1 : 10011100
Found : 11111111111111111111111110011100
Expected byte 2 : 100111
Found : 100111
Expected byte 3 : 11001001
Found : 11111111111111111111111111001001

Expected behavior

Bytes should be normally unsigned thanks to the & 0xFF

Computing system setup (please complete the following information):

Additional context

Changing the backend does not help


jjfumero commented 2 weeks ago

Thanks for the report. At a first look, this code cannot be parallelized:

 public static void unsignedByte(ByteArray a, IntArray result) {
        for(@Parallel int i = 0; i < 3; i++) {
            result.set(0, result.get(0) | ((a.get(i) & 0xFF) << (8*i)));   << Shared position across **all threads**: it needs a blocking access, The TornadoVM Parallel API does not offer barriers, but the TornadoVM Kernel API does. Besides, TornadoVM supports atomics for the OpenCL backend. 
            result.set(i+1, a.get(i) & 0xFF);                                   
        }
 }   

Note that TornadoVM does not solve data dependencies. It takes annotations as hints to parallelize. It is up to the user to ensure those regions can be parallelized. This is a similar concept to OpenMP, or OpenACC.

Benco11-developement commented 2 weeks ago

Thanks for your response. I modified the test case to make it parallelizable and added a control group to verify if the code was executed correctly :

import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.exceptions.TornadoExecutionPlanException;
import uk.ac.manchester.tornado.api.types.arrays.ByteArray;
import uk.ac.manchester.tornado.api.types.arrays.IntArray;

import java.util.Arrays;
import java.util.Random;

public class Main {

    public static void main(String[] args) throws TornadoExecutionPlanException {
        Random r = new Random();
        int size = 32;

        ByteArray a = new ByteArray(size);

        // First half of input data includes bytes using all 8 bits, last half includes bytes using less than 8 bits
        for(int i = 0; i < size/2; i++) {
            a.set(i, (byte) (128 + r.nextInt(128)));
            a.set(i+size/2, (byte) (r.nextInt(128)));
        }

        IntArray theoreticalResult = new IntArray(size);
        IntArray theoreticalControlResult = new IntArray(size);

        IntArray actualResult = new IntArray(size);
        IntArray actualControlResult = new IntArray(size);

        theoreticalResult.init(0);
        theoreticalControlResult.init(0);

        actualResult.init(0);
        actualControlResult.init(0);

        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, actualResult, actualControlResult)
                .task("t0", Main::unsignedByte, a, actualResult, actualControlResult, size)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, actualResult, actualControlResult);

        ImmutableTaskGraph immutableTaskGraph = graph.snapshot();
        try(TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
            executionPlan.execute();
            unsignedByte(a, theoreticalResult, theoreticalControlResult, size);
        }

        if(Arrays.equals(theoreticalControlResult.toHeapArray(), actualControlResult.toHeapArray())) {
            System.out.println("Executed successfully");
        } else {
            System.out.println("Error during execution");
        }

        for(int i = 1; i < size; i++) {
            System.out.println("Expected byte " + i + " : " + Integer.toBinaryString(theoreticalResult.get(i-1)) + "\nFound : " + Integer.toBinaryString(actualResult.get(i-1)));
        }
    }

    public static void unsignedByte(ByteArray a, IntArray result, IntArray controlResult, int size) {
        for(@Parallel int i = 0; i < size; i++) {
            result.set(i, a.get(i) & 0xFF);
            controlResult.set(i, 2 * a.get(i));
        }
    }
}

Here is the output :

Executed successfully
Expected byte 1 : 10110011
Found : 11111111111111111111111110110011
Expected byte 2 : 10011100
Found : 11111111111111111111111110011100
Expected byte 3 : 10011100
Found : 11111111111111111111111110011100
Expected byte 4 : 11100110
Found : 11111111111111111111111111100110
Expected byte 5 : 11000000
Found : 11111111111111111111111111000000
Expected byte 6 : 10010100
Found : 11111111111111111111111110010100
Expected byte 7 : 10111010
Found : 11111111111111111111111110111010
Expected byte 8 : 11101000
Found : 11111111111111111111111111101000
Expected byte 9 : 10000101
Found : 11111111111111111111111110000101
Expected byte 10 : 11110111
Found : 11111111111111111111111111110111
Expected byte 11 : 11111011
Found : 11111111111111111111111111111011
Expected byte 12 : 11000010
Found : 11111111111111111111111111000010
Expected byte 13 : 10010111
Found : 11111111111111111111111110010111
Expected byte 14 : 11011110
Found : 11111111111111111111111111011110
Expected byte 15 : 10000001
Found : 11111111111111111111111110000001
Expected byte 16 : 10111011
Found : 11111111111111111111111110111011
Expected byte 17 : 1000101
Found : 1000101
Expected byte 18 : 110010
Found : 110010
Expected byte 19 : 1100011
Found : 1100011
Expected byte 20 : 100110
Found : 100110
Expected byte 21 : 1110011
Found : 1110011
Expected byte 22 : 1111010
Found : 1111010
Expected byte 23 : 111110
Found : 111110
Expected byte 24 : 1111
Found : 1111
Expected byte 25 : 1010101
Found : 1010101
Expected byte 26 : 11110
Found : 11110
Expected byte 27 : 1100101
Found : 1100101
Expected byte 28 : 1111100
Found : 1111100
Expected byte 29 : 111100
Found : 111100
Expected byte 30 : 1111011
Found : 1111011
Expected byte 31 : 111001
Found : 111001

As you can see, the bytes aren't unsigned well by the & 0xFF.

jjfumero commented 2 weeks ago

Ok, thanks for the update. I would need to analyse the generated code as well as if there any compiler phase that changes this. We will take a look.