**Benco11-developement** opened 2 weeks ago
Thanks for the report. At first look, this code cannot be parallelized:

```java
public static void unsignedByte(ByteArray a, IntArray result) {
    for (@Parallel int i = 0; i < 3; i++) {
        // result.get(0) is a shared position across **all threads**: updating it
        // needs blocking access. The TornadoVM Parallel API does not offer
        // barriers, but the TornadoVM Kernel API does. Besides, TornadoVM
        // supports atomics for the OpenCL backend.
        result.set(0, result.get(0) | ((a.get(i) & 0xFF) << (8 * i)));
        result.set(i + 1, a.get(i) & 0xFF);
    }
}
```

Note that TornadoVM does not resolve data dependencies. It takes the annotations as hints to parallelize, and it is up to the user to ensure that the annotated regions can be parallelized. This is a similar concept to OpenMP or OpenACC.
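To illustrate the dependency issue in plain Java (outside TornadoVM, names illustrative): the OR-accumulation into a single slot carries a loop-carried dependency and must stay sequential, whereas per-slot writes are independent and safe to run in parallel. A minimal sketch:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class Sketch {
    // The OR-accumulation depends on the previous iteration's value of acc,
    // so this loop must run sequentially (or use atomics/barriers on device).
    static int pack24(byte[] a) {
        int acc = 0;
        for (int i = 0; i < 3; i++) {
            acc |= (a[i] & 0xFF) << (8 * i);
        }
        return acc;
    }

    // Each iteration writes only its own slot; iterations are independent,
    // so this pattern is the kind of loop that can safely be parallelized.
    static void unsignedBytes(byte[] a, int[] result) {
        IntStream.range(0, a.length).parallel()
                 .forEach(i -> result[i] = a[i] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] a = { 0x01, 0x02, (byte) 0x83 };
        System.out.println(Integer.toHexString(pack24(a))); // 830201
        int[] r = new int[3];
        unsignedBytes(a, r);
        System.out.println(Arrays.toString(r)); // [1, 2, 131]
    }
}
```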
Thanks for your response. I modified the test case to make it parallelizable and added a control group to verify whether the code executed correctly:
```java
import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.exceptions.TornadoExecutionPlanException;
import uk.ac.manchester.tornado.api.types.arrays.ByteArray;
import uk.ac.manchester.tornado.api.types.arrays.IntArray;

import java.util.Arrays;
import java.util.Random;

public class Main {

    public static void main(String[] args) throws TornadoExecutionPlanException {
        Random r = new Random();
        int size = 32;
        ByteArray a = new ByteArray(size);
        // First half of input data includes bytes using all 8 bits,
        // last half includes bytes using less than 8 bits
        for (int i = 0; i < size / 2; i++) {
            a.set(i, (byte) (128 + r.nextInt(128)));
            a.set(i + size / 2, (byte) (r.nextInt(128)));
        }

        IntArray theoreticalResult = new IntArray(size);
        IntArray theoreticalControlResult = new IntArray(size);
        IntArray actualResult = new IntArray(size);
        IntArray actualControlResult = new IntArray(size);
        theoreticalResult.init(0);
        theoreticalControlResult.init(0);
        actualResult.init(0);
        actualControlResult.init(0);

        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, actualResult, actualControlResult)
                .task("t0", Main::unsignedByte, a, actualResult, actualControlResult, size)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, actualResult, actualControlResult);
        ImmutableTaskGraph immutableTaskGraph = graph.snapshot();
        try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
            executionPlan.execute();
            unsignedByte(a, theoreticalResult, theoreticalControlResult, size);
        }

        if (Arrays.equals(theoreticalControlResult.toHeapArray(), actualControlResult.toHeapArray())) {
            System.out.println("Executed successfully");
        } else {
            System.out.println("Error during execution");
        }
        for (int i = 1; i < size; i++) {
            System.out.println("Expected byte " + i + " : " + Integer.toBinaryString(theoreticalResult.get(i - 1))
                    + "\nFound : " + Integer.toBinaryString(actualResult.get(i - 1)));
        }
    }

    public static void unsignedByte(ByteArray a, IntArray result, IntArray controlResult, int size) {
        for (@Parallel int i = 0; i < size; i++) {
            result.set(i, a.get(i) & 0xFF);
            controlResult.set(i, 2 * a.get(i));
        }
    }
}
```
Here is the output:

```
Executed successfully
Expected byte 1 : 10110011
Found : 11111111111111111111111110110011
Expected byte 2 : 10011100
Found : 11111111111111111111111110011100
Expected byte 3 : 10011100
Found : 11111111111111111111111110011100
Expected byte 4 : 11100110
Found : 11111111111111111111111111100110
Expected byte 5 : 11000000
Found : 11111111111111111111111111000000
Expected byte 6 : 10010100
Found : 11111111111111111111111110010100
Expected byte 7 : 10111010
Found : 11111111111111111111111110111010
Expected byte 8 : 11101000
Found : 11111111111111111111111111101000
Expected byte 9 : 10000101
Found : 11111111111111111111111110000101
Expected byte 10 : 11110111
Found : 11111111111111111111111111110111
Expected byte 11 : 11111011
Found : 11111111111111111111111111111011
Expected byte 12 : 11000010
Found : 11111111111111111111111111000010
Expected byte 13 : 10010111
Found : 11111111111111111111111110010111
Expected byte 14 : 11011110
Found : 11111111111111111111111111011110
Expected byte 15 : 10000001
Found : 11111111111111111111111110000001
Expected byte 16 : 10111011
Found : 11111111111111111111111110111011
Expected byte 17 : 1000101
Found : 1000101
Expected byte 18 : 110010
Found : 110010
Expected byte 19 : 1100011
Found : 1100011
Expected byte 20 : 100110
Found : 100110
Expected byte 21 : 1110011
Found : 1110011
Expected byte 22 : 1111010
Found : 1111010
Expected byte 23 : 111110
Found : 111110
Expected byte 24 : 1111
Found : 1111
Expected byte 25 : 1010101
Found : 1010101
Expected byte 26 : 11110
Found : 11110
Expected byte 27 : 1100101
Found : 1100101
Expected byte 28 : 1111100
Found : 1111100
Expected byte 29 : 111100
Found : 111100
Expected byte 30 : 1111011
Found : 1111011
Expected byte 31 : 111001
Found : 111001
```
As you can see, the bytes are not properly converted to unsigned values by the `& 0xFF`.
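For comparison, on the JVM itself the mask behaves as expected: the sign-extended "Found" values above match what you would get if the `& 0xFF` were dropped entirely. A minimal plain-Java check (illustrative, outside TornadoVM):

```java
public class MaskDemo {
    public static void main(String[] args) {
        byte b = (byte) 0b10110011;  // -77 as a signed byte
        int sext = b;                // implicit widening sign-extends
        int masked = b & 0xFF;       // mask zero-extends to 179
        // Sign-extended value: 32 bits, upper bits all set
        System.out.println(Integer.toBinaryString(sext));   // 11111111111111111111111110110011
        // Masked value: just the low 8 bits
        System.out.println(Integer.toBinaryString(masked)); // 10110011
    }
}
```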
Ok, thanks for the update. I would need to analyse the generated code, as well as check whether there is any compiler phase that changes this. We will take a look.
Describe the bug
Converting bytes to unsigned values with `& 0xFF` does not work as it normally should.
How To Reproduce
Here is a test case where we make a 24-bit unsigned int from 3 bytes:
Result:
Expected behavior
Bytes should be correctly converted to unsigned values thanks to the `& 0xFF`.
Computing system setup (please complete the following information):
Additional context
Changing the backend does not help