Closed bitstuffing closed 1 year ago
Hi @bitstuffing. Thank you for the feedback. I see a couple of things:
TornadoVM provides data structures to perform 2D and 3D operations:
See, for example, Matrix2DFloat: https://github.com/beehive-lab/TornadoVM/blob/master/tornado-api/src/main/java/uk/ac/manchester/tornado/api/collections/types/Matrix2DFloat.java
You can use these data structures in your kernels.
Methods to be accelerated must return void. The reason is that the Java method is compiled to OpenCL, PTX, or SPIR-V parallel kernels. The kernel represents the code to be executed per native thread (e.g., by an OpenCL work-item). Thus, if TornadoVM returned objects, it would also have to keep a mapping between the ordering of threads and the corresponding outputs. To avoid this, TornadoVM requires the programmer to pass the return objects as parameters as well.
public static void multiply(Matrix2DFloat matrix1, Matrix2DFloat matrix2, Matrix2DFloat complex) {
...
}
Additionally, dynamic object allocation is not supported in TornadoVM. Only certain types are allowed. See the examples module in TornadoVM.
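To illustrate the void-plus-output-parameter pattern described above, here is a minimal sketch in plain Java. It deliberately avoids the TornadoVM API so that it compiles with only the JDK: flat float arrays stand in for Matrix2DFloat, and the class name, method name, and dimension parameters are assumptions made for illustration. In a TornadoVM kernel, the two outer loops would presumably carry @Parallel.

```java
import java.util.Arrays;

public class OutputParameterSketch {

    // Kernel-style method: returns void and writes into the caller-allocated 'out' array.
    // a is n x m, b is m x k, out is n x k, all stored row-major in flat arrays.
    // Nothing is allocated inside the method.
    public static void multiply(float[] a, float[] b, float[] out, int n, int m, int k) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                float sum = 0.0f;
                for (int x = 0; x < m; x++) {
                    sum += a[i * m + x] * b[x * k + j];
                }
                out[i * k + j] = sum;
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4};    // 2x2 matrix
        float[] b = {5, 6, 7, 8};    // 2x2 matrix
        float[] out = new float[4];  // result buffer, allocated by the caller up front
        multiply(a, b, out, 2, 2, 2);
        System.out.println(Arrays.toString(out)); // prints [19.0, 22.0, 43.0, 50.0]
    }
}
```

Because the caller owns the result buffer, each thread only needs to know which element it writes, and no per-thread return values have to be collected and ordered afterwards.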
Thank you so much for your pointers. I now have working code and was able to convert it.
So the bug was mine, sorry. Thanks for the support.
But as you can see, it is not optimal: I want to multiply a double[][] by a Complex[][], and decomposing each Complex matrix into two Matrix2DFloats hurts performance.
Do you have any suggestions for handling Complex numbers that would avoid this?
Thanks in advance for your work, it's a great project.
Edit (my test code function):
public static void multiply(Matrix2DFloat matrix1, Matrix2DFloat matrix2_real, Matrix2DFloat matrix2_imag, Matrix2DFloat complex_real, Matrix2DFloat complex_imag) {
    int N = matrix1.getNumRows();
    int M = matrix1.getNumColumns();
    int K = matrix2_real.getNumColumns();
    for (@Parallel int i = 0; i < N; i++) {
        for (@Parallel int j = 0; j < K; j++) {
            float sum_real = 0;
            float sum_imag = 0;
            for (@Parallel int k = 0; k < M; k++) {
                float a_real = matrix1.get(i, k);
                float b_real = matrix2_real.get(k, j);
                float a_imag = 0;
                float b_imag = matrix2_imag.get(k, j);
                sum_real += a_real * b_real - a_imag * b_imag;
                sum_imag += a_real * b_imag + a_imag * b_real;
            }
            complex_real.set(i, j, sum_real);
            complex_imag.set(i, j, sum_imag);
        }
    }
}
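One possible alternative to keeping two separate planes is to store each complex value interleaved in a single flat array. The sketch below is plain Java, not TornadoVM API, and the interleaved layout [re0, im0, re1, im1, ...], the class name, and the parameter names are all assumptions for illustration. It also uses the fact that matrix1 is purely real (a_imag is always 0 in the code above), so each product reduces to two multiplications. Whether this layout actually performs better on a GPU would need to be measured.

```java
import java.util.Arrays;

public class RealTimesComplexSketch {

    // a is a real n x m matrix (row-major).
    // b is a complex m x k matrix and out a complex n x k matrix, both stored
    // interleaved: element (r, c) has its real part at index 2 * (r * cols + c)
    // and its imaginary part at the following index.
    public static void multiply(float[] a, float[] b, float[] out, int n, int m, int k) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                float sumRe = 0.0f;
                float sumIm = 0.0f;
                for (int x = 0; x < m; x++) {
                    float aRe = a[i * m + x];
                    int idx = 2 * (x * k + j);
                    // Real times complex: the imaginary part of 'a' is zero,
                    // so the cross terms of the full complex product vanish.
                    sumRe += aRe * b[idx];
                    sumIm += aRe * b[idx + 1];
                }
                int o = 2 * (i * k + j);
                out[o] = sumRe;
                out[o + 1] = sumIm;
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2};          // 1x2 real matrix
        float[] b = {1, 2, 3, 4};    // 2x1 complex matrix: (1+2i), (3+4i)
        float[] out = new float[2];  // 1x1 complex result
        multiply(a, b, out, 1, 2, 1);
        System.out.println(Arrays.toString(out)); // prints [7.0, 10.0], i.e. 7+10i
    }
}
```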
So, as things stand, I understand that there is a performance penalty for marshalling objects from your Complex type to TornadoVM types. I think it makes sense for us to support Complex values directly, so developers won't have to do this marshalling.
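For readers unfamiliar with the marshalling step being discussed, the following sketch shows roughly what it involves: copying every element of a Complex[][] into separate real and imaginary float planes before the kernel can run. The Complex class here is a hypothetical stand-in for the application's own type, and the flat float arrays stand in for the TornadoVM matrix types; neither is taken from the original code.

```java
import java.util.Arrays;

public class ComplexMarshallingSketch {

    // Hypothetical stand-in for the application's own Complex type.
    public static final class Complex {
        public final double re;
        public final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    // Copies a rectangular Complex[][] into two row-major float planes.
    // Every element is visited once, and doubles are narrowed to floats,
    // so the copy costs time and may lose precision.
    public static void split(Complex[][] src, float[] real, float[] imag) {
        int cols = src[0].length;
        for (int r = 0; r < src.length; r++) {
            for (int c = 0; c < cols; c++) {
                real[r * cols + c] = (float) src[r][c].re;
                imag[r * cols + c] = (float) src[r][c].im;
            }
        }
    }

    public static void main(String[] args) {
        Complex[][] src = {{new Complex(1.5, -2.0), new Complex(0.0, 3.0)}};
        float[] real = new float[2];
        float[] imag = new float[2];
        split(src, real, imag);
        System.out.println(Arrays.toString(real)); // prints [1.5, 0.0]
        System.out.println(Arrays.toString(imag)); // prints [-2.0, 3.0]
    }
}
```

The same copy has to happen in reverse for the result, which is the overhead a natively supported complex type would remove.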
We will discuss this internally with our team and let you know. We are currently designing new types so that we can consider this for future versions.
Does the new version you provided work? Did you encounter new issues?
Yes, the latest code works, with the timings I mentioned. I can share my test values if you would find them useful. The matrices are not square; the inputs are:
matrix1 // 3x4 double[][]
matrix2 // 4x16763 Complex[][]
expected_complex_result // 3x16763 Complex[][]
I am closing this issue. Feel free to open new issues for new feedback or new problems.
Describe the bug
Multiplying a Complex[][] matrix by a double[][] matrix works fine on the CPU, but running the same code on the GPU throws a this.code == null exception.
How To Reproduce
A jar file was generated from a Maven test project with a main class. All dependencies are bundled with maven-assembly-plugin, and the project is compiled with maven-compiler-plugin targeting Java 16. Dependencies are declared as the documentation describes:
command to reproduce:
throws this output:
The main class has the following code directly in its main method:
and the multiply method with @Parallel:
Expected behavior
Accelerated results with Y rounded to 1 or 0, similar to this sample output:
Computing system setup (please complete the following information):
Additional context
The default MatrixMultiplication2D tests run well on the GPU, so TornadoVM works fine with OpenCL.
Backends installed: