Open ZahariStoyanov opened 7 months ago
So... I added this loop in conjunction with the 3D range and it works:
[...]
for(int i = 0; i < dpt; i++)
if(i == l) resVector[r * wt + c] += aVector[r * dpt + l] * bVector[l * wt + c];
[...]
Why would it need such a bizarre workaround? What am I missing?
I'm currently working on a Java library for matrix operations and ML, and I use Aparapi to run the computations on the GPU. I've written this code to multiply two matrices:
However, it looks like resVector[some_index] only gets updated once (r + aVector[0 * dpt + 0] * bVector[0 * wt + c]). If I instead use a 2D range and a loop (the commented-out bits in the code), it works correctly. What could be the reason for this behavior, and how can I force it to run fully in parallel?
Interestingly, I tried one thing that "worked": after updating resVector[some_index], I called this.put(resVector). However, the kernel then failed to compile to OpenCL and fell back to Java multi-threading, which eventually produced the correct result.
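One plausible explanation, not confirmed anywhere in this thread: with a 3D range, every work item along the depth axis does a non-atomic += on the same resVector[r * wt + c], so concurrent read-modify-write updates can overwrite each other and only one term survives. The sketch below shows the pattern the 2D range already follows, where each parallel work item owns exactly one output cell and accumulates privately. It is plain Java using parallel streams, not Aparapi code, and the row count `ht` and the class and method names are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class MatMulSketch {
    // Multiplies an (ht x dpt) matrix by a (dpt x wt) matrix.
    // All matrices are flat, row-major float arrays, as in the post.
    // "ht" is an assumed name for the row count of the left matrix.
    static float[] multiply(float[] aVector, float[] bVector, int ht, int dpt, int wt) {
        float[] resVector = new float[ht * wt];
        // One parallel task per output cell, analogous to a 2D range.
        IntStream.range(0, ht * wt).parallel().forEach(idx -> {
            int r = idx / wt; // output row
            int c = idx % wt; // output column
            float sum = 0f;   // private accumulator: no other task touches it
            for (int l = 0; l < dpt; l++) {
                sum += aVector[r * dpt + l] * bVector[l * wt + c];
            }
            // Single write to a cell that only this task owns, so no race.
            resVector[idx] = sum;
        });
        return resVector;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6};       // 2 x 3
        float[] b = {7, 8, 9, 10, 11, 12};    // 3 x 2
        float[] res = multiply(a, b, 2, 3, 2);
        System.out.println(Arrays.toString(res)); // prints [58.0, 64.0, 139.0, 154.0]
    }
}
```

Under this reading, the depth loop is not a loss of parallelism so much as the price of avoiding shared writes. Parallelizing over depth as well would need atomic adds or a separate reduction step, and whether either is practical here depends on what Aparapi and the target device support.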