It is possible, but you need to explicitly pass the command queue list and the partitioning to the eval function, since VexCL cannot determine those from raw pointers. Also, your example does not make much sense, because the function will be evaluated by each of the GPU threads, and you don't want them all to perform identical work. Here is a more realistic example that evaluates a dense matrix-vector product, where each GPU thread computes a single element of the result vector:
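#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    const int rows = 16, cols = 4;

    // Use a single compute device selected through the environment filter.
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    vex::vector<float> A(ctx, rows * cols); // dense matrix, row-major
    vex::vector<float> x(ctx, cols);
    vex::vector<float> y(ctx, rows);

    x = 1;
    A = 1;

    // Device function: each invocation computes one element of y.
    VEX_FUNCTION(void, mv, (int, row)(int, cols)(float*, A)(float*, x)(float*, y),
        float row_sum = 0;
        A += row * cols; // advance to the start of this row
        for(int i = 0; i < cols; ++i)
            row_sum += A[i] * x[i];
        y[row] = row_sum;
    );

    // The queue list and the partitioning are taken from y, because they
    // cannot be deduced from the raw pointers.
    vex::eval(
        mv(vex::element_index(), cols, raw_pointer(A), raw_pointer(x), raw_pointer(y)),
        y.queue_list(), y.partition()
    );

    std::cout << y << std::endl;
}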
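If you want to check the device result, here is a minimal host-side sketch of the same computation in plain C++, with no VexCL involved. The names `mv_row` and `mv_reference` are hypothetical helpers introduced only for this illustration: `mv_row` mirrors the body of the `mv` device function, and the loop in `mv_reference` stands in for the GPU threads, with the loop variable playing the role of `vex::element_index()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for the body of the `mv` device function:
// computes the dot product of one row of a row-major matrix with x.
inline void mv_row(int row, int cols, const float *A, const float *x, float *y) {
    float row_sum = 0;
    A += row * cols; // advance to the start of this row
    for (int i = 0; i < cols; ++i)
        row_sum += A[i] * x[i];
    y[row] = row_sum;
}

// Emulates the kernel launch sequentially: one call of mv_row per row,
// where on the device each row would be handled by its own thread.
inline std::vector<float> mv_reference(int rows, int cols,
                                       const std::vector<float> &A,
                                       const std::vector<float> &x) {
    assert(A.size() == static_cast<std::size_t>(rows) * cols);
    assert(x.size() == static_cast<std::size_t>(cols));

    std::vector<float> y(rows);
    for (int row = 0; row < rows; ++row)
        mv_row(row, cols, A.data(), x.data(), y.data());
    return y;
}
```

With the inputs from the example above (16 rows, 4 columns, all entries of A and x equal to 1), every element of the returned vector is 4, which is what the device version should print as well.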
Thank you! I'll try it out.
Is there a way to represent a manually written kernel loop in VexCL? The element-wise version works as expected, but I'm wondering whether that's possible.