ddemidov / vexcl

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
http://vexcl.readthedocs.org
MIT License

Wrapping explicit for loop? #279

Closed UNDEFINED-BEHAVIOR closed 4 years ago

UNDEFINED-BEHAVIOR commented 4 years ago

Is there a way to represent a manually written kernel loop in VexCL? The element-wise version works as expected, but I am wondering whether an explicit loop is possible.

  VEX_FUNCTION(
    void,
    myfn,
    (cl_float, a)
    (cl_float*, b)
    (int, z),

    for(cl_int i =0; i < z; i++) {
      b[i] = a[i] * 2;
    }

  );

  vex::eval(
    myfn(
      vv_a,
      vex::raw_pointer(vv_b),
      sourceData.size()
    )
  );

ddemidov commented 4 years ago

It is possible, but you need to pass the command queue list and the partitioning to the eval function explicitly, since VexCL cannot determine those from raw pointers. Also, your example does not make much sense: the function will be evaluated by each of the GPU threads, and you don't want all of them to perform identical work. Here is a more realistic example that evaluates a dense matrix-vector product, where each GPU thread computes a single element of the result vector:

#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    const int rows = 16, cols = 4;

    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    vex::vector<float> A(ctx, rows * cols);
    vex::vector<float> x(ctx, cols);
    vex::vector<float> y(ctx, rows);

    x = 1;
    A = 1;

    VEX_FUNCTION(void, mv, (int, row)(int, cols)(float*, A)(float*, x)(float*, y),
        float row_sum = 0;
        A += row * cols;
        for(int i = 0; i < cols; ++i)
            row_sum += A[i] * x[i];
        y[row] = row_sum;
    );

    vex::eval(
            mv(vex::element_index(), cols, raw_pointer(A), raw_pointer(x), raw_pointer(y)),
            y.queue_list(), y.partition()
            );

    std::cout << y << std::endl;
}
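
For checking the result, the same computation can be written as a plain host-side reference. This is only a sketch and not part of VexCL: `mv_row` and `dense_mv_reference` are hypothetical names, the matrix is assumed to be stored row-major as in the example above, and the outer loop over rows stands in for the per-element evaluation that `vex::element_index()` drives on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work done by one "thread": the dot product of one matrix row with x.
// Mirrors the body of mv above; A is row-major with `cols` entries per row.
inline float mv_row(int row, int cols, const float *A, const float *x) {
    float row_sum = 0;
    const float *a_row = A + static_cast<std::ptrdiff_t>(row) * cols;
    for (int i = 0; i < cols; ++i)
        row_sum += a_row[i] * x[i];
    return row_sum;
}

// Hypothetical host reference: one loop iteration per result element,
// where the device version would use one GPU thread per element instead.
inline std::vector<float> dense_mv_reference(
        int rows, int cols,
        const std::vector<float> &A, const std::vector<float> &x)
{
    assert(A.size() == static_cast<std::size_t>(rows) * cols);
    assert(x.size() == static_cast<std::size_t>(cols));

    std::vector<float> y(rows);
    for (int row = 0; row < rows; ++row)
        y[row] = mv_row(row, cols, A.data(), x.data());
    return y;
}
```

With the inputs from the example above (`A` and `x` filled with ones, `cols = 4`), every entry of the returned vector is 4, which can be compared against the contents of `y` copied back from the device.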

UNDEFINED-BEHAVIOR commented 4 years ago

Thank you! I'll try it out.