bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0
765 stars 58 forks source link

example passing float as an argument #21

Closed zeroexcuses closed 5 years ago

zeroexcuses commented 5 years ago

I have a kernel:

extern "C"
__global__ void add_10(
    int n,
    float *dst, int dst_inc,
    float *a, int a_inc,
    float f ) {
  int i_start = blockIdx.x * blockDim.x + threadIdx.x;
  int i_inc = blockDim.x * gridDim.x;
  int i = i_start;

  while (i < n) {
    dst[i ] = a[i ]*2 + 100.0 + f;
    i += i_inc;
  }
}

I am trying to call it via:

    let result =
        unsafe { launch!(module.add_10<<<1, 1024, 0, stream>>>(
          4, c.device_ptr(), 1, a.device_ptr(), 1, 20.0 )) };

    let t = c.get();
    println!("status: {:?}, output: {:?}", result, t);

(I'm using my own "DeviceVector", not the builtin DeviceBuffer).

It seems that regardles of what I try to pass for argument "f", the result is always 0. It is as if any float I try to pass gets the value 0 in the kernel.

Is there any sample code on how to pass a floating point value to the kernel?

zeroexcuses commented 5 years ago

Hmm, I just figured out my own problem, I need to write 20.0_f32 instead of 20.0.

I do not know why 20.0 did not work ... but it would be nice if launch! gave an error instead of silently passing it as 0.

bheisler commented 5 years ago

Hey, thanks for the suggestion!

Unfortunately, I don't think that's possible. There is no way for RustaCUDA to know what types (or even how many) arguments the kernel expects. The CUDA driver should know how many arguments there are and how many bytes each argument should take, but there's no way for RustaCUDA to access that information. Even then, the driver only sees a pointer, it just has to assume that it can read the right number of bytes from that pointer.

In your case, I'm guessing that your kernel expected a 32-bit float but you passed it a 64-bit float. When the driver copied over the first four bytes of the value, it got corrupted to 0.0 - see this playground example which does the same thing.

Ultimately, this is one of many reasons why you have to wrap kernel launches in an unsafe block.

zeroexcuses commented 5 years ago

You're right. Short of parsing the ptx file, there's no way launch! macro can know if the argument is supposed to be f32 or f64; so when Rust gives it a f64, there's no way it can issue a warning.

Closing issue as resolved as this is 100% user (my) error.

rusch95 commented 5 years ago

Theoretically, we could do a bit of parsing on the ptx file to provide a bit of safety there.

On Sun, Dec 16, 2018, 4:02 PM zeroexcuses <notifications@github.com wrote:

You're right. Short of parsing the ptx file, there's no way launch! macro can know if the argument is supposed to be f32 or f64; so when Rust gives it a f64, there's no way it can issue a warning.

Closing issue as resolved as this is 100% user (my) error.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bheisler/RustaCUDA/issues/21#issuecomment-447675529, or mute the thread https://github.com/notifications/unsubscribe-auth/AKUNKKoT-L_j76qYJ-xvQi_ej1zbB56Qks5u5rTbgaJpZM4ZVaxD .

rusch95 commented 5 years ago

It might not be too difficult, so I'll take a peak at it.

On Sun, Dec 16, 2018, 5:19 PM Robert Rusch <rusch95@gmail.com wrote:

Theoretically, we could do a bit of parsing on the ptx file to provide a bit of safety there.

On Sun, Dec 16, 2018, 4:02 PM zeroexcuses <notifications@github.com wrote:

You're right. Short of parsing the ptx file, there's no way launch! macro can know if the argument is supposed to be f32 or f64; so when Rust gives it a f64, there's no way it can issue a warning.

Closing issue as resolved as this is 100% user (my) error.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bheisler/RustaCUDA/issues/21#issuecomment-447675529, or mute the thread https://github.com/notifications/unsubscribe-auth/AKUNKKoT-L_j76qYJ-xvQi_ej1zbB56Qks5u5rTbgaJpZM4ZVaxD .

bheisler commented 5 years ago

I suppose parsing PTX files would be possible, but I don't think it really belongs in this crate. That's a lot of runtime overhead to give up for a small improvement in safety, which is not a great fit for a low-level crate like this. Could be done as another crate built on top of RustaCUDA maybe.

rusch95 commented 5 years ago

Doing such another thing in another crate would be reasonable, but this shouldn't have run time overhead, as you can do this analysis at build time.

rusch95 commented 5 years ago

My thought is to provide a launcher macro that has reference to the original .cu file for the .ptx module you are launching, so that the launcher can then enforce types at compile time.