Mixing cublas and rustacuda ?

zeroexcuses commented 5 years ago

Can we please have sample code that

allocates some memory
calls A = B * C
calls some kernel on A
calls sgemm D = E * A

? I have some tensor code that runs great in CPU mode, but fails in GPU mode (so the algorithm si correct). All CPU vs GPU unit tests pass -- so it seems I am running into a synchronization issue.

I am using stream.synchronize on after all kernel calls -- so it seems the remaining culprit is that kernels on streamA while cublas is on streamB .. and it's not clear to me how to synchronize the two.

rusch95 commented 5 years ago

Could you post a code snippet showing off this failure? I can hack on that.

On Mon, Jan 21, 2019 at 4:52 PM zeroexcuses notifications@github.com wrote:

Can we please have sample code that

1.

allocates some memory 2.

calls A = B * C 3.

calls some kernel on A 4.

calls sgemm D = E * A

? I have some tensor code that runs great in CPU mode, but fails in GPU mode (so the algorithm si correct). All CPU vs GPU unit tests pass -- so it seems I am running into a synchronization issue.

I am using stream.synchronize on after all kernel calls -- so it seems the remaining culprit is that kernels on streamA while cublas is on streamB .. and it's not clear to me how to synchronize the two.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bheisler/RustaCUDA/issues/28, or mute the thread https://github.com/notifications/unsubscribe-auth/AKUNKJItU6Wwar560RdI7YJznNqf4uvxks5vFjaPgaJpZM4aLlJ2 .

zeroexcuses commented 5 years ago

I think I got it working via the following changes:

I made Stream's 'inner CUstream' pub:

#[derive(Debug)]
pub struct Stream {
pub inner: CUstream,
}

I initialize the cublas handle by calling

    unsafe {
        cublas::cublasSetStream_v2(gblas_handle.handle, stream.inner
            as *mut cuda_sys::cudart::CUstream_st );
    }

This appears to cause the blas to run on the same stream as the kernels.

However, I'm a bit uneasy as I'm brute force casing a sys::cuda::CUstream to a sys::cudart::CUstream

I'm not sure about the difference between the two.

bheisler / RustaCUDA

Mixing cublas and rustacuda ? #28