Closed zeroexcuses closed 5 years ago
Could you post a code snippet showing off this failure? I can hack on that.
On Mon, Jan 21, 2019 at 4:52 PM zeroexcuses notifications@github.com wrote:
Can we please have sample code that:

1. allocates some memory
2. calls A = B * C
3. calls some kernel on A
4. calls sgemm D = E * A
I have some tensor code that runs great in CPU mode but fails in GPU mode (so the algorithm is correct). All CPU vs GPU unit tests pass, so it seems I am running into a synchronization issue.
I am calling stream.synchronize after every kernel launch, so it seems the remaining culprit is that my kernels run on stream A while cuBLAS runs on stream B, and it's not clear to me how to synchronize the two.
I think I got it working via the following changes:
I made Stream's inner `CUstream` field pub:

```rust
#[derive(Debug)]
pub struct Stream {
    pub inner: CUstream,
}
```
I initialize the cuBLAS handle by calling:

```rust
unsafe {
    cublas::cublasSetStream_v2(
        gblas_handle.handle,
        stream.inner as *mut cuda_sys::cudart::CUstream_st,
    );
}
```
This appears to make cuBLAS run on the same stream as the kernels.
However, I'm a bit uneasy, as I'm brute-force casting a `sys::cuda::CUstream` to a `sys::cudart::CUstream`, and I'm not sure about the difference between the two.
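If the cast feels fragile, a possible alternative (a hedged sketch against the raw CUDA runtime C API, not RustaCUDA; `kernel_stream` and `blas_stream` are invented names) is to keep cuBLAS on its own stream and order the two streams with an event:

```cuda
#include <cuda_runtime.h>

// Make work later enqueued on blas_stream wait for everything already
// enqueued on kernel_stream, without blocking the host. Assumes the
// cuBLAS handle has been bound to blas_stream via cublasSetStream.
void order_streams(cudaStream_t kernel_stream, cudaStream_t blas_stream) {
    cudaEvent_t done;
    // Sync-only event; timing is not needed here.
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);
    cudaEventRecord(done, kernel_stream);      // fires when kernel_stream drains
    cudaStreamWaitEvent(blas_stream, done, 0); // blas_stream waits for that point
    cudaEventDestroy(done);                    // destruction is deferred until
                                               // the pending work completes
}
```

This avoids casting between the driver-API and runtime-API stream types at the cost of an explicit event per synchronization point.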
Can we please have sample code that:

1. allocates some memory
2. calls A = B * C
3. calls some kernel on A
4. calls sgemm D = E * A
I have some tensor code that runs great in CPU mode but fails in GPU mode (so the algorithm is correct). All CPU vs GPU unit tests pass, so it seems I am running into a synchronization issue.
I am calling stream.synchronize after every kernel launch, so it seems the remaining culprit is that my kernels run on stream A while cuBLAS runs on stream B, and it's not clear to me how to synchronize the two.
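For reference, the four requested steps can be sketched against the raw CUDA runtime and cuBLAS C APIs (not RustaCUDA; the `scale` kernel and all variable names are invented for illustration). The key point is that the kernel and both sgemm calls are enqueued on the same stream, so they execute in order:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// "some kernel on A": scale every element in place.
__global__ void scale(float *a, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main() {
    const int n = 4;                     // square n x n matrices
    const size_t bytes = n * n * sizeof(float);
    float *A, *B, *C, *D, *E;

    // 1. allocate some memory
    cudaMalloc(&A, bytes); cudaMalloc(&B, bytes); cudaMalloc(&C, bytes);
    cudaMalloc(&D, bytes); cudaMalloc(&E, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cublasHandle_t h;
    cublasCreate(&h);
    cublasSetStream(h, stream);          // keep cuBLAS on the kernel stream

    const float one = 1.0f, zero = 0.0f;
    // 2. A = B * C
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, B, n, C, n, &zero, A, n);
    // 3. kernel on A -- same stream, so it runs after the sgemm
    scale<<<1, n * n, 0, stream>>>(A, n * n, 2.0f);
    // 4. D = E * A -- runs after the kernel, again by stream order
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &one, E, n, A, n, &zero, D, n);

    cudaStreamSynchronize(stream);
    cublasDestroy(h);
    cudaStreamDestroy(stream);
    cudaFree(A); cudaFree(B); cudaFree(C); cudaFree(D); cudaFree(E);
    return 0;
}
```

With everything on one stream, no extra events or host-side synchronization are needed between the steps.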