Status: Closed (hweom closed 2 years ago)
It looks like the copy() function for CUDA doesn't check that the destination is large enough (here).
After adding a check like this:
macro_rules! iblas_copy_for_cuda {
    ($t:ident) => {
        fn copy(
            &self,
            x: &SharedTensor<$t>,
            y: &mut SharedTensor<$t>,
        ) -> Result<(), ::coaster::error::Error> {
            // Added check: source and destination sizes must match.
            assert_eq!(x.desc().size(), y.desc().size());
            // ... (rest of the function unchanged)
We now get a panic:
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `300`,
 right: `10`', coaster-blas/src/frameworks/cuda/mod.rs:23:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
   3: core::panicking::assert_failed
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:181:5
   4: coaster_blas::frameworks::cuda::<impl coaster_blas::plugin::Copy<f32> for coaster::backend::Backend<coaster::frameworks::cuda::Cuda>>::copy
             at ./coaster-blas/src/frameworks/cuda/helper.rs:109:13
   5: <juice::layers::common::linear::Linear as juice::layer::ComputeParametersGradient<f32,B>>::compute_parameters_gradient
             at ./juice/src/layers/common/linear.rs:220:9
So we're trying to copy 300 floats into a tensor of size 10, and it happens in the Linear layer (here).
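For illustration, here is a minimal sketch of the same size check written so that a mismatch is returned as an error instead of panicking. It works on plain host slices, not on coaster's SharedTensor or the CUDA backend, and the names checked_copy and SizeMismatch are hypothetical, introduced only for this example.

```rust
/// Hypothetical error type describing a source/destination size mismatch.
#[derive(Debug, PartialEq)]
pub struct SizeMismatch {
    pub src: usize,
    pub dst: usize,
}

/// Copies `src` into `dst` only if both hold the same number of elements.
/// This mirrors the assert_eq! above, but reports the mismatch to the caller
/// instead of panicking (assumption: a recoverable error is preferred).
pub fn checked_copy(src: &[f32], dst: &mut [f32]) -> Result<(), SizeMismatch> {
    if src.len() != dst.len() {
        return Err(SizeMismatch {
            src: src.len(),
            dst: dst.len(),
        });
    }
    // Safe here: copy_from_slice panics only when the lengths differ.
    dst.copy_from_slice(src);
    Ok(())
}
```

With a 300-element source and a 10-element destination, as in the panic above, checked_copy would return Err(SizeMismatch { src: 300, dst: 10 }) and leave the destination untouched.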
Describe the bug
cuda-memcheck reports scrolling errors on example-mnist-classification like this:
To Reproduce
Steps to reproduce the behaviour:
cargo build
cuda-memcheck target/debug/example-mnist-classification mnist linear
Expected behavior
No errors.
Please complete the following information:
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1936      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+