DavideNardone / MTSS-Multivariate-Time-Series-Software

A GP-GPU/CPU Dynamic Time Warping (DTW) implementation for the analysis of Multivariate Time Series (MTS).
MIT License

cuda-memcheck reports errors #27

Closed. karlrupp closed this issue 5 years ago.

karlrupp commented 5 years ago

When running the examples with cuda-memcheck, I encounter various errors. For example, running cuda-memcheck ./mdtwObj -t CLASSIFICATION -i GPU 3 512 1 -f data/classification/rm_1/X_MAT data/classification/rm_1/Y_MAT data/classification/rm_1/Z_MAT -k 10 0 -o 1000 152 -m 0 DTW -d 0 -v 0 results in many errors of the form:

========= Invalid __global__ read of size 4
=========     at 0x00000e88 in MD_DTW_D
=========     by thread (485,0,0) in block (0,0,0)
=========     Address 0x701d9632c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2c5) [0x203f65]
=========     Host Frame:./mdtwObj [0x20be1]
=========     Host Frame:./mdtwObj [0x3e783]
=========     Host Frame:./mdtwObj [0x74fe]
=========     Host Frame:./mdtwObj [0x514b]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf0) [0x20830]
=========     Host Frame:./mdtwObj [0x6c99]

These errors need to be investigated, ideally on smaller samples. Compile with nvcc -g -G ... to get GPU stack traces.
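
An out-of-bounds __global__ read like this usually means a thread computed an index past the end of a device buffer. A minimal sketch of the typical pattern and guard, using hypothetical names rather than the actual MD_DTW_D kernel:

// Hypothetical kernel, not the project's MD_DTW_D.
// With 512 threads per block, thread 485 reads data[485]; if the buffer
// holds fewer elements, cuda-memcheck reports "Invalid __global__ read".
__global__ void scale_kernel(const float *data, float *out, int n, float s) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n)          // guard: skip threads beyond the buffer length
        return;
    out[idx] = s * data[idx];
}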

Part of review at: https://github.com/openjournals/joss-reviews/issues/1049

karlrupp commented 5 years ago

Also, the CPU path is not valgrind-clean:

$> valgrind ./mdtwObj -t CLASSIFICATION -i CPU 3 1 -f data/classification/rm_1/X_MAT data/classification/rm_1/Y_MAT data/classification/rm_1/Z_MAT -k 10 0 -o 1000 152 -m 0 DTW -v 0
==28320== Memcheck, a memory error detector
==28320== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==28320== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==28320== Command: ./mdtwObj -t CLASSIFICATION -i CPU 3 1 -f data/classification/rm_1/X_MAT data/classification/rm_1/Y_MAT data/classification/rm_1/Z_MAT -k 10 0 -o 1000 152 -m 0 DTW -v 0
==28320== 

The number of iteration is greater than testSize! Verbose mode will be suppressed for this run
Reading data...
Dataset size: [1000,152,3]

Classification w/ DEPENDENT-DTW using CPU

==28320== Invalid read of size 4
==28320==    at 0x409BA6: accumarray (module.cu:1253)
==28320==    by 0x409EA9: crossvalind_Kfold (module.cu:1343)
==28320==    by 0x404348: main (MD_DTW.cu:364)
==28320==  Address 0x60ccf90 is 0 bytes after a block of size 4,000 alloc'd
==28320==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28320==    by 0x409E14: crossvalind_Kfold (module.cu:1332)
==28320==    by 0x404348: main (MD_DTW.cu:364)
==28320== 
...
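
The report points at a read one element past a 4,000-byte heap block, i.e. an array of 1,000 ints allocated in crossvalind_Kfold (module.cu:1332) and indexed in accumarray (module.cu:1253). A minimal sketch of the kind of off-by-one that produces exactly this valgrind signature, with hypothetical code rather than the actual module.cu:

#include <stdlib.h>

/* Hypothetical host-side code reproducing the signature above: a
   1000-element int array (4,000 bytes) is read one past its end. */
int sum_counts(int n)                        /* e.g. n = 1000 */
{
    int *counts = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i)
        counts[i] = i % 10;

    int total = 0;
    for (int i = 0; i <= n; ++i)             /* <= reads counts[n]: valgrind's */
        total += counts[i];                  /* "0 bytes after a block of size 4,000" */

    free(counts);
    return total;
}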
DavideNardone commented 5 years ago

cuda-memcheck ./mdtwObj -t CLASSIFICATION -i GPU 3 512 1 -f data/classification/rm_1/X_MAT data/classification/rm_1/Y_MAT data/classification/rm_1/Z_MAT -k 10 0 -o 1000 152 -m 0 DTW -d 0 -v 0

When I ran the software I got a different error, which was due to an invalid first argument being passed to the cudaMemset function.

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemset.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x330453]
=========     Host Frame:./mdtwObj [0x3ed3c]
=========     Host Frame:./mdtwObj [0x4470]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:./mdtwObj [0x6f25]

I fixed it by replacing the call cudaMemset(&d_test, 0, n_feat * window_size * sizeof(float)) with cudaMemset(d_test, 0, n_feat * window_size * sizeof(float)).
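
For context, the first argument of cudaMemset has to be the device pointer itself; &d_test is the host address of the pointer variable, which the runtime rejects with "invalid argument". A small sketch of the difference, with hypothetical sizes rather than the project's actual allocation code:

// Hypothetical sizes for illustration only.
int n_feat = 3, window_size = 152;
size_t bytes = n_feat * window_size * sizeof(float);

float *d_test = NULL;                  // host variable that will hold a device address
cudaMalloc((void **)&d_test, bytes);   // &d_test is correct here: cudaMalloc writes
                                       // the device address into the host variable
// cudaMemset(&d_test, 0, bytes);      // wrong: &d_test is a host address, so the
                                       // call fails with cudaErrorInvalidValue
cudaMemset(d_test, 0, bytes);          // correct: zeroes the device buffer itself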

By the way, I suppose the out-of-bounds read you get when running cuda-memcheck is due to the limit on the number of threads per block (< 1024) on your GPU. If so, I will provide a fix ASAP. In the meantime, could you please tell me the maximum number of threads per block of your GPU?
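
If it helps, that value can be read directly from the CUDA runtime; a small standalone sketch (not part of the project) that prints it for device 0:

#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    printf("%s: maxThreadsPerBlock = %d\n", prop.name, prop.maxThreadsPerBlock);
    printf("maxThreadsDim = (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    return 0;
}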