Open Da1sypetals opened 1 month ago
I switched to CUDA 12.4 and now a runtime error occurs when converting a triplet sparse matrix to COO format. Code:
int N = 3;
DeviceTripletMatrix<float, 1> A;
DeviceDenseVector<float> b(N);
DeviceDenseVector<float> x(N);
DeviceDenseVector<float> y(N);
A.resize_triplets(N * N);
A.reshape(N, N);
std::cout << "sizes:\n";
std::cout << A.row_indices().size()
          << " " << A.col_indices().size()
          << " " << A.values().size() << std::endl;
ParallelFor(256).apply(N * N, [row_idx = A.row_indices().viewer(),
                               col_idx = A.col_indices().viewer(),
                               val = A.values().viewer(),
                               b = b.viewer(), N] __device__(int i) mutable {
    row_idx(i) = i % N;
    col_idx(i) = i / N;
    val(i) = static_cast<float>(i * i);
    if (i < N) {
        b(i) = static_cast<float>(i);
    }
});
std::cout << "Filled A and b\n";
LinearSystemContext ctx;
std::cout << "Context created\n";
DeviceCOOMatrix<float> A_coo;
ctx.convert(A, A_coo); // cuda error triggers here
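For cross-checking, the expected result of the conversion can be computed on the host. The sketch below is a plain C++ stand-in, not muda's implementation: it builds the same triplets as the device lambda above (row = i % N, col = i / N, value = i * i), sorts them by (row, col), and sums entries that share a coordinate, which is what a triplet-to-COO conversion is generally expected to produce. The `Triplet` struct and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side triplet; not a muda type.
struct Triplet {
    int row;
    int col;
    float value;
};

// Same fill pattern as the device lambda: entry i goes to (i % n, i / n).
std::vector<Triplet> make_triplets(int n) {
    std::vector<Triplet> t;
    t.reserve(static_cast<std::size_t>(n) * n);
    for (int i = 0; i < n * n; ++i) {
        t.push_back({i % n, i / n, static_cast<float>(i * i)});
    }
    return t;
}

// Sort by (row, col) and merge entries that share a coordinate.
std::vector<Triplet> triplets_to_coo(std::vector<Triplet> t) {
    std::sort(t.begin(), t.end(), [](const Triplet& a, const Triplet& b) {
        return a.row != b.row ? a.row < b.row : a.col < b.col;
    });
    std::vector<Triplet> coo;
    for (const Triplet& e : t) {
        if (!coo.empty() && coo.back().row == e.row && coo.back().col == e.col) {
            coo.back().value += e.value;  // duplicate coordinate: accumulate
        } else {
            coo.push_back(e);
        }
    }
    return coo;
}
```

With N = 3 there are no duplicate coordinates, so the reference output has 9 entries, the same count as the triplet input.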
terminal output:
CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion) "cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)"
terminate called after throwing an instance of 'muda::cuda_error<cudaError>'
what(): CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion)cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)
[1] 29733 IOT instruction ./hello_muda
Is a specific version of CUDA required? Could you please list your configuration or make a list of version requirements?
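One cheap check for this error: the CUDA runtime documentation describes `cudaErrorUnsupportedPtxVersion` as PTX compiled by a toolchain the installed driver does not support, most commonly because the compiler is newer than the driver. Comparing the version the driver supports with the toolkit version can confirm or rule that out. A minimal sketch in plain C++, assuming the two integers come from `cudaDriverGetVersion` and `cudaRuntimeGetVersion`, which encode a version as 1000 * major + 10 * minor; the helper names are hypothetical:

```cpp
#include <string>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 12040 for 12.4).
struct CudaVersion {
    int major_version;
    int minor_version;
};

CudaVersion decode_cuda_version(int encoded) {
    return {encoded / 1000, (encoded % 1000) / 10};
}

std::string format_cuda_version(CudaVersion v) {
    return std::to_string(v.major_version) + "." + std::to_string(v.minor_version);
}

// True when the driver reports support for at least the runtime's version.
// Assumption: both values use the encoding described above.
bool driver_supports_runtime(int driver_version, int runtime_version) {
    return driver_version >= runtime_version;
}
```

For example, a driver reporting 12020 (12.2) with a 12.4 toolkit (12040) makes `driver_supports_runtime` return false, which would be consistent with the error above.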
https://github.com/MuGdxy/muda-app/tree/linear_system
I tested your code in Debug and Release mode on the following platforms:
Windows
Linux
but did not get any error.
I cloned the repo you provided but got the same runtime error :sob: on WSL, using GNU 11.4.0, CUDA 12.4.99 and CMake 3.29.3.
CMake configure output:
-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.4.99
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.4/targets/x86_64-linux/include (found version "12.4.99")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (20.1s)
-- Generating done (0.1s)
-- Build files have been written to: /mnt/a/dev/muda/muda-app/build
It also failed with the same runtime error on a remote Arch Linux machine with GNU 13.2.0 and CUDA 12.5.82.
All commands I executed:
git clone git@github.com:MuGdxy/muda-app.git
cd muda-app
git submodule update --init
git checkout linear_system
mkdir build && cd build
cmake -S .. -B . -DCMAKE_BUILD_TYPE=Debug
cmake --build . --config Debug -j8
CMake configure output on the Arch Linux machine:
-- The CXX compiler identification is GNU 13.2.0
-- The CUDA compiler identification is NVIDIA 12.5.82
...
-- Found CUDAToolkit: /opt/cuda-12.5/targets/x86_64-linux/include (found version "12.5.82")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (0.9s)
-- Generating done (0.0s)
Trying the next option: container
Finally, the problem was resolved with Docker. A starter project with Muda, SFML and Eigen, which works with sparse matrix storage and solving, runs in the container without runtime errors. Currently the image is built with docker commit, and later I will create a Dockerfile for it.
spmv does not compile (while solve works well):
using namespace muda;
void run_tests() {
}
int main() { run_tests(); return 0; }
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(15): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
    detected during:
        instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
        instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView, muda::CDenseVectorView, const T &, muda::DenseVectorView &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
        instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView, muda::CDenseVectorView, muda::DenseVectorView) [with T=float]"
/mnt/a/dev/muda/muda-template/src/main.cu(101): here
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(20): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
    detected during:
        instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
        instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView, muda::CDenseVectorView, const T &, muda::DenseVectorView &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
        instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView, muda::CDenseVectorView, muda::DenseVectorView) [with T=float]"
/mnt/a/dev/muda/muda-template/src/main.cu(101): here
2 errors detected in the compilation of "/mnt/a/dev/muda/muda-template/src/main.cu".
gmake[2]: *** [CMakeFiles/hello_muda.dir/build.make:92: CMakeFiles/hello_muda.dir/src/main.cu.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:286: CMakeFiles/hello_muda.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
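Reading the diagnostic, a const-qualified descriptor pointer is being passed where the callee expects the non-const handle type `cusparseDnVecDescr_t`. The same mismatch can be reproduced with stand-in types. The sketch below uses hypothetical names (`DnVecDescr`, `legacy_spmv`) rather than the real cuSPARSE declarations, and shows one possible workaround, a `const_cast` at the call boundary, which is only safe if the callee never writes through that argument.

```cpp
// Stand-ins for an opaque handle and its const/non-const pointer aliases.
struct DnVecDescr {
    int size;
};
using DnVecDescr_t = DnVecDescr*;
using ConstDnVecDescr_t = const DnVecDescr*;

// Hypothetical older-style API: takes the non-const handle even for inputs.
int legacy_spmv(DnVecDescr_t x, DnVecDescr_t y) {
    y->size = x->size;  // writes only to the output descriptor
    return 0;           // 0 stands for success
}

// Wrapper with a const input parameter, mirroring the shape of the
// generic_spmv signature printed in the error message.
int generic_spmv(ConstDnVecDescr_t x, DnVecDescr_t y) {
    // legacy_spmv(x, y);  // would not compile: const DnVecDescr* -> DnVecDescr*
    // Assumption: the callee never writes through x, so dropping const is safe.
    return legacy_spmv(const_cast<DnVecDescr_t>(x), y);
}
```

Whether a cast, an overload, or a version guard is the right fix inside muda depends on which cuSPARSE headers are installed in the container, which the log above does not show.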