MuGdxy / muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
Apache License 2.0

Problem & questions #55

Open Da1sypetals opened 1 month ago

Da1sypetals commented 1 month ago
  1. This code ran into compilation errors when I tried to use spmv (while solve works well):
    #include <iostream>
    #include <Eigen/Eigen>
    #include <muda/muda.h>
    #include <muda/ext/linear_system.h>

    using namespace muda;

    void run_tests() {
        int N = 3;

        // define an N*N matrix A and the vectors b, x, y
        DeviceTripletMatrix<float, 1> A;
        DeviceDenseVector<float> b(N);
        DeviceDenseVector<float> x(N);
        DeviceDenseVector<float> y(N);

        // reserve for triplets
        A.resize_triplets(N * N);
        A.reshape(N, N);

        std::cout << "sizes:\n";
        std::cout << A.row_indices().size()
                  << "  " << A.col_indices().size()
                  << "  " << A.values().size() << std::endl;

        // fill every entry of A (one triplet per thread) and the first N entries of b
        ParallelFor(256).apply(N * N, [row_idx = A.row_indices().viewer(),
                col_idx = A.col_indices().viewer(),
                val = A.values().viewer(),
                b = b.viewer(), N] __device__(int i) mutable {
            row_idx(i) = i % N;
            col_idx(i) = i / N;
            val(i) = static_cast<float>(i * i);

            if (i < N) {
                b(i) = static_cast<float>(i);
            }
        });

        LinearSystemContext ctx;

        // triplet -> COO -> CSR
        DeviceCOOMatrix<float> A_coo;
        ctx.convert(A, A_coo);
        DeviceCSRMatrix<float> A_csr;
        ctx.convert(A_coo, A_csr);

        ctx.solve(x.view(), A_csr.cview(), b.cview());

        std::cout << "solve done\n";
        ctx.spmv(A_csr.cview(), x.cview(), y.view());

        Eigen::VectorXf hx(N);
        for (int i = 0; i < N; i++) {
            std::cout << hx.coeff(i) << "  ";
        }
        std::cout << std::endl;
    }

    int main() { run_tests(); return 0; }

terminal output:

/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(15): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
          detected during:
            instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
            instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView, muda::CDenseVectorView, const T &, muda::DenseVectorView &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
            instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView, muda::CDenseVectorView, muda::DenseVectorView) [with T=float]"
/mnt/a/dev/muda/muda-template/src/ here

/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv.inl(20): error: argument of type "const cusparseDnVecDescr *" is incompatible with parameter of type "cusparseDnVecDescr_t"
          detected during:
            instantiation of "void muda::LinearSystemContext::generic_spmv(const T &, cusparseOperation_t, cusparseSpMatDescr_t, const cusparseDnVecDescr *, const T &, cusparseDnVecDescr_t) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(12): here
            instantiation of "void muda::LinearSystemContext::spmv(const T &, muda::CCSRMatrixView, muda::CDenseVectorView, const T &, muda::DenseVectorView &) [with T=float]"
/mnt/a/dev/muda/muda-template/submodules/muda/src/muda/ext/linear_system/details/routines/spmv/csr_spmv.inl(18): here
            instantiation of "void muda::LinearSystemContext::spmv(muda::CCSRMatrixView, muda::CDenseVectorView, muda::DenseVectorView) [with T=float]"
/mnt/a/dev/muda/muda-template/src/ here

2 errors detected in the compilation of "/mnt/a/dev/muda/muda-template/src/".
gmake[2]: *** [CMakeFiles/hello_muda.dir/build.make:92: CMakeFiles/hello_muda.dir/src/] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:286: CMakeFiles/hello_muda.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2

I wonder whether there is a problem in my Muda code or whether the problem is caused somewhere else.

  2. What is typically used (best practice) for small-vector linear algebra _**on device**_ (e.g. float3, float3x3, dot products, outer products)?
Thanks a lot in advance!
MuGdxy commented 1 month ago
  1. CUDA changed its API in 11.4; you may need to update to >= 11.6.
  2. I just use Eigen.
Da1sypetals commented 1 month ago

I switched to CUDA 12.4, and now a runtime error occurs when converting the triplet sparse matrix to COO format. Code:

int N = 3;

DeviceTripletMatrix<float, 1> A;
DeviceDenseVector<float> b(N);
DeviceDenseVector<float> x(N);
DeviceDenseVector<float> y(N);

A.resize_triplets(N * N);
A.reshape(N, N);

std::cout << "sizes:\n";
std::cout << A.row_indices().size()
          << "  " << A.col_indices().size()
          << "  " << A.values().size() << std::endl;

ParallelFor(256).apply(N * N, [row_idx = A.row_indices().viewer(),
        col_idx = A.col_indices().viewer(),
        val = A.values().viewer(),
        b = b.viewer(), N] __device__(int i) mutable {

    row_idx(i) = i % N;
    col_idx(i) = i / N;
    val(i) = static_cast<float>(i * i);

    if (i < N) {
        b(i) = static_cast<float>(i);
    }
});


std::cout << "Filled A and b\n";

LinearSystemContext ctx;

std::cout << "Context created\n";

DeviceCOOMatrix<float> A_coo;
ctx.convert(A, A_coo); // cuda error triggers here

terminal output:

CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion) "cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)"
terminate called after throwing an instance of 'muda::cuda_error<cudaError>'
  what():  CUDA error at /mnt/a/dev/muda/muda-template/submodules/muda/src/muda/cub/device/device_merge_sort.h:21 code=222(cudaErrorUnsupportedPtxVersion)cub::DeviceMergeSort::SortPairs( d_temp_storage, temp_storage_bytes, d_keys, d_items, num_items, compare_op, _stream, false)
[1]    29733 IOT instruction  ./hello_muda

Is a specific version of CUDA required? Could you please list your configuration or make a list of version requirements?

MuGdxy commented 1 month ago

I tested your code in both debug and release mode on the following platform:

but did not get any error.

Da1sypetals commented 1 month ago

I cloned the repo you provided but got the same runtime error :sob: on WSL, using GNU 11.4.0, CUDA 12.4.99 and CMake 3.29.3. CMake configure output:

-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.4.99
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.4/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.4/targets/x86_64-linux/include (found version "12.4.99")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (20.1s)
-- Generating done (0.1s)
-- Build files have been written to: /mnt/a/dev/muda/muda-app/build

It also failed with the same runtime error on a remote Arch Linux machine with GNU 13.2.0 and CUDA 12.5.82.

All commands I executed:

git clone
cd muda-app
git submodule update --init
git checkout linear_system
mkdir build && cd build
cmake -S .. -B . -DCMAKE_BUILD_TYPE=Debug
cmake --build . --config Debug -j8

CMake configure output on the Arch Linux machine:

-- The CXX compiler identification is GNU 13.2.0
-- The CUDA compiler identification is NVIDIA 12.5.82
-- Found CUDAToolkit: /opt/cuda-12.5/targets/x86_64-linux/include (found version "12.5.82")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (0.9s)
-- Generating done (0.0s)
Da1sypetals commented 1 month ago

Trying the next option: container

Da1sypetals commented 1 month ago

The problem was finally resolved with Docker. A starter project with Muda, SFML and Eigen, which exercises sparse matrix storage and solving, runs in the container without runtime errors. Currently the image is built with docker commit; I will create a Dockerfile for it later.