Mixed precision for CUDA

kai-lan commented 1 year ago

I am trying to use float for the preconditioner part. As far as I understand, the preconditioner is constructed on the CPU, so why do we need to specify a backend type for the AMG preconditioner? Or am I doing something wrong?

#ifdef USE_VEXCL
  #include <amgcl/backend/vexcl.hpp>
  typedef amgcl::backend::vexcl<double> SBackend;
  #ifdef MIXED_PRECISION
    typedef amgcl::backend::vexcl<float> PBackend;
  #else
    typedef amgcl::backend::vexcl<double> PBackend;
  #endif
#elif defined(USE_CUDA)
  #include <amgcl/backend/cuda.hpp>
  typedef amgcl::backend::cuda<double> SBackend;
  #ifdef MIXED_PRECISION
    typedef amgcl::backend::cuda<float> PBackend;
  #else
    typedef amgcl::backend::cuda<double> PBackend;
  #endif
#else
  #include <amgcl/backend/builtin.hpp>
  typedef amgcl::backend::builtin<double> SBackend;
  #ifdef MIXED_PRECISION
    typedef amgcl::backend::builtin<float> PBackend;
  #else
    typedef amgcl::backend::builtin<double> PBackend;
  #endif
#endif

The above code works fine with the builtin and VexCL backends, but the CUDA build fails with the following error:

/data2/wlan/MLPCG/cxx_src/amgcl/amgcl/backend/interface.hpp(321): error: class "amgcl::backend::spmv_impl<double, amgcl::backend::cuda_matrix<float>, thrust::device_vector<T, thrust::device_allocator<T>>, double, thrust::device_vector<T, thrust::device_allocator<T>>, void>" has no member "apply"
      spmv_impl<Alpha, Matrix, Vector1, Beta, Vector2>::apply(alpha, A, x, beta, y);

ddemidov commented 1 year ago

The CUDA backend does not support mixed precision. It would need things like a mixed-precision matrix-vector product, and cuSPARSE did not provide those when the backend was implemented (I am not sure about now).
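
To make the missing operation concrete, here is a minimal CPU sketch of a mixed-precision matrix-vector product: the matrix values are stored in float while the vectors stay in double. The `Csr` struct, the `spmv` function, and the `example_matrix` helper are hypothetical names made up for this illustration; they are not amgcl or cuSPARSE API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR matrix type used only for this illustration.
template <class Val>
struct Csr {
    std::size_t nrows;
    std::vector<std::size_t> ptr;  // row offsets, size nrows + 1
    std::vector<std::size_t> col;  // column index of each nonzero
    std::vector<Val> val;          // nonzero values, stored in precision Val
};

// y = alpha * A * x + beta * y, where the matrix precision (MVal)
// may differ from the vector precision (VVal).
template <class MVal, class VVal>
void spmv(VVal alpha, const Csr<MVal> &A, const std::vector<VVal> &x,
          VVal beta, std::vector<VVal> &y)
{
    for (std::size_t i = 0; i < A.nrows; ++i) {
        VVal sum = 0;  // accumulate in the vector precision
        for (std::size_t j = A.ptr[i]; j < A.ptr[i + 1]; ++j)
            sum += static_cast<VVal>(A.val[j]) * x[A.col[j]];
        y[i] = alpha * sum + beta * y[i];
    }
}

// Demonstration helper: the 2x2 matrix [[2, 1], [0, 3]] stored in float.
inline Csr<float> example_matrix() {
    return Csr<float>{2, {0, 2, 3}, {0, 1, 1}, {2.0f, 1.0f, 3.0f}};
}
```

With `PBackend` in float and `SBackend` in double, the solver ends up requesting this kind of product from the backend. The compiler error quoted above is consistent with the CUDA backend having no such implementation for `cuda_matrix<float>` combined with double vectors.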

You can use the VexCL backend for amgcl, with CUDA as the backend for VexCL.
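
A rough sketch of what that could look like with the typedefs from the question. It assumes the build defines `VEXCL_BACKEND_CUDA` (the macro VexCL uses to select its CUDA backend at compile time) and that the CUDA toolkit is available to VexCL. The rest of the solver setup is omitted and would stay as in the existing VexCL build.

```cpp
// Assumption: the compiler is invoked with -DVEXCL_BACKEND_CUDA,
// so VexCL itself runs on top of CUDA instead of OpenCL.
#include <vexcl/vexcl.hpp>
#include <amgcl/backend/vexcl.hpp>

// Solver in double precision, preconditioner in single precision,
// both going through the amgcl VexCL backend.
typedef amgcl::backend::vexcl<double> SBackend;
typedef amgcl::backend::vexcl<float>  PBackend;
```

This keeps the `USE_VEXCL` branch of the original code and only changes which device API VexCL uses underneath.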

ddemidov commented 1 year ago

> constructing the preconditioner is done on CPU, but why do we need to specify backend type for amg preconditioner?

The preconditioner is constructed on the CPU and then moved into the specified backend for application.
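
As a toy illustration of that two-stage pattern (not amgcl's actual code), the sketch below builds a matrix on the host and then hands it to a backend-specific `copy_matrix` function, which decides how it is stored for later application. All names here (`HostCsr`, `HostBackend`, `ToyDeviceBackend`, `ToyPreconditioner`) are hypothetical, and the "device" storage is just a host vector standing in for GPU memory.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Host-side CSR matrix used while the preconditioner is set up on the CPU.
template <class Val>
struct HostCsr {
    std::size_t nrows;
    std::vector<std::size_t> ptr, col;
    std::vector<Val> val;
};

// Hypothetical backend that keeps the host representation unchanged.
template <class Val>
struct HostBackend {
    typedef Val value_type;
    typedef HostCsr<Val> matrix;
    static std::shared_ptr<matrix> copy_matrix(const HostCsr<Val> &A) {
        return std::make_shared<matrix>(A);
    }
};

// Hypothetical "device" backend; a real one would copy to GPU memory here.
template <class Val>
struct ToyDeviceBackend {
    typedef Val value_type;
    struct matrix {
        std::size_t nrows;
        std::vector<std::size_t> ptr, col;
        std::vector<Val> val;  // stand-in for device storage
        bool on_device;
    };
    static std::shared_ptr<matrix> copy_matrix(const HostCsr<Val> &A) {
        return std::make_shared<matrix>(matrix{A.nrows, A.ptr, A.col, A.val, true});
    }
};

// The backend parameter fixes the storage type used at application time,
// even though the setup below runs entirely on the CPU.
template <class Backend>
class ToyPreconditioner {
public:
    typedef typename Backend::value_type value_type;

    explicit ToyPreconditioner(std::size_t n) {
        HostCsr<value_type> P;  // setup phase: an identity matrix as a placeholder
        P.nrows = n;
        for (std::size_t i = 0; i < n; ++i) {
            P.ptr.push_back(i);
            P.col.push_back(i);
            P.val.push_back(value_type(1));
        }
        P.ptr.push_back(n);
        level_ = Backend::copy_matrix(P);  // move into the backend
    }

    const typename Backend::matrix& level() const { return *level_; }

private:
    std::shared_ptr<typename Backend::matrix> level_;
};
```

The point of the sketch is that the backend type is part of the preconditioner's type because it determines both the value type and the storage that the application phase works with.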

ddemidov / amgcl

Mixed precision for CUDA #265