robertdfrench opened 10 years ago
As a note, you don't technically need matrix-matrix multiplication to implement LU Decomposition or Conjugate Gradient. BUT taking the time to implement it will familiarize you with some of the shared memory patterns you'll need to take advantage of.
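Here's some good news straight from the mouth of NVIDIA: the Shared Memory section of the CUDA C Programming Guide uses matrix multiplication as its worked example, first without shared memory and then with it: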
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#shared-memory
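Here is a minimal sketch of the tiling pattern, written as plain C++ so it runs without a GPU. It is an emulation, not the guide's code. The name `tiled_matmul` and the value `TILE = 4` are hypothetical choices. The `blockRow`/`blockCol` loops stand in for CUDA thread blocks, the `ty`/`tx` loops stand in for threads, and the local arrays `As`/`Bs` stand in for `__shared__` memory. Each element of A and B is loaded into a tile once per block and then reused by `TILE` threads. That reuse is where the savings in global memory traffic come from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge length; plays the role of BLOCK_SIZE in a CUDA kernel (hypothetical value).
constexpr std::size_t TILE = 4;

// Computes C = A * B for row-major matrices: A is n x k, B is k x m, C is n x m.
// CPU emulation of the shared-memory tiling pattern: each (blockRow, blockCol)
// pair stands for one CUDA thread block, each (ty, tx) pair for one thread.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t n, std::size_t k, std::size_t m) {
    assert(A.size() == n * k && B.size() == k * m);
    std::vector<float> C(n * m, 0.0f);
    const std::size_t tilesK = (k + TILE - 1) / TILE;

    for (std::size_t blockRow = 0; blockRow * TILE < n; ++blockRow) {
        for (std::size_t blockCol = 0; blockCol * TILE < m; ++blockCol) {
            float acc[TILE][TILE] = {};  // one running sum per "thread"

            for (std::size_t t = 0; t < tilesK; ++t) {
                // Stand-ins for __shared__ arrays in a real kernel.
                float As[TILE][TILE];
                float Bs[TILE][TILE];

                // Phase 1: each "thread" loads one element of each tile,
                // padding with zeros past the matrix edges.
                for (std::size_t ty = 0; ty < TILE; ++ty) {
                    for (std::size_t tx = 0; tx < TILE; ++tx) {
                        std::size_t row = blockRow * TILE + ty;
                        std::size_t col = blockCol * TILE + tx;
                        std::size_t aCol = t * TILE + tx;
                        std::size_t bRow = t * TILE + ty;
                        As[ty][tx] = (row < n && aCol < k) ? A[row * k + aCol] : 0.0f;
                        Bs[ty][tx] = (bRow < k && col < m) ? B[bRow * m + col] : 0.0f;
                    }
                }
                // In CUDA, __syncthreads() goes here so no thread reads a tile
                // before the whole block has finished loading it.

                // Phase 2: each "thread" combines its row of As with its
                // column of Bs, reusing tile data instead of re-reading A and B.
                for (std::size_t ty = 0; ty < TILE; ++ty)
                    for (std::size_t tx = 0; tx < TILE; ++tx)
                        for (std::size_t e = 0; e < TILE; ++e)
                            acc[ty][tx] += As[ty][e] * Bs[e][tx];
                // In CUDA, a second __syncthreads() goes here before the next load.
            }

            // Each "thread" writes its result, skipping padded positions.
            for (std::size_t ty = 0; ty < TILE; ++ty) {
                for (std::size_t tx = 0; tx < TILE; ++tx) {
                    std::size_t row = blockRow * TILE + ty;
                    std::size_t col = blockCol * TILE + tx;
                    if (row < n && col < m) C[row * m + col] = acc[ty][tx];
                }
            }
        }
    }
    return C;
}
```

For example, multiplying the 2x3 matrix [[1,2,3],[4,5,6]] by the 3x2 matrix [[7,8],[9,10],[11,12]] gives [[58,64],[139,154]]. The loops here run sequentially, so the emulation gets away without barriers. In a real kernel, both `__syncthreads()` points marked in the comments are required.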