Closed iomaganaris closed 3 years ago
Following the incompatibilities of Eigen with OpenACC, I started investigating whether Eigen can be called from CUDA kernels. To do this I am using a simple example based on the `Eigen::PartialPivLU` solver created by @cattabiani, on top of which I added a CUDA kernel that runs the same solver. My attempts are currently WIP here.
During development I faced 4 issues:
1. Compiling the `Eigen::PartialPivLU` solver in the CUDA kernel. To get the code compiled I needed to make the following changes in the Eigen source code:
```diff
diff --git a/Eigen/src/Core/SolverBase.h b/Eigen/src/Core/SolverBase.h
index 501461042..e7d5ca5a3 100644
--- a/Eigen/src/Core/SolverBase.h
+++ b/Eigen/src/Core/SolverBase.h
@@ -94,7 +94,7 @@ class SolverBase : public EigenBase<Derived>
     SolverBase()
     {}
     EIGEN_DEVICE_FUNC ~SolverBase() {}
     using Base::derived;
@@ -102,7 +102,7 @@ class SolverBase : public EigenBase
@@ -593,7 +593,7 @@ struct Assignment<DstXprType, Inverse<PartialPivLU
```
2. Moving the `Eigen` `MatrixXd` and `VectorXd` structs to the `GPU`. To do this I followed two paths.
- Move the structs as they are, based on [this suggestion](https://stackoverflow.com/a/41120980); I managed to get it compiled, but the matrices in the kernel contained only zeros
- Turn the structs into C-style arrays (`double*`), copy those to the device, and then use `Eigen::Map` to map them back to `Eigen` structs. This is the current implementation, but there is still some issue with the memory, since it is all 0s on the GPU

Moving the `Eigen` structs directly seems nicer, but more work is needed to understand the issue I came across.
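As a rough sketch of the second path (all names here, such as `solveKernel` and `runOnDevice`, are illustrative assumptions and not taken from the WIP branch), the host flattens the data into raw buffers and the kernel re-wraps them with `Eigen::Map`:

```cuda
// Illustrative sketch only: copy raw double buffers to the device and
// re-wrap them with Eigen::Map inside the kernel.
#include <Eigen/Dense>
#include <cuda_runtime.h>

__global__ void solveKernel(double* m_data, double* v_data, double* x_data, int n) {
    // Eigen::Map assumes column-major storage by default, which matches
    // the layout of MatrixXd::data() on the host.
    Eigen::Map<Eigen::MatrixXd> M(m_data, n, n);
    Eigen::Map<Eigen::VectorXd> v(v_data, n);
    Eigen::Map<Eigen::VectorXd> x(x_data, n);
    x = M.partialPivLu().solve(v);
}

void runOnDevice(const Eigen::MatrixXd& M, const Eigen::VectorXd& v, Eigen::VectorXd& x) {
    const int n = static_cast<int>(v.size());
    double *d_m, *d_v, *d_x;
    cudaMalloc(&d_m, n * n * sizeof(double));
    cudaMalloc(&d_v, n * sizeof(double));
    cudaMalloc(&d_x, n * sizeof(double));
    // A wrong size or direction in these copies would leave the kernel
    // reading zeros, which matches the symptom described above.
    cudaMemcpy(d_m, M.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, v.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    solveKernel<<<1, 1>>>(d_m, d_v, d_x, n);
    cudaMemcpy(x.data(), d_x, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_m); cudaFree(d_v); cudaFree(d_x);
}
```

Note that `partialPivLu()` on a dynamic-size `Map` may still allocate internally, so even with correct copies the kernel could hit the device-side allocation and exception paths that the patch above touches.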
3. After doing all the above I get the following error during execution:
```
bash-4.2$ ./testEigenGPU
Size of the matrix? 4
v_device data: 1 3 5 3
v in device 0.000000 0.000000 0.000000 0.000000
Error with cudaDeviceSync: unspecified launch failure
Random matrix:
1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
Random vector:
1 3 5 3
Solution (x) of M*x = v:
0 1 2 3
Device Solution (x) of M*x = v:
0x7fff7a000400
```
I tried to debug this with `ddt`, `cuda-gdb` and `cuda-memcheck` and I get the following with `cuda-gdb`:
```
Starting program: /gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/testEigen/build/testEigenGPU
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffcffff700 (LWP 244304)]
Size of the matrix? 4
v_device data: 1 3 5 3
[New Thread 0x7fffbdb06700 (LWP 244306)]
v in device 0.000000 0.000000 0.000000 0.000000

Thread 1 "testEigenGPU" received signal SIGTRAP, Trace/breakpoint trap.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x0000000000db4900 in runPartialPivLuGPU(double, double, double*, int)<<<(1,1,1),(1,1,1)>>> ()
```
and `cuda-memcheck`:
```
Error with cudaDeviceSync: unspecified launch failure
```
Googling the errors suggests that they probably come from a segmentation fault inside the `Eigen::PartialPivLU` solver.
4. A bunch of `warning: calling a __host__ function from a __host__ __device__ function is not allowed` warnings during compilation. I don't know whether the functions in question are actually used by the solver and are the root of all the errors.
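For reference, a minimal sketch (my own, not from the project's sources) of what triggers that class of warning:

```cuda
#include <cstdio>

// A plain function is implicitly __host__ only.
void host_only() { std::printf("host\n"); }

__host__ __device__ void both() {
    // nvcc warns here: calling a __host__ function from a
    // __host__ __device__ function is not allowed. The warning is
    // non-fatal because the device code path may never be instantiated,
    // but if it is, the kernel can fail at run time.
    host_only();
}
```

So if any of the warned-about calls sit on the solver's device code path, they could plausibly explain the launch failure above.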
`CUDA` used: `10.1.243`
`GCC` used: `9.3.0`
TODO:
~~1. Check why the matrices are not copied correctly into the device~~
2. Try to debug the errors
cc: @pramodk @ohm314
This issue was raised when running `nrnivmodl-core` with the following ModelDB model https://senselab.med.yale.edu/ModelDB/ShowModel?model=19176&file=%2fHCN2k%2fhcn2.mod#tabs-2 using `PGI 19.4` and `PGI 19.10` with the `OpenACC` backend generated from `NMODL`. The generated `c++` file for `hcn2.mod` contains a call to the following `Eigen` solver, generated by the translation of the `DERIVATIVE` block:

Compiling this file with `pgc++` there is the following issue (with `-Minfo=acc` added to the compilation flags):

This was due to an exception thrown in https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/util/Memory.h#L70. After fixing this issue by commenting out the problematic line, there was another issue regarding `atomic` coming from https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/products/Parallelizer.h#L14, which was fixed by adding the `-DEIGEN_HAS_CXX11_ATOMIC=0` compiler flag to `pgc++`.

Following those, there was an issue coming from the llvm-based `pgc++` compiler, so we tried with the `nollvm` backend. The final issue we came across was the following:

For this no solution has been found yet. To reproduce all the issues on a `gpu node`:

`CoreNeuron` and `NMODL` `master` branches were used.
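For context, the `atomic` workaround above amounts to defining the macro on the compile line; a hypothetical stand-alone invocation (the real `nrnivmodl-core` flags and include paths are not shown in this issue):

```shell
# Placeholder paths; -acc enables the OpenACC backend in pgc++, and
# -DEIGEN_HAS_CXX11_ATOMIC=0 disables Eigen's use of std::atomic.
pgc++ -acc -Minfo=acc -DEIGEN_HAS_CXX11_ATOMIC=0 \
      -I/path/to/eigen -c hcn2.cpp -o hcn2.o
```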