Test BlueBrain model miniapp i.e. channel-benchmark on GPU with NMODL

pramodk commented 5 years ago

We have tested the ring model with NMODL's OpenACC backend, but not the BBP model. The build script to use NMODL on BB5:

CUR_DIR=`pwd`

module load cmake gcc/6.4.0 flex/2.6.3 bison/3.0.5 python-dev/0.1/python3
git clone --recursive https://github.com/BlueBrain/nmodl.git -b pr/wip-acc-fixes
cd nmodl
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/install
make -j12
make -j12 install

cd $CUR_DIR
git clone --recursive https://github.com/BlueBrain/CoreNeuron.git -b openacc-nmodl coreneuron
cd coreneuron
mkdir -p build && cd build

module purge
module load pgi/19.4 cuda/9.2.88 hpe-mpi cmake python-dev/0.1/python3
export CC=mpicc
export CXX=mpicxx

cmake .. \
    -DCMAKE_C_FLAGS="-O2 -ta=tesla:cuda9.2 -DR123_USE_SSE=0  -DEIGEN_DONT_VECTORIZE=1 -D_GLIBCXX_USE_CXX11_ABI=0" \
    -DCMAKE_CXX_FLAGS="-O2 -ta=tesla:cuda9.2 -DR123_USE_SSE=0  -DEIGEN_DONT_VECTORIZE=1 -D_GLIBCXX_USE_CXX11_ABI=0" \
    -DCOMPILE_LIBRARY_TYPE=STATIC  \
    -DCUDA_HOST_COMPILER=`which gcc` \
    -DCUDA_PROPAGATE_HOST_FLAGS=OFF \
    -DENABLE_SELECTIVE_GPU_PROFILING=ON \
    -DENABLE_OPENACC=ON \
    -DAUTO_TEST_WITH_SLURM=OFF \
    -DAUTO_TEST_WITH_MPIEXEC=OFF \
    -DFUNCTIONAL_TESTS=OFF \
    -DUNIT_TESTS=OFF \
    -DENABLE_NMODL=ON \
    -DNMODL_ROOT=$CUR_DIR/nmodl/build/install \
    -DNMODL_EXTRA_FLAGS="passes --verbatim-rename --inline sympy --analytic acc --oacc"
make VERBOSE=1 -j12

Todos:

add ADDITIONAL_MECHS and ADDITIONAL_MECHPATH to cmake of CoreNEURON
check which mod files have issues and report those here with the changes required
if OpenACC compilation of a cpp file fails, make the necessary changes to the .cpp directly and report the changes required
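
The last two items amount to finding out which mod files fail and keeping their error output so the required changes can be reported. A minimal shell sketch of that loop follows. `check_mods`, its argument layout and the per-file log directory are hypothetical helpers, not part of NMODL or CoreNEURON, and the translator or compiler command is supplied by the caller rather than assumed here.

```shell
# Sketch: run a command on every .mod file in a directory and list the failures.
# Usage: check_mods MOD_DIR LOG_DIR COMMAND [ARGS...]
# COMMAND is invoked as: COMMAND ARGS... FILE.mod  (argument order is an assumption)
check_mods() {
    local mod_dir=$1 log_dir=$2
    shift 2
    local failed=0 mod name
    mkdir -p "$log_dir"
    for mod in "$mod_dir"/*.mod; do
        [ -e "$mod" ] || continue      # directory contains no .mod files
        name=$(basename "$mod" .mod)
        # Keep the full output of each run so the errors can be reported later.
        if ! "$@" "$mod" > "$log_dir/$name.log" 2>&1; then
            echo "FAILED: $name.mod"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed"
    [ "$failed" -eq 0 ]
}
```

For example, `check_mods ./mod ./logs some_translator --flag` would run `some_translator --flag FILE.mod` for each file and leave one log per mod file in `./logs`. Which command and flags to pass depends on the build being tested.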

st4rl3ss commented 5 years ago

CoreNEURON is at least able to compile most of the files. The five files that fail compilation are GluSynapse.mod, StochKv.mod, ALU.mod, StochKv3.mod and SynapseReader.mod. I will work on fixing the issues with these mod files.

st4rl3ss commented 5 years ago

The mod files were fixed where possible and the resulting library was pushed to the sandbox/bellotta/nmodl_openacc branch of sim/neurodamus/bbp. CoreNEURON with NMODL and GPU support now compiles correctly using the procedure above.

st4rl3ss commented 5 years ago

Update on current status: the ring test runs correctly with both NMODL and MOD2C, using CoreNEURON on the unmodified openacc-nmodl branch. More complex configurations such as scx-1k-v5 do not run correctly. This was tested with the --gpu --cell-permute 2 parameters and the modified mod files from the sandbox/bellotta/nmodl_openacc branch. The application was compiled with both MPI and OpenMP deactivated.

In particular, when running more complex configurations, CoreNEURON starts but exits almost immediately without reporting any error:

apps git:(openacc-nmodl) ✗ ./coreneuron_exec -d /gpfs/bbp.cscs.ch/home/bellotta/proj/blueconfigs/scx-1k-v5/output/coreneuron_input -gpu --cell-permute 2             

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2015
 version id unimplemented

 Additional mechanisms from files
 BinReportHelper.mod BinReports.mod Ca.mod CaDynamics_DC0.mod CaDynamics_E2.mod Ca_HVA.mod Ca_HVA2.mod Ca_LVAst.mod CoreConfig.mod Ih.mod Im.mod K_Pst.mod K_Tst.mod KdShu2007.mod MemUsage.mod NaTa_t.mod NaTg.mod NaTs2_t.mod Nap_Et2.mod ProbAMPANMDA_EMS.mod ProbGABAAB_EMS.mod ProfileHelper.mod SK_E2.mod SKv3_1.mod StochKv.mod StochKv3.mod SynapseReader.mod TTXDynamicsSwitch.mod VecStim.mod halfgap.mod netstim_inhpoisson.mod

 Memory (MBs) :             After mk_mech : Max 3.9062, Min 3.9062, Avg 3.9062 
 Memory (MBs) :            After MPI_Init : Max 8.3320, Min 8.3320, Avg 8.3320 
 Memory (MBs) :          Before nrn_setup : Max 145.9766, Min 145.9766, Avg 145.9766 
➜  apps git:(openacc-nmodl) ✗

When run under cuda-gdb, the execution simply stalls at some point. Forcibly interrupting the execution leads to the following backtrace:

Thread 1 "coreneuron_exec" received signal SIGINT, Interrupt.
0x00007fffebaadc6d in sendmsg () from /usr/lib64/libpthread.so.0
(cuda-gdb) bt
#0  0x00007fffebaadc6d in sendmsg () from /usr/lib64/libpthread.so.0
#1  0x00007fffe9c21248 in cudbgApiDetach () from /usr/lib64/libcuda.so
#2  0x00007fffe9c21642 in cudbgApiDetach () from /usr/lib64/libcuda.so
#3  0x00007fffe9c19dea in cudbgReportDriverInternalError () from /usr/lib64/libcuda.so
#4  0x00007fffe9c1a9a5 in cudbgReportDriverInternalError () from /usr/lib64/libcuda.so
#5  0x00007fffe9c1db77 in cudbgReportDriverInternalError () from /usr/lib64/libcuda.so
#6  0x00007fffe9c1dc99 in cudbgReportDriverInternalError () from /usr/lib64/libcuda.so
#7  0x00007fffe9cc0504 in cuEGLApiInit () from /usr/lib64/libcuda.so
#8  0x00007fffe9cc083e in cuEGLApiInit () from /usr/lib64/libcuda.so
#9  0x00007fffe9bdf542 in cuMemGetAttribute_v2 () from /usr/lib64/libcuda.so
#10 0x00007fffe9d0a5cf in cuModuleLoadData () from /usr/lib64/libcuda.so
#11 0x00007fffecd6d90a in __pgi_uacc_cuda_load_this_module (dindex=1, error=0, 
    pgi_cuda_loc=0xa59c00 <__PGI_CUDA_LOC>) at ../src/cuda_init.c:1561
#12 0x00007fffecd6daeb in __pgi_uacc_cuda_load_module (dindex=1, error=0) at ../src/cuda_init.c:1635
#13 0x00007fffed09348b in __pgi_uacc_init_device (dindex=1) at ../src/init.c:720
#14 0x00007fffed09a4b7 in __pgi_uacc_upstart (
    filename=0x526b20 <.F0002.9852> "/gpfs/bbp.cscs.ch/home/bellotta/coreneuron_gpu_mod_test/coreneuron/build/coreneuron/passive.cpp", 
    funcname=0x526c00 <.F0017.10168> "_ZN117_INTERNAL_95__gpfs_bbp_cscs_ch_home_bellotta_coreneuron_gpu_mod_test_coreneuron_build_coreneuron_passive_cpp_dc54600810coreneuron22setup_global_variablesEv", lineno=138, 
    funcstartlineno=131, funcendlineno=139, async=-1, pdevid=0x7fffffffbea4, psavedevid=0x7fffffffbea0)
    at ../src/upstart.c:78
#15 0x00000000004ab64e in coreneuron::setup_global_variables ()
    at /gpfs/bbp.cscs.ch/home/bellotta/coreneuron_gpu_mod_test/coreneuron/build/coreneuron/passive.cpp:138
#16 coreneuron::_passive_reg ()
    at /gpfs/bbp.cscs.ch/home/bellotta/coreneuron_gpu_mod_test/coreneuron/build/coreneuron/passive.cpp:255
#17 0x0000000000450624 in coreneuron::mk_mech (s=...)
    at /gpfs/bbp.cscs.ch/home/bellotta/coreneuron_gpu_mod_test/coreneuron/coreneuron/nrniv/mk_mech.cpp:181
#18 0x000000000044fd04 in coreneuron::mk_mech (

The program appears to hang in the same function every time the application is interrupted.
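
A quick way to check that the hang location really is the same across runs is to save each backtrace to a text file and count the function reported in frame #0. The sketch below assumes the backtraces use the cuda-gdb `bt` layout shown above; `hang_sites` and the file names are hypothetical.

```shell
# Sketch: count which function appears in frame #0 of saved backtraces.
# Each argument is a text file holding the output of one "bt" command.
hang_sites() {
    for trace in "$@"; do
        # Frame #0 looks like: "#0  0x... in sendmsg () from ...";
        # print the word that follows "in".
        awk '$1 == "#0" {
                 for (i = 2; i < NF; i++)
                     if ($i == "in") { print $(i + 1); exit }
             }' "$trace"
    done | sort | uniq -c
}
```

Running `hang_sites bt1.txt bt2.txt bt3.txt` prints one line per distinct function with the number of backtraces that stopped there, so a single line in the output means every interrupt landed in the same function.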

Further investigation may be possible with the Allinea tools, but we are still waiting for a reply from Allinea support about the problem DDT has when running a program with the CUDA debugger active.

pramodk commented 3 years ago

We can reduce the scope of this ticket: instead of jumping to the production model, we can first validate the channel-benchmark. Note that the channel-benchmark has two circuits:

[ ] cortex
[ ] hippocampus

The mod files from these two models should already be compatible with NMODL. As part of this ticket we can run the models and check whether they produce the same results as the CPU version (or NEURON).
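
One way to check that two runs produce the same results is to compare their spike output. The sketch below assumes each run wrote a plain-text spike file with one `time gid` pair per line. That layout, the file names in the usage note and the `compare_spikes` helper are all assumptions, not something this thread specifies. Both files are sorted first so that the order in which spikes were written does not matter.

```shell
# Sketch: report whether two spike files contain the same set of lines.
# Assumes one "time gid" pair per line; this layout is an assumption.
compare_spikes() {
    ref_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Sort by time, then gid, to make the comparison independent of write order.
    sort -k1,1n -k2,2n "$1" > "$ref_sorted"
    sort -k1,1n -k2,2n "$2" > "$new_sorted"
    if diff "$ref_sorted" "$new_sorted" > /dev/null; then
        status=0
        echo "spikes match"
    else
        status=1
        echo "spikes differ"
    fi
    rm -f "$ref_sorted" "$new_sorted"
    return "$status"
}
```

A call such as `compare_spikes cpu/out.dat gpu/out.dat` would then print `spikes match` or `spikes differ`. Exact text equality is a strict criterion; if CPU and GPU runs differ only in the last printed digits, a comparison with a numeric tolerance would be more appropriate.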

olupton commented 3 years ago

This is basically done. For example, https://github.com/neuronsimulator/nrn/pull/1439 enables these tests and compares them between NEURON and CoreNEURON on CPU/GPU. The results are, as far as I know, consistent on GPU/CPU with MOD2C/NMODL and with Intel/NVHPC compilers.

The issue is that we need a solution to https://github.com/neuronsimulator/nrn/issues/1346 so we can actually integrate these tests in the CI.

olupton commented 3 years ago

https://github.com/BlueBrain/spack/pull/1303 and https://github.com/neuronsimulator/nrn/pull/1439 together have added this to the CoreNEURON CI.

BlueBrain / nmodl

Test BlueBrain model miniapp i.e. channel-benchmark on GPU with NMODL #206