espenhgn opened this issue 3 years ago
Back-conversion of morphologies to .swc format should be doable using https://github.com/JustasB/hoc2swc
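For reference, a minimal sketch of how such a conversion might look with hoc2swc (the file names are placeholders, not actual hybridLFPy morphologies, and I haven't verified this against our templates):

```python
# Minimal sketch using the hoc2swc package (https://github.com/JustasB/hoc2swc).
# The file names below are placeholders.
from hoc2swc import hoc2swc

# loads the .hoc morphology via NEURON under the hood and writes an .swc file
hoc2swc("cell_morphology.hoc", "cell_morphology.swc")
```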
Added the file Example_Arbor_swc_hybridLFPy.ipynb (https://github.com/INM-6/hybridLFPy/blob/fix-67/examples/Example_Arbor_swc_hybridLFPy.ipynb), addressing pt. 2 above.
Single-node scaling test on Jureca-DC comparing Arbor and the usual hybridLFPy implementation: the "run" times correspond to the total time spent per task calling `arbor.simulation.run()` and the equivalent `LFPy.Cell.simulate()` repeatedly. The "pop" times include additional overhead such as setting up the cells, determining synapse locations and activation times, computing LFPs, etc.

The corresponding files are at https://github.com/INM-6/hybridLFPy/tree/fix-67/examples/benchmarks, where example_brunel_arbor.py is the Arbor+LFPykit implementation of example_brunel.py using the "old" hybridLFPy. The output is comparable, and the numbers of compartments/CVs are comparable between the NEURON/LFPy and Arbor simulations, as the max segment/CV length is set to the same value.
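For context, this is roughly the kind of LFPykit usage that replaces LFPy's built-in electrode calculations in the Arbor version. It is a simplified sketch with placeholder geometry and currents, not the actual example_brunel_arbor.py code (there, these quantities would come from the Arbor CV geometry and current probes):

```python
import numpy as np
import lfpykit

# placeholder per-compartment geometry: start/end coordinates (um) and diameters (um)
x = np.array([[0., 10.], [10., 20.], [20., 30.]])
y = np.zeros((3, 2))
z = np.zeros((3, 2))
d = np.full(3, 1.)
cell = lfpykit.CellGeometry(x=x, y=y, z=z, d=d)

# line-source forward model for a couple of extracellular contact points
lsp = lfpykit.LineSourcePotential(cell,
                                  x=np.array([50., 50.]),
                                  y=np.array([0., 100.]),
                                  z=np.array([0., 0.]))
M = lsp.get_transformation_matrix()  # shape (n_contacts, n_compartments)

# transmembrane currents per compartment and time step (nA), random here
imem = np.random.randn(3, 100)
V_ex = M @ imem  # extracellular potentials (mV)
```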
Multinode performance testing with increased network size and longer simulation duration (2000 ms), two (virtual) CPUs per MPI task, and multithreaded execution with Arbor via

```python
context = arbor.context(2, None)
domains = arbor.partition_load_balance(recipe, context)
```
Arbor (https://github.com/espenhgn/arbor/tree/isyn) config:

```python
{'mpi': False,
 'mpi4py': False,
 'gpu': False,
 'vectorize': True,
 'version': '0.5.3-dev',
 'source': '2021-07-31T15:18:36+02:00 9503801c293b6933e2b186d8fbf7a1245366e876',
 'arch': 'znver1'}
```
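For reference, the dictionary above should be obtainable as the output of `arbor.config()`, which reports the build options of the installed Arbor library:

```python
import arbor

# print the build/feature configuration of the installed Arbor library
print(arbor.config())
```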
Hello @espenhgn.

Am I correct that the following steps are performed for simulating a model with `NCELLS` cells:

- Step 1: initialise MPI
- Step 2: Create and run a NEST Brunel model with `NCELLS` cells to generate input spike information for the detailed cell models
- Step 3: Given that the number of cells per MPI rank is `NLOCAL_CELLS = NCELLS / NRANKS`, for each cable cell assigned to this rank perform a single-cell-model simulation with Arbor.

Then I have the following observations:

- the `arbor.simulation` object only contains a model with 1 cell, so multithreading isn't required. The CPU back end of Arbor parallelises cells over threads, so a one-cell model will only run on one thread.
- The most efficient way to run would be one MPI rank per physical core, with `arbor.context(1, None)`.
- You are running on Juwels-DC, which has 2 sockets of 64-core AMD EPYC CPUs
- so you are running 128 MPI ranks per node, where each rank has 2 "hyperthreaded" cores.

And I have the following questions:

- what are the `srun`/`sbatch` parameters you use?
- does `NTASKS` in the plots correspond to the number of MPI ranks?
- If my assumptions above are correct, `NTASKS=2^9` would run on 4 nodes?

Hi @bcumming
> Hello @espenhgn.
> Am I correct that the following steps are performed for simulating a model with `NCELLS` cells:
>
> - Step 1: initialise MPI

Yes, indeed.
> - Step 2: Create and run a NEST Brunel model with `NCELLS` cells to generate input spike information for the detailed cell models
Yes, that's the general idea behind hybridLFPy: simulations of network activity (spikes) are performed before the simulations of the detailed cell models needed for the extracellular signals. For the sake of simplicity, the arbor_reproducer.py script creates random synaptic activation times for the Arbor simulation part using scipy.stats, which are subsequently fed to the `Recipe` class instance around here: https://github.com/INM-6/hybridLFPy/blob/ba6956be902d1939f42966e43801494aeca1fbe8/examples/benchmarks/arbor_reproducer.py#L317 (leaving NEST out of the equation altogether).
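Roughly, the activation-time generation amounts to something like the sketch below; the distribution, rate and synapse count are made up here and not taken from arbor_reproducer.py, but the resulting times can then be wrapped in `arbor.explicit_schedule` objects by the `Recipe` instance:

```python
import numpy as np
from scipy import stats
import arbor

# made-up numbers for one cell: synapse count, mean inter-spike interval, duration
n_syn = 100
mean_isi = 10.   # ms
tstop = 2000.    # ms

# one train of random activation times (ms) per synapse, drawn here from an
# exponential inter-spike-interval distribution and accumulated into spike times
activation_times = [
    np.cumsum(stats.expon.rvs(scale=mean_isi, size=int(2 * tstop / mean_isi)))
    for _ in range(n_syn)
]

# clip to the simulation duration and hand off as explicit schedules
schedules = [arbor.explicit_schedule(list(t[t < tstop])) for t in activation_times]
```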
> - Step 3: Given that the number of cells per MPI rank is `NLOCAL_CELLS = NCELLS / NRANKS`, for each cable cell assigned to this rank perform a single-cell-model simulation with Arbor.
Yes, each rank should run its own subset of cells, distributed in a round-robin manner. The cell simulations on each rank are performed in succession in the case that `NCELLS > NRANKS`. The `ArborPopulation.run()` method (https://github.com/INM-6/hybridLFPy/blob/ba6956be902d1939f42966e43801494aeca1fbe8/examples/benchmarks/arbor_reproducer.py#L343) just iterates over the `ArborPopulation.cellsim(cellindex)` method (https://github.com/INM-6/hybridLFPy/blob/ba6956be902d1939f42966e43801494aeca1fbe8/examples/benchmarks/arbor_reproducer.py#L199), essentially as sketched below.
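In other words, the distribution scheme is essentially just the following (a minimal sketch, with a made-up `NCELLS` and `cellsim` standing in for `ArborPopulation.cellsim`):

```python
from mpi4py import MPI

COMM = MPI.COMM_WORLD
SIZE = COMM.Get_size()
RANK = COMM.Get_rank()

NCELLS = 1024  # hypothetical population size


def cellsim(cellindex):
    """placeholder for the single-cell Arbor simulation of cell `cellindex`"""
    pass


# round-robin assignment of cells to MPI ranks; each rank simulates
# its assigned cells one after the other
for cellindex in range(RANK, NCELLS, SIZE):
    cellsim(cellindex)
```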
> Then I have the following observations:
>
> - the `arbor.simulation` object only contains a model with 1 cell, so multithreading isn't required. The CPU back end of Arbor parallelises cells over threads, so a one-cell model will only run on one thread.
Ok, thanks for the confirmation. I was curious whether or not Arbor was somehow distributing single-cell simulations across threads, analogous to NEURON's multisplit method.
> - The most efficient way to run would be one MPI rank per physical core, with `arbor.context(1, None)`.
> - You are running on Juwels-DC, which has 2 sockets of 64-core AMD EPYC CPUs

I'm running on Jureca-DC as well as JUSUF, both in Jülich. These both use two 64-core AMD EPYC CPUs per node.
> - so you are running 128 MPI ranks per node, where each rank has 2 "hyperthreaded" cores.

Yes, for the benchmark curves above, at most 128 MPI ranks are created per node.
> And I have the following questions:
>
> - what are the `srun`/`sbatch` parameters you use?
These are typically (see run_arbor_reproducer.py, https://github.com/INM-6/hybridLFPy/blob/fix-67/examples/benchmarks/run_arbor_reproducer.py):

```bash
#!/bin/bash
##################################################################
#SBATCH --account $ACCOUNT
#SBATCH --job-name <MD5>
#SBATCH --time 0:04:01
#SBATCH -o logs/<MD5>.txt
#SBATCH -e logs/<MD5>_error.txt
#SBATCH --ntasks NTASKS
#SBATCH --cpus-per-task NTHREADS
##################################################################
# from here on we can run whatever command we want
# unset DISPLAY # DISPLAY somehow problematic with Slurm
srun --mpi=pmi2 python -u arbor_reproducer.py <MD5>
```
The `NTHREADS` value was also passed to `arbor.context()` in the simulation script (but, as you confirmed, this should then remain at its default value of 1).
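For completeness, a minimal sketch of that hand-over, here reading the value from the Slurm environment (just one possible way to obtain it; the actual script may take it from its parameter set instead):

```python
import os
import arbor

# number of (virtual) CPUs allocated per MPI task by Slurm;
# per the discussion above, 1 is sufficient for one-cell models
nthreads = int(os.environ.get('SLURM_CPUS_PER_TASK', '1'))
context = arbor.context(nthreads, None)
```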
The `<MD5>` value is just a hash I use to keep track of the unique parameter combination of each individual simulation in this case.
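For illustration, one way to produce such a hash from a parameter dictionary (a sketch with hypothetical parameters, not necessarily identical to what run_arbor_reproducer.py does):

```python
import hashlib
import json

# hypothetical parameter combination for one benchmark simulation
params = dict(NTASKS=512, NTHREADS=2, TSTOP=2000.0)

# deterministic hash of the parameter set, usable as job name and log-file prefix
md5 = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()
print(md5)
```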
> - does `NTASKS` in the plots correspond to the number of MPI ranks?
Yes.
> - If my assumptions above are correct, `NTASKS=2^9` would run on 4 nodes?
Yes (2^9 = 512 tasks at up to 128 tasks per node corresponds to 4 nodes), for the curves shown above in this issue.
**Is your feature request related to a problem? Please describe.**
Replacing LFPy by Arbor+LFPykit for simulations of passive multicompartment neuron models with current-based synapses in `Population.cellsim()` ought to be a relatively low-hanging fruit that could very well speed up simulations substantially over the equivalent simulations now relying on LFPy+NEURON (in particular with GPU access).

Steps:
Notes/possible issues that may occur: