HumanBrainProject / olfactory-bulb-3d


Different spikes with different number of ranks #6

Open pramodk opened 4 years ago

pramodk commented 4 years ago

I am running NEURON with 2 vs. 4 ranks and see a different number of spikes.

My changes to the master branch are:

$ git diff
diff --git a/sim/bulb3dtest.py b/sim/bulb3dtest.py
index abcf87f..68e9eeb 100755
--- a/sim/bulb3dtest.py
+++ b/sim/bulb3dtest.py
@@ -6,7 +6,7 @@ import odors
 from math import sqrt

 params.filename = 'bulb3dtest'
-params.tstop = 1050
+params.tstop = 10
 params.sniff_invl_min = params.sniff_invl_max = 500
 params.training_exc = params.training_inh = True
 from neuron import h
@@ -14,5 +14,6 @@ h('sigslope_AmpaNmda=5')
 h('sigslope_FastInhib=5')
 h('sigexp_AmpaNmda=4')
 params.odor_sequence = [ ('Onion', 50, 1000, 1e+9) ]
-runsim.build_part_model([5,37,32,78,7], [])
+#runsim.build_part_model([5,37,32,78,7], [])
+runsim.build_part_model([5], [])
 runsim.run()
diff --git a/sim/parrun.py b/sim/parrun.py
index 8f14e93..f778126 100755
--- a/sim/parrun.py
+++ b/sim/parrun.py
@@ -32,8 +32,8 @@ def prun(tstop):
   cvode.active(0)
   if rank == 0: print 'cvode active=', cvode.active()
   h.stdinit()
-  pc.nrncore_run("-e %g --voltage 1000." % (tstop, ), 0)
-  h.stdinit()
+  #pc.nrncore_run("-e %g --voltage 1000." % (tstop, ), 0)
+  #h.stdinit()
   inittime = h.startsw() - inittime
   if rank == 0:
     if clean_weights_active:

With 2 ranks:

srun -n 2 dplace x86_64/special -python -mpi bulb3dtest.py
...
spike2file call#1 spikes=25790 sorted using 2 ranks, writing to 1 files in 0.03 sec
psolve did not advance time from t=9.984375 to tnext=10

and with 4 ranks:

srun -n 4 dplace x86_64/special -python -mpi bulb3dtest.py
...
spike2file call#1 spikes=25782 sorted using 4 ranks, writing to 1 files in 0.03 sec
psolve did not advance time from t=9.984375 to tnext=10

nrnhines commented 4 years ago

I have not yet diagnosed the problem, but I thought it would be helpful to explain here how to get prcellstate to help. The first spike difference is 3.3281250 228221544 with 2 ranks and 4.9218750 228221544 with 4 ranks. This gid refers to one of the two ThreshDetect instances of a reciprocal synapse, where one half is on some unknown granule cell gid and the other half is on some unknown mitral cell gid. So we want to determine which (granule or mitral) cell gid this ThreshDetect is actually on. As this is not too large a model, we can start it up on a single process and use the interpreter manually to determine the needed information. We just need to comment out the last line of bulb3dtest.py (runsim.run()):

hines@hines-T7500:~/models/modeldb-bulb3d-sim/sim$ nrniv -python -pyexe python2.7 bulb3dtest.py
...
Total # compartments =  94473
OdorStim Onion start=50 dur=1000 conc=1e+09
total setup time  73.3399999142
>>>
>>> from modeldata import getmodel
>>> model = getmodel()
>>> mgrs = model.mgrss[228221544]
>>> mgrs.md_gid  #so the ThreshDetect is on the mitral cell
228221544
>>> mgrs.gd_gid
228221543
>>> mgrs.mgid # so this is what we need for prcellstate
687
>>> mgrs.msecden # just my curiosity. When I was involved with this model, there were no mTufted cells.
mTufted[2].secden[1]
>>> mgrs.xm # just to finish off the irrelevant question of the location of the ThreshDetect instance.
0.8750010218864266

Note that mgrs stands for mitral granule reciprocal synapse; it is implemented in mgrs.py. Now, on to pc.prcellstate.
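For anyone repeating this, the first differing spike between two raster files can be located with a small script. This is only a minimal sketch: it assumes each raster line is "time gid" (as in the two entries quoted above), and the helper names parse_raster and first_difference are made up for illustration, not part of the repository. The toy rasters are invented except for the one differing entry.

```python
def parse_raster(text):
    """Parse 'time gid' lines into a sorted list of (time, gid) tuples."""
    spikes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        spikes.append((float(fields[0]), int(fields[1])))
    return sorted(spikes)


def first_difference(spikes_a, spikes_b):
    """Return (index, entry_a, entry_b) at the first mismatch, or None."""
    for i, (a, b) in enumerate(zip(spikes_a, spikes_b)):
        if a != b:
            return i, a, b
    if len(spikes_a) != len(spikes_b):
        # One raster is a strict prefix of the other.
        i = min(len(spikes_a), len(spikes_b))
        a = spikes_a[i] if i < len(spikes_a) else None
        b = spikes_b[i] if i < len(spikes_b) else None
        return i, a, b
    return None


# Toy rasters (assumption): only the differing entry uses values from this thread.
raster_2 = "1.0 5\n3.3281250 228221544\n6.0 7\n"
raster_4 = "1.0 5\n4.9218750 228221544\n6.0 7\n"
diff = first_difference(parse_raster(raster_2), parse_raster(raster_4))
print(diff)
```

Sorting by (time, gid) before comparing matters because the order in which ranks deliver spikes with the same time stamp need not be the same for different nhost.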

nrnhines commented 4 years ago

Well, this is very strange. The sets of ThreshDetect gids on that mTufted cell are very different between nhost 2 and 4, at least in ordering and perhaps also in being almost disjoint. For example, 239307627 exists with nhost=2 but not with nhost=4 (there are 1164 ThreshDetect instances on that cell for both nhost 2 and 4). After sorting by ThreshDetect gid, I see that the sets are in fact mostly disjoint, with only a hundred or so gids shared.
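The comparison itself is just set arithmetic. A minimal sketch, assuming the ThreshDetect gids for each run have already been collected into plain Python lists (how they are collected is not shown here; compare_gid_sets and the toy gid lists are made up for illustration, apart from 239307627, which is the gid mentioned above):

```python
def compare_gid_sets(gids_a, gids_b):
    """Split two gid collections into shared and run-specific gids."""
    a, b = set(gids_a), set(gids_b)
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Toy data (assumption): stand-ins for the gids seen with nhost=2 and nhost=4.
nhost2_gids = [239307627, 100, 101, 102]
nhost4_gids = [102, 200, 201, 100]

result = compare_gid_sets(nhost2_gids, nhost4_gids)
print(len(result["shared"]), len(result["only_a"]), len(result["only_b"]))
```

Using sets removes the ordering difference from the picture, so what remains is only the question of which gids exist in each run.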

nrnhines commented 4 years ago

The following series of comments are notes about progress in diagnosing the spike raster difference with nhost. I'm not going to bother with the false trails I went down or the information I gathered that did not move the diagnosis forward. Looking at the global set of gids that are simulated, the granule sets depend on nhost: they always have the same size (17057), but only 8258 granule cell gids are shared by the nhost 2 and 4 simulations. Next I'm going to look at how the granule cells that connect to the mTufted cell (gid 687) are determined.

nrnhines commented 4 years ago

There seems to be a concept of a gcset for a glomerulus. A mitral can connect to a granule only if the granule is not already in the gcset. But it seems that each rank has its own gcset, and these may not be managed so that all ranks have the same gcset. Thus the expression gpos not in gconnected is (rank, nhost) dependent.
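To make the suspected mechanism concrete, here is a toy model of it. It is not the repository's code: the round-robin distribution of mitral cells, the candidate lists, and the function name connect are all assumptions chosen only to show how a rank-local "already connected" set makes the outcome depend on nhost.

```python
def connect(candidates, nhost):
    """Toy wiring: each mitral takes its first candidate granule that is
    not yet in the gcset of the rank that owns the mitral.

    candidates: dict of mitral_gid -> ordered list of candidate granule gids.
    Mitral cells are dealt round-robin to ranks (assumption).
    """
    connections = {}
    for rank in range(nhost):
        gcset = set()  # rank-local: other ranks never see its contents
        for mitral in sorted(candidates):
            if mitral % nhost != rank:
                continue  # this mitral lives on another rank
            for granule in candidates[mitral]:
                if granule not in gcset:
                    gcset.add(granule)
                    connections[mitral] = granule
                    break
    return connections


# Four mitrals of one hypothetical glomerulus, all preferring granule 10.
candidates = {0: [10, 11], 1: [10, 12], 2: [10, 11, 13], 3: [10, 12, 14]}

for nhost in (1, 2, 4):
    print(nhost, sorted(set(connect(candidates, nhost).values())))
```

In this toy the number of distinct granules changes with nhost as well, whereas in the real runs the granule set kept the same size and only its membership changed. The point is only that the membership test is evaluated against different sets on different ranks.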

nrnhines commented 4 years ago

The difference in granule connections goes away if a per mitral cell gr_connected = {} # mitral_gid:set_of_gr_connections is used in connect_to_granule, although that breaks the MGRS constructor later on with assertion errors like

md_gid=239307374 and/or gd_gid=239307373 already registered

(Which is kind of surprising, as the granule-to-mitral connections should be unique. But who wants to debug downstream of a temporary change that was only intended to demonstrate that the existing gcset approach is problematic for getting a set of granule gids independent of nhost?)
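A toy version of the per mitral bookkeeping, for illustration only. The dictionary layout follows the gr_connected comment quoted above; everything else (the round-robin distribution, the candidate lists, the name connect_per_mitral) is an assumption and not the actual connect_to_granule.

```python
def connect_per_mitral(candidates, nhost):
    """Toy wiring where the 'already connected' test is kept per mitral cell.

    candidates: dict of mitral_gid -> ordered list of candidate granule gids.
    Mitral cells are dealt round-robin to ranks (assumption).
    """
    connections = {}
    for rank in range(nhost):
        gr_connected = {}  # mitral_gid: set_of_gr_connections
        for mitral in sorted(candidates):
            if mitral % nhost != rank:
                continue  # this mitral lives on another rank
            mine = gr_connected.setdefault(mitral, set())
            for granule in candidates[mitral]:
                if granule not in mine:
                    mine.add(granule)
                    connections.setdefault(mitral, []).append(granule)
    return connections


candidates = {0: [10, 11], 1: [10, 12]}
wiring = {nhost: connect_per_mitral(candidates, nhost) for nhost in (1, 2, 4)}
print(wiring[1] == wiring[2] == wiring[4], wiring[1])
```

Because each mitral only consults its own set, the result no longer depends on which other mitrals share its rank. The price is visible in the toy as well: granule 10 now connects to both mitrals, which is the kind of sharing that anything assuming one mitral per granule will trip over.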

nrnhines commented 4 years ago

A more careful look at

def mgrs_gid(gid_source, gid_target, slot=0):

reveals that it has been changed from my original implementation: instead of returning a unique value for each pair of mitral gid and granule gid, it compresses the mitral_gid by dividing by params.Ngloms. This is the origin of the "already registered" error I mentioned in the previous comment. It also gives more weight to my notion that, in this model, a granule to mitral reciprocal synapse requires that the granule gid connects to at most one mitral cell in a given glomerulus.
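A sketch of why the compression produces collisions. The formulas below are hypothetical and are not the actual mgrs_gid; NGLOMS, GRANULE_SLOTS, and both function names are made up, and the only thing taken from the observation above is that the mitral gid is integer-divided by the number of glomeruli before being combined with the granule.

```python
NGLOMS = 4            # hypothetical stand-in for params.Ngloms
GRANULE_SLOTS = 1000  # hypothetical upper bound on granule indices


def unique_pair_gid(mitral_gid, granule_index):
    """One distinct value per (mitral, granule) pair."""
    return mitral_gid * GRANULE_SLOTS + granule_index


def compressed_pair_gid(mitral_gid, granule_index, ngloms=NGLOMS):
    """Same idea, but the mitral gid is first divided by ngloms."""
    return (mitral_gid // ngloms) * GRANULE_SLOTS + granule_index


registered = set()
collisions = []
for mitral_gid in (8, 9):  # 8 // 4 == 9 // 4, so these two compress alike
    gid = compressed_pair_gid(mitral_gid, 5)
    if gid in registered:
        collisions.append((mitral_gid, gid))  # "already registered"
    registered.add(gid)

print(unique_pair_gid(8, 5), unique_pair_gid(9, 5), collisions)
```

With the compressed form, two mitral gids that share a quotient can only be told apart if they never connect to the same granule, which fits the notion that a granule is expected to connect to at most one mitral cell in a given glomerulus.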