brian-team / brian2

Brian is a free, open source simulator for spiking neural networks.
http://briansimulator.org

Times that are multiples of dt are sensitive to floating point issues for refractoriness and SpikeGeneratorGroup #949

Closed mstimberg closed 6 years ago

mstimberg commented 6 years ago

Our check for whether a neuron is no longer refractory does not take care of values at dt boundaries; e.g., for a refractory time of 1ms (i.e. 0.001 in seconds), it simply checks

not_refractory = (t - lastspike) > 0.001

This means that for a simulation dt of 0.1ms and a refractory time of 1ms, the actual refractory period will not necessarily be 10 time steps. For example, in the code below it is 11 time steps in about 10% of the cases:

from brian2 import *
import collections

# a neuron that spikes on every time step on which it is not refractory
G = NeuronGroup(1, '', threshold='True', refractory=1*ms)
spike_mon = SpikeMonitor(G)
run(1*second)
# count the inter-spike intervals, expressed as a number of time steps
print(collections.Counter(np.diff(np.int_(np.round(spike_mon.t/defaultclock.dt)))))
Counter({10: 899, 11: 91})

A workaround is to use a refractory time slightly below the requested time, e.g. refractory=0.99*ms.

I think the best fix is to subtract a small amount from the refractory period that we are testing against, say 1e-6*dt -- this should fix the problem for the "at the boundary" values without changing anything for values that weren't a multiple of dt before. The downside of the fix is that some older simulations will no longer be exactly reproducible with newer versions of Brian, but I think our policy so far has been to accept this as long as the previous behaviour was clearly buggy (as I think is the case here).
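
To illustrate (just a sketch of the generated condition for refractory=1*ms and dt=0.1*ms, with times in seconds; the exact form in the code would differ):

# subtracting a tiny epsilon = 1e-6*dt means that a (t - lastspike) value that lands
# marginally below 1 ms due to floating point error still passes the test
not_refractory = (t - lastspike) > (0.001 - 1e-6 * 0.0001)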

This was first reported by @zhouyanasd in issue #943 who observed that network activity changed depending on the absolute time of the network (the above check's floating point issues are sensitive to the absolute time).

thesamovar commented 6 years ago

This works if the user specifies refractory as a quantity that is a multiple of dt, but it is slightly tricky in other cases. For example, if the user specifies refractoriness as i*refractory_max/(N-1), what do we do? We could check for each neuron whether the refractoriness is an approximate multiple of dt and, if so, slightly modify it (although this check would have to be done on each time step if refractory_max was non-constant), but if we always just subtract some fixed multiple of dt, then in some cases what would have been unambiguous may become subject to this problem. What if the user specifies refractoriness themselves as t-lastspike>1*ms? Do we try to inspect this string and modify it, or do we accept that specifying refractoriness like this would give a different result than refractory=1*ms?

I agree this is an important issue, but I'm not sure we can apply such a simple approach to fix it. I'm also not sure yet what the correct approach should be.

One approach that might work is to do something similar to what we did with TimedArray, but this still wouldn't help in the case where the user specifies a string expression for refractoriness.

mstimberg commented 6 years ago

Yeah, I agree that this is tricky... Part of the problem is that we never specified what a refractory time that is not a multiple of dt is supposed to mean exactly. Right now, a refractory time of e.g. 1.01ms will behave the same as a refractory time of 1.1ms (for a dt of 0.1ms). Maybe the most logical thing would be to round it to the nearest time step in general? The issue is slightly less complicated than for TimedArray, since we do not care about values within a time step.

I don't think we should modify string conditions, we cannot guarantee that it would work in all cases anyway. But maybe we could make the same mechanism available that we use for the refractory=time definition? What I mean is that we could provide a function as_steps(t) which converts a time to an integer via int(round(t/dt)), and our refractoriness implementation would become as_steps(t - lastspike) > as_steps(1*ms). Users could then use the same formulation if they wanted (e.g. when combining it with a condition that is not based on time).
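
To make the idea concrete, here is a rough standalone sketch (plain Python, with dt and the refractory time hard-coded in seconds; as_steps is just the placeholder name from above):

dt = 0.0001          # 0.1 ms
refractory = 0.001   # 1 ms

def as_steps(t):
    # convert a time to an integer number of time steps
    return int(round(t / dt))

# a time accumulated step by step is generally not an exact multiple of dt
elapsed = 0.0
for _ in range(10):
    elapsed += dt

print(elapsed > refractory)                      # current check: depends on rounding errors
print(as_steps(elapsed) > as_steps(refractory))  # proposed check: reliably False after exactly 10 steps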

I think whatever approach we use, it will change results for some users. I don't mind that much, given that our previous behaviour was somewhat undefined. But still, let's say we go for some principled approach such as the rounding to dt I described above; maybe we should add a new preference legacy_refractoriness that would be off by default but could be switched on to give the previous results?

thesamovar commented 6 years ago

Modified to add that the same problem occurs with SpikeGeneratorGroup. At the moment we are working around it with the following hack:

            # We shift all the spikes by a tiny amount to make sure that spikes
            # at exact multiples of dt do not end up in the previous time bin
            # This shift has to be quite significant relative to machine
            # epsilon, we use 1e-3 of the dt here
            shift = 1e-3*dt
            timebins = np.asarray(np.asarray(self._spike_time + shift)/dt,
                                  dtype=np.int32)
thesamovar commented 6 years ago

I like your as_steps idea, although I'm not sure we can just do rounding for refractoriness (it might be OK for SpikeGeneratorGroup, not sure). For example, if refractory=0.6*dt then that should probably be equivalent to refractory=0*dt, not refractory=1*dt. I checked how we do the rounding in TimedArray, and it's roughly this: we choose a fraction dt/K and then map anything from i*dt - dt/K up to (i+1)*dt - dt/K to index i. For TimedArray we choose the fraction dt/K small enough so that even if we're using a multistep method in the group (up to 8 steps) it will work. I think here we'd want to do something similar, but I can't see any particular reason for one value of K over another. Maybe just something large?

For reference, the code to get i from t is:

            epsilon = dt / K
            i = np.clip(np.int_(np.round(np.asarray(t/epsilon)) / K),
                        0, len(values)-1)
thesamovar commented 6 years ago

Also maybe we can do better than as_steps? How about just timestep(t)?

mstimberg commented 6 years ago

For example, if refractory=0.6*dt then that should probably be equivalent to refractory=0*dt, not refractory=1*dt

I'm not sure, should it? With rounding, our semantics would be: make the smallest mistake possible given the dt constraint. Personally I wouldn't be shocked if 0.6*dt translated to dt... What I'd be more worried about is that some users might have subtracted 0.5*dt from refractoriness values explicitly to avoid the issues at dt boundaries. If we round, then 0.5*dt becomes a boundary where we might round up or down depending on small floating point differences...

I checked how we do the rounding in TimedArray, and it's roughly this: we choose a fraction dt/K and then map anything from i*dt - dt/K up to (i+1)*dt - dt/K to index i. For TimedArray we choose the fraction dt/K small enough so that even if we're using a multistep method in the group (up to 8 steps) it will work. I think here we'd want to do something similar, but I can't see any particular reason for one value of K over another. Maybe just something large?

Given that we don't have to care about multistep methods for thresholding, K = 1 would make sense to me which actually leads to the rounding method I proposed :)

Also maybe we can do better than as_steps?

Definitely, it was just meant as a placeholder!

How about just timestep(t)?

I wanted to propose this at first as well, but then I realized that we actually already have an integer variable called timestep. I don't think we ever advertised this in the docs or elsewhere, but in principle someone could already be using it...

thesamovar commented 6 years ago

The reason being that if a neuron has a refractory period of 0.6*dt and spikes at time t=0, then at time t=dt it is ready to fire again (t > refractory).

I'd not worry too much about people having subtracted 0.5*dt, since I doubt that many people are aware of the issue (given that we weren't even aware until now).

I think the optimal choice of K might be the largest value that doesn't lead to any errors for some large number of time steps. Let's say the longest we might typically imagine someone running a simulation for would be a week of simulated time at dt=0.1ms. We could just calculate the largest value of K that would always get the time step correct over this range.
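
(For a rough idea of the numbers involved: a week of simulated time at dt = 0.1 ms corresponds to 7 * 24 * 3600 s / 1e-4 s ≈ 6e9, i.e. a bit over 2^32 time steps.)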

For name, yeah that's a problem. Will think about it and see if anything else comes to me.

thesamovar commented 6 years ago

Hmm, actually, thinking a bit more about this, we probably ought to consider which scheme makes the fewest errors given that spike times are uniformly distributed in (t, t+dt). I'm on my phone right now so I haven't worked it out, but maybe it's rounding.

It would also be good to make a test based on the example that led us to this problem, incorporating refractoriness, spike generator and timed array. If I remember correctly, the example has the same input and initial conditions but different starting time, and should give the same results.

thesamovar commented 6 years ago

I changed my mind again; I think even if we ought to think of spikes as occurring in (t, t+dt), that won't be what users are normally expecting. They'll be expecting that with a spike at t=3*ms and refractory=1*ms, it should be able to spike again at t=4*ms and not before or after, whatever dt is.

So I ran the experiment I mentioned above, computing how often you get an error from rounding with an upscaling factor of K. I never found a single error for t=i*dt with i going up to around 2^30 (which is about 1 day of simulated time at dt=0.1ms), for a variety of uniformly randomly selected dt in (0, 0.1ms), and K in [1, 10, 100, 1000, 10000, 100000]. I would suggest then using K=100000, so that we get the correct time step up to an accuracy of 0.001% of dt. This seems easy to explain, and it means that we effectively never get an error (assuming people don't run for more than 2^30 time steps and use a refractory period that is an integer multiple of dt), or that errors are less than 0.001% of dt for non-integer multiples of dt. This seems like a reasonable trade-off to me.

For refractoriness, we also have the choice between > and >=. I see at the moment we have >: does this match user expectations I wonder?

mstimberg commented 6 years ago

I think the multiples of dt are not the difficult question -- we certainly agree that it should fire exactly one refractory period later (which it currently does not, so that is the main bug that we are fixing now). What I find more difficult to decide is what to do about values in between. Your proposal above (with something like K=100000) would keep the current semantics, i.e. a value that is slightly above a multiple of dt will lead to a refractory period rounded up to the next dt. My previously proposed alternative would round it down. I think both are relatively straightforward to explain, and I could imagine both have been used in other simulators or user-written code (your proposal when people do a comparison similar to ours, my proposal when the refractory period is converted into a number of time steps at the start of a run). Your proposal has the advantage that it keeps the current semantics, mine has the advantage that the actual refractory period is closer to the requested one. In principle I could see yet another way where we cleverly manipulate the intervals to get an even closer approximation in the long run (e.g. for a requested refractory period of 1.5*dt we'd alternate between 1*dt and 2*dt), but I don't see how to make this not horribly confusing.

Maybe for the sake of backwards compatibility it would indeed be better to use your proposal -- it seems to be the solution that changes existing results minimally while at the same time fixing the annoying "multiples of dt" issue. Not sure though, should we still have a preference to switch back to the previous behaviour to get exact reproducibility?

For refractoriness, we also have the choice between > and >=. I see at the moment we have >: does this match user expectations I wonder?

I think with your rounding proposal it has to be >=, so that we get spikes every 1ms for refractory=1ms. In the current code, >= and > do not make much of a difference, as we basically never achieve exact floating point equality anyway.

Oh, about the timestep variable. This is indeed a variable of the clock object, but we never exposed it to the user so they cannot actually use it in their equations. Adding a function timestep should therefore be fine.
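
For concreteness, here is a rough sketch of what such a function could compute, using the dt/K mechanism discussed above (name, signature and the choice of K are only illustrative, not a final design):

def timestep(t, dt, K=100000):
    # map a time to an integer time step, tolerating floating point errors
    # of up to dt/(2*K) around exact multiples of dt
    return int(round(t / (dt / K)) / K)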

thesamovar commented 6 years ago

For > and >= the issue is, what do we expect from this code?

G = NeuronGroup(1, 'v:1', threshold='v>-1', refractory=1*ms)
M = SpikeMonitor(G)
run(10*ms)
print(M.t[:]/ms)

I just ran this here and got:

[ 0.   1.1  2.1  3.2  4.2  5.3  6.4  7.5  8.6  9.6]

So sometimes there is a period of 1.1ms between spikes and sometimes 1ms. I'm actually not sure what the user would expect to happen here! I suspect most people would want 1ms between each spike, which would (I think) correspond to >= on the time step, but I can imagine someone making a case for 1.1ms between each spike, because if the refractory period is 1ms then at t=1ms the neuron is still refractory. For it not to be refractory, we'd have to assume an open interval of refractoriness (0, 1) ms instead of a closed interval. This makes sense to me, and I suspect most people would assume that you'd see an ISI of 1ms with a refractoriness of 1ms. What do you think?

thesamovar commented 6 years ago

But strictly speaking this is different from what is written, which is t-lastspike>1*ms and would (with exact arithmetic) mean you should have ISIs of 1.1ms.

mstimberg commented 6 years ago

For > and >= the issue is, what do we expect from this code?

Well, I thought this was the bug we were discussing in this issue in the first place :) I thought we agreed we'd expect 1ms every time, but currently we sometimes get 1ms and sometimes 1.1ms. I think the semantics make the most sense with ISI = refractory period, so that means >=. Implementing your proposal with >= and our current mechanism looks reasonable:

G = NeuronGroup(1, 'v:1', threshold='v>-1',
                refractory='int((t - lastspike)/(dt/1e5) + 0.5) < int((1*ms)/(dt/1e5) + 0.5)')
M = SpikeMonitor(G)
run(10*ms)
print(M.t[:]/ms)
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
thesamovar commented 6 years ago

Oh well, I thought we had agreed that it shouldn't sometimes be 1ms and sometimes 1.1ms, but hadn't actually agreed which of the two options it should be. But yes, I think it feels like the most logical choice of the two.

Now for SpikeGeneratorGroup what do we do? Just rounding (i.e. minimise error in spike time), or the same approach as here?

mstimberg commented 6 years ago

Now for SpikeGeneratorGroup what do we do? Just rounding (i.e. minimise error in spike time), or the same approach as here?

I think we should use the same approach as here to be consistent; this would also keep the current semantics (apart from corner cases where times differed from a multiple of dt by less than the 1e-3*dt shift, but by more than the 1e-5*dt interval that we'll use now). We seem to have already discussed a similar question a long time ago: https://github.com/brian-team/brian2/issues/327#issuecomment-58174009

I am a bit worried by the comment in our current code, though:

            # This shift has to be quite significant relative to machine
            # epsilon, we use 1e-3 of the dt here

This seems to imply that rounding to dt/1e5 as you suggested would not work, but I don't really see why. I might be a bit confused by now, but rounding to dt/1e5 is equivalent to shifting by 0.5*dt/1e5, no? So the approach is actually not qualitatively different from what we did before (for SpikeGeneratorGroup)...
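
A quick numerical check of this equivalence (a throwaway script; dt, K and the time range are arbitrary choices):

import numpy as np

dt = 0.0001   # 0.1 ms
K = 100000
eps = dt / K

t = np.random.uniform(0, 10.0, size=1000000)            # arbitrary times in seconds
bins_round = (np.round(t / eps) / K).astype(np.int64)   # round to eps, then floor to dt bins
bins_shift = ((t + 0.5 * eps) / dt).astype(np.int64)    # shift by eps/2, then floor to dt bins
print(np.count_nonzero(bins_round != bins_shift), 'mismatches out of', t.size)

In exact arithmetic the two are identical (up to ties), so any mismatches should be confined to times within a rounding error of a bin edge.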

Note that we already have test_spikegenerator_rounding_long to check for the correct rounding, but it's a "long" test and therefore not run by default.

thesamovar commented 6 years ago

OK, agreed based on the discussion in #327. Shifting is not quite the same as the trick above, actually. Firstly, a spike at time t-eps/2 would be shifted to t-3eps/2 which would then be in a different interval to using round. Secondly, there might be differences in accuracy based on doing a multiply operation and then an add rather than the other way round? In any case, I verified that it gives the correct result with the following code:

from pylab import *
from numba import jit

@jit(nopython=True)
def timestep(t, dt, K):
    # convert time t to an integer time step using the dt/K rounding trick
    eps = dt/K
    i = int(round(t/eps)/K)
    return i

@jit(nopython=True)
def finderrors(K, tmax=3.0*60*60, dt=0.1):
    # count how many exact multiples of dt get mapped to the wrong time step
    second = 1.0
    ms = 0.001*second
    dt = dt*ms
    imax = int(tmax/dt)
    errors = 0
    for i in range(imax):
        if i != timestep(i*dt, dt, K):
            errors += 1
    return imax, errors

dt = 0.1*rand()   # random dt in (0, 0.1) ms
tmax = 24*60*60.0*dt/0.1
print('tmax=%.1f h, dt = %s ms, imax=2^%.1f' % (tmax/(60*60), dt, log2(tmax/(dt*0.001))))
for K in [1, 10, 100, 1000, 10000, 100000]:
    imax, errors = finderrors(K, tmax, dt=dt)
    print('K=%d, errors %d (%.2f %%)' % (K, errors, errors*100.0/imax))
mstimberg commented 6 years ago

Firstly, a spike at time t-eps/2 would be shifted to t-3eps/2 which would then be in a different interval to using round.

Not sure I understand, what is eps here relative to dt?

I still have a hard time seeing a qualitative difference between the shift and rounding. With both methods, values that are less than a certain distance from the next dt boundary will be incorrectly moved up (for tiny distances, this is of course what we want as a workaround for imprecise floating point arithmetic). If we were only worried about i*dt type values, we could use the simplest rounding, as your test showed (K=1 has 0 errors). I'm not saying that your method is not better, but I'd like to see a specific problem with the current method before changing it.

thesamovar commented 6 years ago

Actually you're right, yes -- the old method is the same but with a slightly different value of K; sorry, I missed what you said there. In principle it could be different because of the different order of operations, but in practice it seems not to be. I just checked, and both methods work fine (0 errors) for all K>1. Should we just use the old shift expression for refractoriness then (perhaps with a timestep function to simplify it), and that's all we need to do? Sorry if I've been a bit slow here!

mstimberg commented 6 years ago

Should we just use the old shift expression for refractoriness then (perhaps with a timestep function to simplify it) and that's all we need to do?

Yes, I think this would be the best approach. It should have minimal impact on previous results except for fixing the issue with the refractoriness. Do you want to work on this or should I?

Sorry if I've been a bit slow here!

No worries, I got quite confused along the way as well...
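
For reference, a rough sketch of the direction agreed on here (a timestep helper based on the existing 1e-3*dt shift, combined with the >= comparison decided above; the names and the exact factor in the eventual fix may differ):

def timestep(t, dt):
    # shift by a small fraction of dt so that times at exact multiples of dt
    # do not end up in the previous bin (same trick as in SpikeGeneratorGroup)
    return int((t + 1e-3 * dt) / dt)

# refractoriness check in units of time steps, e.g. for refractory=1*ms:
# not_refractory = timestep(t - lastspike, dt) >= timestep(refractory, dt)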