ctn-waterloo / modelling_ideas

Ideas for models that could be made with Nengo if anyone has time

Custom neuron-type for all-to-all inhibition #80

Open Seanny123 opened 7 years ago

Seanny123 commented 7 years ago

I was talking to @tcstewar today and he mentioned how all-to-all inhibition in an ensemble is quite expensive to implement, because it's a whole NxN neuron matrix with -1 everywhere except the diagonal. He recommended instead using a custom neuron type that would take an N-size activities vector and calculate the activities for each neuron given this vector.
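For reference, here's a minimal sketch (mine, not from the issue) of the expensive version being described: the all-to-all inhibition written out as an explicit NxN transform on a recurrent neuron-to-neuron connection. The ensemble size, synapse, and weight scale are placeholders.

import numpy as np
import nengo

n = 100
with nengo.Network() as model:
    x = nengo.Ensemble(n, 1)
    # 0 on the diagonal, -1 everywhere else: every neuron inhibits every other neuron
    W = -(np.ones((n, n)) - np.eye(n))
    nengo.Connection(x.neurons, x.neurons, transform=W, synapse=0.01)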

I'm not sure how to code this, given my unfamiliarity with the back-end. @jgosmann, would this be feasible/desirable?


I think this might be better as a NEP, but I'm not ready to write one yet, because I don't yet know the details and consequences of what I'm proposing.

arvoelke commented 7 years ago

You could also do this in O(n) time and space using an intermediate node. For example, here's a spike-based winner-take-all:

import matplotlib.pyplot as plt

import nengo
from nengo.utils.matplotlib import rasterplot

with nengo.Network() as model:
    stim = nengo.Node(output=1)
    x = nengo.Ensemble(10, 1)

    # Each neuron receives a_i - sum(a) = -(sum of everyone else's activity),
    # i.e. the same all-to-all inhibition (zero on the diagonal) in O(n)
    inhibit = nengo.Node(size_in=x.n_neurons, output=lambda _, a: a - a.sum())

    nengo.Connection(stim, x, synapse=None)
    nengo.Connection(x.neurons, inhibit, synapse=0.01)  # filtered spike activities
    nengo.Connection(inhibit, x.neurons, synapse=None)  # fed back as input current

    p = nengo.Probe(x.neurons, 'spikes')

with nengo.Simulator(model) as sim:
    sim.run(0.5)

rasterplot(sim.trange(), sim.data[p])
plt.show()

[raster plot of the resulting spike trains]

I think what Terry was suggesting is to essentially roll this into the neuron model for convenience, but practically speaking it might be quite involved to make this possible for every neuron model and difficult for the user to understand what is actually being done.

By the way, an important thing to be aware of with this approach is that the inhibition is multiplied by the gain on each neuron (and the bias current will always be there automatically).
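One way to make that scaling predictable (my own hedged sketch, not something suggested in the thread) is to specify the gains and biases explicitly when creating the ensemble, so the factor applied to the neuron-level inhibition is known in advance; the particular values below are arbitrary.

import numpy as np
import nengo

n = 10
with nengo.Network() as model:
    # With gain pinned to 1, a value of c arriving at x.neurons contributes
    # gain * c = c of input current, and the bias each neuron must overcome is explicit
    x = nengo.Ensemble(n, 1,
                       gain=np.ones(n),
                       bias=np.linspace(0.5, 1.0, n))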

tcstewar commented 7 years ago

I think what Terry was suggesting is to essentially roll this into the neuron model for convenience, but practically speaking it might be quite involved to make this possible for every neuron model and difficult for the user to understand what is actually being done.

I think you're completely right that it'd be odd to do this for every neuron model. I was more envisioning a custom "LIFWTA" class for that one special case, as a way of exploring whether this sort of thing might be useful.

The Node implementation is also very handy -- the only advantage I could see of having it be part of the neuron model itself is that it would be possible to have it automatically taken into account when solving for decoders. (Note: actually implementing that will probably be a bit tricky, unless we fall back on the "feed input into the ensemble over time and observe spikes" system for generating the A matrix)
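For what it's worth, here's a rough sketch (mine, not from the thread) of that fall-back: build the WTA network once per evaluation point, let it settle, and use the steady-state filtered activities as rows of the A matrix for a standard least-squares solve. The helper name steady_state_activities, the settling time, and the filtering are all arbitrary choices.

import numpy as np

import nengo

def steady_state_activities(eval_points, n_neurons=10, t_settle=0.2, seed=0):
    activities = []
    for x in eval_points:
        with nengo.Network(seed=seed) as net:
            stim = nengo.Node(output=x)
            ens = nengo.Ensemble(n_neurons, 1, seed=seed)
            inhibit = nengo.Node(size_in=n_neurons,
                                 output=lambda _, a: a - a.sum())
            nengo.Connection(stim, ens, synapse=None)
            nengo.Connection(ens.neurons, inhibit, synapse=0.01)
            nengo.Connection(inhibit, ens.neurons, synapse=None)
            p = nengo.Probe(ens.neurons, synapse=0.05)
        with nengo.Simulator(net, progress_bar=False) as sim:
            sim.run(t_settle)
        # average the filtered spike activities over the last half of the run
        activities.append(sim.data[p][-int(t_settle / 2 / sim.dt):].mean(axis=0))
    return np.array(activities)

eval_points = np.linspace(-1, 1, 20).reshape(-1, 1)
A = steady_state_activities(eval_points)                  # activity matrix
decoders, info = nengo.solvers.LstsqL2()(A, eval_points)  # decode the identity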

arvoelke commented 7 years ago

it would be possible to have it automatically taken into account when solving for decoders

What would this mean mathematically, since we are mixing neural space with vector space? How do we put (postsynaptic) evaluation points (vectors) into correspondence with the amount of (presynaptic) cross-inhibition (activities)? :confused:

tcstewar commented 7 years ago

What would this mean mathematically, since we are mixing neural space with vector space?

I'm not sure why it's mixing neural and vector spaces. Given some input x there's still going to be some a = G[dot(e,x)] thing going on, it's just that G will be rather complicated and dependent on all the other encoders and gains and biases. So I'm just thinking of the mutual inhibition stuff as something that gets effectively rolled into the neuron model. At least, that's what works in my head, I could be totally wrong as I've never actually done this... (and I will also note that I'm envisioning solving for those decoders while ignoring the dynamics of the synapse -- i.e. using the steady-state value).
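To make that concrete (my reading of it, not something spelled out in the thread): with cross-inhibition weights W that are 0 on the diagonal and -1 elsewhere, the steady-state activities would have to satisfy something like

    a_i = G[ gain_i * (dot(e_i, x) + sum_j W_ij * a_j) + bias_i ]

so the "rather complicated G" is effectively the map from x to the solution a*(x) of this fixed-point equation, and sampling that map is what the decoder solver would need to do.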

arvoelke commented 7 years ago

So, assuming it's recurrently connected with this cross-inhibition, and you want to solve for the decoders given the effect of inhibition corresponding to each respective evaluation point... I think the problem is we only really know the amount of inhibition at t=0. The inhibition will affect the amount of input current, which affects the firing, which in turn affects the amount of future inhibition.

I'd like to know if anything sensible happens in this case. Doesn't seem obvious what the decoders would tell you (in terms of how the decoded value actually relates to the prescribed function), or how it might converge/oscillate.

What you are probably wanting (?) is more along the lines of finding the steady-state activities from the inhibition given each evaluation point, but it's not clear if it's possible to do this analytically or if we'll need some sort of iterative numerical method akin to simulating it over time. At the same time, I'm skeptical that this will give a good decoding, since how it stabilizes may be extremely sensitive to the entire state of the system. Either way I'd be interested to see. In practice it might be best to "fall back" to the explicit simulation over time, as you said earlier.
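One entirely hypothetical version of that iterative numerical method is a damped fixed-point iteration on the steady-state equation sketched above. The inhibitory weight scale and the damping factor here are arbitrary, and, as noted, the iteration may well oscillate rather than converge.

import numpy as np
import nengo

n_neurons = 10

# Build an ensemble once just to get its gains, biases, and encoders
with nengo.Network(seed=0) as net:
    ens = nengo.Ensemble(n_neurons, 1)
with nengo.Simulator(net, progress_bar=False) as sim:
    gain = sim.data[ens].gain
    bias = sim.data[ens].bias
    encoders = sim.data[ens].encoders

W = -0.001 * (np.ones((n_neurons, n_neurons)) - np.eye(n_neurons))  # cross-inhibition

x = np.array([0.5])                                   # one evaluation point
a = ens.neuron_type.rates(encoders @ x, gain, bias)   # start from uninhibited rates
eta = 0.1                                             # damping factor
for _ in range(200):
    a_new = ens.neuron_type.rates(encoders @ x + W @ a, gain, bias)
    a = (1 - eta) * a + eta * a_new                   # damped update toward the fixed point

print(a)  # candidate steady-state activities for this evaluation point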

But most importantly, what would be the function of this system? I feel I'm still missing this point. If all we're trying to do is compensate for the inhibition while approximating the prescribed function, then what is the point of the inhibition in the first place? What are we trying to get this population to do in comparison to just computing that function without inhibition?

tcstewar commented 7 years ago

What you are probably wanting (?) is more along the lines of finding the steady-state activities from the inhibition given each evaluation point

Yes, I think that's exactly what I'm wanting -- I hope that's analytically possible, but if not, some iterative method hopefully won't be too horrible....

But most importantly, what would be the function of this system? I feel I'm still missing this point. If all we're trying to do is compensate for the inhibition while approximating the prescribed function, then what is the point of the inhibition in the first place? What are we trying to get this population to do in comparison to just computing that function without inhibition?

Ah, yes, this is the most important thing. :) It's still not clear to me exactly how to express this, but there are four lines of thinking that are somehow mixed up in my head about this, so here's a quick and rather incoherent summary of those....

1) lots and lots of neural network people do something like this. This is the heart of how Emergent-type people do the WTA circuits, for example (and they even cheat and do it with a horribly bad approximation most of the time). I've had three or four people ask me how to do it in Nengo, so it'd be nice to have a clean answer (and the Node approach is pretty good for this).

2) it seems pretty pervasive in biology. This leads to sparse representations in lots of places (or, at least, that's my impression, but that might also be just due to seeing lots of normal neural network people use it)

3) sparse representations are very useful for online learning and avoiding catastrophic forgetting. This is actually the main reason I'm exploring it right now. The declarative memory model needs an extremely sparse representation; otherwise learning pretty much any new thing badly disrupts old trained relationships. It'd be handy to have a simple dial that controls sparsity for these sorts of situations. Note that this was also necessary for @Seanny123's addition model. Also, it's worth noting that for those situations, PES was being used to compute decoders (initialized to 0), so we didn't need to worry about decoder solving. But it'd be nice to have that option.

4) it feels like it'll do something like automatic normalization, even from very weak signals. This is why I haven't just implemented sparsity by having intercepts=nengo.dists.Uniform(0.9, 1.0). That ends up with no firing at all for weak input signals (length<0.9), and in high dimensions there are also a lot of unit vectors that end up with no firing at all, just due to encoder sampling. The mutual inhibition thing ensures there's always some neurons firing, and if you give a weak version of the input, you get about the same firing as with a stronger version of the input (i.e. the direction matters but the magnitude doesn't).
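As a quick illustration of point 4 (a hypothetical sketch, not from the thread): with intercepts drawn from Uniform(0.9, 1.0), an input of length 0.5 drives no neurons at all, which is exactly the failure mode being avoided.

import nengo

with nengo.Network(seed=0) as model:
    stim = nengo.Node(output=0.5)   # weak input, |x| < 0.9
    sparse = nengo.Ensemble(50, 1,
                            intercepts=nengo.dists.Uniform(0.9, 1.0))
    nengo.Connection(stim, sparse, synapse=None)
    p = nengo.Probe(sparse.neurons, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(0.5)

print(sim.data[p].max())  # expect ~0: no neuron's threshold is crossed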

So, I'm really not sure how those 4 fit together, or even if they are all true, but that's the line of reasoning that led here. And it's leading to some nice results in terms of fits to MEG data for Jelmer's model, so maybe there's something here.... :)

jgosmann commented 7 years ago

Just skimmed this thread. It seems that @Seanny123's initial comment mentions a fifth reason: performance. Maybe the performance issue could be solved by using a sparse matrix data structure? (I haven't thought about any details, though.)