ctn-waterloo / modelling_ideas

Ideas for models that could be made with Nengo if anyone has time

Learning unbinding #90

Open tcstewar opened 6 years ago

tcstewar commented 6 years ago

I'm pretty sure that a random network can work as a binding network (instead of circular convolution). It might be a bit less efficient than circular convolution, but it should work fine, and might lead to a better developmental story for where these binding systems come from. But, how do we unbind from such a network?

Well, if we want to unbind, then all we really need to do is train up the inverse operation. Feed random inputs into the binding network and learn the unbinding operation.

I've got an initial implementation of this here: https://github.com/tcstewar/testing_notebooks/blob/master/Random%20Binding.ipynb

In the example shown, it takes ~500 seconds to learn the inverse of a random 2D binding:

[figure: learned unbinding of a random 2D binding]

Here's the cosine of the angle between the actual output and the desired output:

[figure: cosine of the angle between the actual and desired output]

If this does work at larger dimensionalities, it'd be great to pre-learn a bunch of binding networks of different dimensionalities, and then we can just drop them into other models.
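
For concreteness, here is a minimal Nengo sketch of the basic setup described above. It is not the notebook's implementation: the particular random binding operation, the ensemble size, and the PES learning rate are all placeholder choices. A fixed random binding is computed directly, and an ensemble that sees the bound vector plus one operand learns, via a PES-trained connection, to recover the other operand.

```python
import numpy as np
import nengo

D = 2  # dimensionality, matching the 2D example above
rng = np.random.RandomState(0)

# A stand-in "random binding": element-wise product of randomly
# transformed inputs followed by another random projection.
A = rng.randn(D, D) / np.sqrt(D)
B = rng.randn(D, D) / np.sqrt(D)
W = rng.randn(D, D) / np.sqrt(D)

def bind(x, y):
    return W @ ((A @ x) * (B @ y))

with nengo.Network(seed=0) as model:
    # Random slowly-varying inputs to bind together
    x = nengo.Node(nengo.processes.WhiteSignal(60.0, high=1.0), size_out=D)
    y = nengo.Node(nengo.processes.WhiteSignal(60.0, high=1.0), size_out=D)

    # Compute the bound vector directly (could instead be a neural ensemble)
    xy = nengo.Node(lambda t, v: bind(v[:D], v[D:]), size_in=2 * D, size_out=D)
    nengo.Connection(x, xy[:D], synapse=None)
    nengo.Connection(y, xy[D:], synapse=None)

    # Ensemble that sees the bound vector plus one operand (y)
    # and has to learn to recover the other operand (x)
    unbind = nengo.Ensemble(500, dimensions=2 * D)
    nengo.Connection(xy, unbind[:D])
    nengo.Connection(y, unbind[D:])

    out = nengo.Node(size_in=D)
    conn = nengo.Connection(unbind, out,
                            function=lambda v: np.zeros(D),
                            learning_rule_type=nengo.PES(learning_rate=1e-4))

    # PES error signal: learned output minus the target operand
    error = nengo.Node(size_in=D)
    nengo.Connection(out, error)
    nengo.Connection(x, error, transform=-1)
    nengo.Connection(error, conn.learning_rule)

    p_out = nengo.Probe(out, synapse=0.01)
    p_target = nengo.Probe(x, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(60.0)
```

The error signal is just the learned output minus the target operand, so the decoders of `conn` drift toward the unbinding function as the simulation runs.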

jgosmann commented 6 years ago

Did you have a look at this notebook? It contains an attempt to formalize what properties a binding operator should have (though I think my definition there could still be improved a lot). Anyway, maybe more relevant is that it looks at different binding operations. If you use the “encoding by tagging” approach, any matrix with an approximate inverse can be used (the closer the approximation is to an actual inverse, the better). Furthermore, in high-dimensional spaces any random matrix will be almost orthogonal (though it might need to be normalized, depending on how you sample), so the transpose can be used as an approximate inverse (this still requires weight symmetry, though). So yes, a random matrix can be used for “binding”, though it is really more like tagging a vector: it doesn't let you bind two arbitrary vectors together, it only transforms one.
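
A rough numpy illustration of the tagging idea (my own sketch, not taken from the linked notebook): a random matrix with suitably scaled entries has roughly orthonormal columns, so its transpose already un-tags noticeably better than chance, while the exact inverse recovers the vector perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512

# "Tag" a vector with a random matrix. With entries ~ N(0, 1/D) the
# columns are roughly unit length and roughly pairwise orthogonal, so
# the transpose works as a (rather rough) approximate inverse.
M = rng.normal(scale=1.0 / np.sqrt(D), size=(D, D))

v = rng.normal(size=D)
v /= np.linalg.norm(v)

tagged = M @ v              # tag ("bind") the vector
recovered = M.T @ tagged    # untag with the transpose (needs weight symmetry)

cos_sim = recovered @ v / (np.linalg.norm(recovered) * np.linalg.norm(v))
print(f"cosine similarity after tag/untag: {cos_sim:.3f}")
# Typically around 0.7: noisy, but far above the ~1/sqrt(D) chance level.
# The closer M is to an actual orthogonal matrix, the closer this gets to 1.

# The exact inverse recovers the vector perfectly, but needs more than
# weight symmetry:
print(np.allclose(np.linalg.inv(M) @ tagged, v))  # True
```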

Over the past week or two I also did some work on how, given a decoder matrix D, the decoding matrix D^T D can be learned. Maybe that can be adapted ... though probably not ... and it is probably slower/worse than your approach.

Enough of my ramblings ... almost. I just looked at the notebook (should have done that first, maybe?). So your binding approach is a little bit different from the “tagging” approach, and I briefly thought about using different input transforms for the element-wise product. The problem is that if you stay in real space, the vector components will get smaller with each binding, and I'm not sure whether adjusting with a constant factor is sufficient to counteract that. Even worse, the unbinding will involve a division in some way (to undo the product), which is always problematic; this might be the reason why the learned unbinding clips the output in your example.
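
A quick numpy sketch of that shrinking effect (my own illustration, not from either notebook): repeated element-wise products of unit vectors in real space decay rapidly, and undoing them means dividing by components that can be arbitrarily close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64

# Repeatedly "bind" by element-wise product of unit vectors in real space.
# Each binding shrinks the norm by roughly 1/sqrt(D), so a single constant
# rescaling factor can at best fix it on average.
v = rng.normal(size=D)
v /= np.linalg.norm(v)
for i in range(5):
    tag = rng.normal(size=D)
    tag /= np.linalg.norm(tag)
    v = v * tag
    print(f"norm after {i + 1} bindings: {np.linalg.norm(v):.2e}")

# Unbinding the last tag requires an element-wise division, which blows up
# wherever a tag component happens to be near zero:
print("smallest |component| of the last tag:", np.abs(tag).min())
```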

But I will have to look at these things in more detail, these are just some random quick thoughts.

jgosmann commented 6 years ago

Oh, one more thing: the reason the Fourier transform doesn't suffer from the multiplication/division problem as much is that the coefficients are complex, so we're mostly doing rotations in the complex plane (though there is some scaling for non-unitary vectors, and as I have shown before that is a problem for many repeated bindings).
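
A small numpy check of that point (my sketch; the `make_unitary` helper and the dimensions are arbitrary choices): circular convolution multiplies Fourier coefficients together, so a unitary tag only rotates each coefficient and repeated binding preserves the norm exactly, while a generic tag also scales the coefficients and the norm drifts by orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64

def make_unitary(v):
    # Force every Fourier coefficient to magnitude 1, so binding with this
    # vector is a pure rotation of each complex coefficient (no scaling).
    V = np.fft.fft(v)
    return np.real(np.fft.ifft(V / np.abs(V)))

def cconv(a, b):
    # Circular convolution = element-wise product of Fourier coefficients.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

generic = rng.normal(size=D) / np.sqrt(D)    # roughly unit length, not unitary
unitary = make_unitary(rng.normal(size=D))   # unitary vector

x = rng.normal(size=D) / np.sqrt(D)
for name, tag in [("generic", generic), ("unitary", unitary)]:
    v = x.copy()
    for _ in range(20):
        v = cconv(v, tag)
    print(f"norm after 20 bindings with {name} tag: {np.linalg.norm(v):.3e}")
```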

tcstewar commented 6 years ago

Cool notebook.... and that's a very good point about the rotations in the complex plane; it seems like that's what keeps the inverse from having the division problem.... I wonder if that can be turned into constraints on the randomly generated input matrices? Even something like forcing them to be orthonormal might simplify the inverse task and still be biologically possible.... hmm.... Definitely lots of things to try here....
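
If the orthonormal constraint turns out to be worth trying, one easy way to generate such matrices (just a sketch, not something from the thread) is to QR-decompose a Gaussian random matrix; the transpose is then an exact inverse, so un-tagging needs no division and no learned approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64

# Orthonormalize a Gaussian random matrix with QR. Q is still "random",
# but Q.T is an exact inverse and Q preserves vector norms.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))

v = rng.normal(size=D)
v /= np.linalg.norm(v)

tagged = Q @ v
print(np.linalg.norm(tagged))        # ~1.0: the norm is preserved
print(np.allclose(Q.T @ tagged, v))  # True: the transpose undoes the tag exactly
```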

tcstewar commented 6 years ago

(I also think throwing feedback alignment at the unbinding network might help..... but that's a whole extra set of complexities... :) )