AlliedToasters / synapses

MIT License

An interesting effect in the connectivity graphs in your Jupyter notebook #1

Open anhinga opened 5 years ago

anhinga commented 5 years ago

Looking at the connections-per-input-pixel graphs in your Jupyter notebook, I see an interesting effect.

Unless it is a rendering bug, your simulations show a smaller number of connections in the central area than in the rest of the image, while Mocanu et al. show a larger number of connections in the central area than in the rest of it...

I'll try to figure out if it is just a rendering inversion or an actual effect here...
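Something like the following sketch could settle the rendering question (the accessor sparse_net.layers[0].weight is a guess, not the actual synapses API): count the surviving connections per input pixel directly from the first layer's weight matrix and plot them with an explicit origin and colorbar, so a flipped rendering cannot masquerade as a real effect.

import matplotlib.pyplot as plt

# Assumption: the first sparse layer exposes a (hidden, 784) weight tensor in which
# pruned connections are exactly zero; the accessor below is hypothetical.
w = sparse_net.layers[0].weight.detach()

connections_per_pixel = (w != 0).sum(dim=0)          # surviving connections per input unit
img = connections_per_pixel.reshape(28, 28).float()

plt.imshow(img, origin='upper')                      # explicit origin: no silent flip
plt.colorbar(label='connections per input pixel')
plt.title('Surviving connections per MNIST pixel')
plt.show()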


(Other than that, I'm happy to report that this notebook runs successfully under PyTorch 1.0 and Python 3.7.)

anhinga commented 5 years ago

It looks like this is a real effect, and not a rendering inversion.

It would be interesting to understand the reason for that...

anhinga commented 5 years ago

My preliminary conjecture is that the reason might be the differences in regularization.

Here the objective does not seem to have any regularization: the "forward" function simply ends with "return F.log_softmax(x, dim=1)". In the absence of regularization, when connections from the outlying areas are created, their weights receive almost no gradient (those pixels are nearly constant), so they remain essentially unchanged by training. At the same time, meaningful connections are changed by training, and occasionally become small and get eliminated.

At the same time, if there is a sufficiently strong regularization encouraging smaller weights, then one would expect the connections that are not informative for the result to decrease, on average, more rapidly than the connections that are informative.

All this needs to be empirically checked, of course, and I should try to read the code by Mocanu et al. to see what kind of regularization they might have.
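For concreteness, one direct way to express such a penalty in the loss would look roughly like the sketch below (l2_coeff is just a placeholder to be tuned; sparse_net, data, and target are the notebook's objects):

import torch.nn.functional as F

l2_coeff = 1e-4  # placeholder strength; the right value has to be found empirically

output = sparse_net(data)                     # forward() returns F.log_softmax(x, dim=1)
nll = F.nll_loss(output, target)              # negative log-likelihood on log-probabilities
l2 = sum((p ** 2).sum() for p in sparse_net.parameters())
loss = nll + l2_coeff * l2                    # now every weight feels a constant shrinkage pull
loss.backward()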

anhinga commented 5 years ago

In PyTorch the recommended way to add L2 regularization is not via the loss function but via the weight_decay parameter of the optimizer. So I am going to try to replace

optimizer = optim.SGD(sparse_net.parameters(), lr=lr, momentum=momentum)

with something like

optimizer = optim.SGD(sparse_net.parameters(), lr=lr, momentum=momentum, weight_decay=1e-5)

in the Jupyter notebook and see what happens...
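A minimal version of that experiment would sweep a few weight_decay values (build_sparse_net and train below are hypothetical stand-ins for the notebook's own setup and training loop):

import torch.optim as optim

for wd in (0.0, 1e-5, 1e-4, 1e-3):
    sparse_net = build_sparse_net()            # hypothetical: fresh model for each run
    optimizer = optim.SGD(sparse_net.parameters(), lr=lr,
                          momentum=momentum, weight_decay=wd)
    train(sparse_net, optimizer)               # hypothetical: the notebook's training loop
    # ...then redraw the connections-per-pixel map for each wd and compare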

anhinga commented 5 years ago

Yes, one needs a stronger regularization coefficient (1e-3 rather than 1e-5), and then it works...

I'll be posting further details during the next several days...

anhinga commented 5 years ago

I have started to accumulate notes and experimental Jupyter notebooks in a fork here:

https://github.com/anhinga/synapses/blob/master/regularization.md

AlliedToasters commented 5 years ago

@anhinga, apologies for not getting back to you sooner!

This is an unusual effect; I saw it myself when I started to play with the library, but I was too busy with work to investigate. I'm happy to see you're doing this work!

Based on the little experience I have had, I agree that it's related to an overfit solution. In particular, MNIST has a bit of noise in the pixels lying outside the center of the image. I suspect the model is effectively "memorizing" noise points in these outer pixels, hence the inverted connectivity pattern we're seeing.

The RBM result is drastically different; this may be because the pixel values are necessarily converted to binary values there, so the noisy-outer-pixels problem isn't present.
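That hypothesis is easy to sanity-check directly on the data; a rough sketch (the 20x20 "center" vs. 4-pixel "frame" split is an arbitrary choice):

import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST('./data', train=True, download=True,
                       transform=transforms.ToTensor())
imgs = torch.stack([img for img, _ in mnist]).squeeze(1)   # (60000, 28, 28), values in [0, 1]

n_frame = 28 * 28 - 20 * 20                                # pixels in the outer 4-pixel frame
var = imgs.var(dim=0)                                      # per-pixel variance over the dataset
print('mean variance, central 20x20 patch :', var[4:24, 4:24].mean().item())
print('mean variance, outer 4-pixel frame :',
      ((var.sum() - var[4:24, 4:24].sum()) / n_frame).item())

# RBM-style binarization: faint border intensities collapse to zero
bvar = (imgs > 0.5).float().var(dim=0)
print('after binarization, outer frame    :',
      ((bvar.sum() - bvar[4:24, 4:24].sum()) / n_frame).item())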

I'm really happy to see you working on this, this was the whole idea with building this library!

You'll probably be happy to know that I've managed to speed up training; release 0.1.14 includes these changes (although it also adds a new dependency to the library).

anhinga commented 5 years ago

That's awesome that you have a speed-up and that you are using torch-scatter now!

Are you planning to also use torch-sparse by the same author?

(I have been trying to install both of them over the last few days, and it was a bit tricky :-) But it should be possible on my configuration too, I hope :-) )
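For anyone reading along, the operations these packages accelerate can be illustrated with plain PyTorch (this is not the synapses or torch-scatter/torch-sparse API, just the underlying idea): a scatter-add that accumulates per-connection products into output neurons, and a sparse-dense matmul that applies a pruned weight matrix.

import torch

# Scatter-add: accumulate one value per surviving connection into its output neuron.
values = torch.randn(5)                        # e.g. weight * activation per connection
out_index = torch.tensor([0, 0, 1, 2, 2])      # which output neuron each connection feeds
acc = torch.zeros(3).scatter_add_(0, out_index, values)

# Sparse-dense matmul: a pruned (3 x 5) weight matrix applied to a batch of inputs.
indices = torch.tensor([[0, 1, 2],             # row (output) indices of surviving weights
                        [3, 1, 4]])            # column (input) indices
weights = torch.sparse_coo_tensor(indices, torch.randn(3), size=(3, 5))
x = torch.randn(5, 8)                          # 5 input features, batch of 8 as columns
y = torch.sparse.mm(weights, x)                # dense (3, 8) result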

anhinga commented 5 years ago

(Yes, I can install both torch-scatter and torch-sparse; it's just that I had CUDA software installed which I was not using, but whose presence was interfering with the installation of torch-scatter.)