EderSantana / seya

Bringing up some extra Cosmo to Keras.

Added neural stack and queue #26

Closed. jeammimi closed this 8 years ago.

jeammimi commented 8 years ago

Hi, so I think I successfully added a neural stack and queue. You can import the new layer with "from seya.layers.stack import Stack". There is a parameter called stack: set to True it is a stack, set to False it is a queue. I also added a notebook called Comparison to compare LSTM, NTM, Stack and Queue on the copying task, and the queue works quite well, as expected. I noticed that if you increase the hidden layer size, at some point the Stack works as well as the Queue. At the end of the notebook I also added a comparison between the Python implementation of the stack and the Theano one, to be sure that the Theano version was working properly.
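For reference, instantiating the layer in a Keras model might look roughly like this. Only the import path and the stack flag come from the description above; the other constructor arguments are assumptions and may differ from the actual layer:

    from keras.models import Sequential
    from seya.layers.stack import Stack

    model = Sequential()
    # stack=True -> neural stack, stack=False -> neural queue
    # (output_dim and input_shape are hypothetical argument names here,
    # not taken from the actual Stack implementation)
    model.add(Stack(output_dim=64, input_shape=(None, 8), stack=True))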

The only small problem I had in the implementation was with time = 0, so I start at time = 1 instead.

I think I need to clean the code and the documentation, as well as the notebook a little bit, but let me know what you think.

EderSantana commented 8 years ago

Amazing! I'll read your code carefully and give suggestions.

What was your final accuracy? I can't view the notebook on GitHub, and I can't see the fork on your account.

jeammimi commented 8 years ago

Yes, sorry, it was on my organisation account; I moved it to my personal account. The accuracy is 99.98% :)

jeammimi commented 8 years ago

A remark: I think it was not possible before to get more than 94-96% because your accuracy definition also considered the black part (after the copy), and some of the networks were just outputting random bits there since, because of the weights, they were not trained on that part. To force them to output "black" bits afterwards, I extended your weights by two time steps in order to include two black bits at the end of the copy.
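As a minimal sketch of that change (the array name sw, the indexing, and the shapes are assumptions based on this discussion, not the actual repository code):

    import numpy as np

    # Hypothetical copy task: the input occupies steps 0..t-1, the copy
    # occupies steps t..2*t-1, plus two extra "black" (all-zero) steps.
    t, batch = 10, 4
    total_steps = 2 * t + 2

    sw = np.zeros((batch, total_steps))
    for i in range(batch):
        # Original weighting: only the copied part counts.
        # sw[i, t:2 * t] = 1
        # Extended weighting: also include the two black steps after the
        # copy, so the network is trained to output zeros once it is done.
        sw[i, t:(2 * t) + 2] = 1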

EderSantana commented 8 years ago

Whoa, I didn't notice that!!! Is that why the NTM seems bad as well? I mean, the NTM is getting 56%.

EderSantana commented 8 years ago

By the way, I will merge what you did and run experiments on top of it! I believe this is pretty cool already and we can just continue from there.

EderSantana commented 8 years ago

We should try to tackle Hierarchical Attentive Memory next: http://arxiv.org/pdf/1602.03218v2.pdf. It's something like a Neural Turing Machine but with memory trees instead!!! I believe this is the one that will scale best for several applications, but I haven't read it as carefully as the NTM and neural stack papers. If you feel interested in that, let me know!

EderSantana commented 8 years ago

By the way, do you have Twitter? When I announce this implementation I'd like to give you credit :)

jeammimi commented 8 years ago

These neural stack/queue layers are great. It is really impressive how fast they reach 99.9%!

This Hierarchical Attentive Memory looks quite interesting, thanks for pointing it out.

I think the NTM result is bad because I reduced the learning rate a little bit. It should reach a very high accuracy, but it takes more training. The change I pointed out should increase the accuracy, not decrease it. I will increase the learning rate a little bit, as well as the training time, and run the comparison again.

For Twitter, I have an account but don't use it much: jmapasta (https://twitter.com/jmapasta). But thanks :)

jeammimi commented 8 years ago

It seems that adding these two black bits in the weights gave the Neural Turing Machine a lot of trouble. When I remove them, the result goes back to 94-96%.

EderSantana commented 8 years ago

I don't know what you mean by two black bits. By the way, I renamed the notebook to this one: https://github.com/EderSantana/seya/blob/master/examples/Neural%20Stacks%20and%20Queues.ipynb

jeammimi commented 8 years ago

If you take a look at the get_sample function, I modified your definition of the weights to include two more steps: sw[i, t:(2*t)+2] = 1

It includes two steps where all the bits are zero.

I did that to try to force the networks to output only zero bits after copying the output. But apparently this creates some problems for the Neural Turing Machine, because its accuracy is quite low.

I was thinking about computing the accuracy only on the part that is copied.

EderSantana commented 8 years ago

Yeah, we should modify the accuracy calculation for the test to cover only the part that matters. I got what you mean, thanks!!!

jeammimi commented 8 years ago

The accuracy can be computed like this: acc = np.average(V[:, -min_size:, :] == Y[:, -min_size:, :], weights=np.repeat(sw[:, -min_size:, :], 8, axis=2)) * 100
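For completeness, a self-contained version of that formula with dummy data (the 8 is the number of bits per step; the shapes of V, Y and sw are assumptions for illustration only):

    import numpy as np

    batch, min_size, n_bits = 4, 10, 8
    V = np.random.randint(0, 2, size=(batch, min_size, n_bits))   # predictions
    Y = np.random.randint(0, 2, size=(batch, min_size, n_bits))   # targets
    sw = np.ones((batch, min_size, 1))                            # per-step weights

    # Weighted accuracy over the copied part only: the boolean comparison is
    # averaged with the sample weights repeated across the bit channels.
    acc = np.average(V[:, -min_size:, :] == Y[:, -min_size:, :],
                     weights=np.repeat(sw[:, -min_size:, :], n_bits, axis=2)) * 100
    print(acc)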

I wanted to run it today but my GPU is busy :) I will try it tomorrow.
