Mostafa-Samir / DNC-tensorflow

A TensorFlow implementation of DeepMind's Differentiable Neural Computer (DNC)
MIT License

train vector copy #6

Closed · linamede closed this 7 years ago

linamede commented 7 years ago

I'm trying to train a model that copies a 10x1 vector. The loss converges; however, the read weightings seem to fade. Why is this happening? (I changed memory_words_num to 20 and memory_word_size to 20.)

[Attached plots: train-series_10x1_tar.txt_min_avg_loss_0.0045 and train-series_10x1_tar-50000]

Mostafa-Samir commented 7 years ago

Hmmm ... interesting!

It's possible that, because the input is small and needs only one memory location, the neural network was able to learn to copy it directly without needing to query the data it wrote to memory. You can check that by running the same trained model with only 1 memory location and seeing if it still outputs correctly.
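
Something along these lines should do it. This is only a rough sketch: the import paths, the controller class, the `get_outputs`/`restore` helpers, the checkpoint name, and the `(batch, time, channels)` input layout are all assumptions; only `memory_words_num` and `memory_word_size` come from the options you mentioned.

```python
import numpy as np
import tensorflow as tf

# Assumed imports based on this repo's layout; adjust to your local paths.
from dnc.dnc import DNC
from feedforward_controller import FeedforwardController

graph = tf.Graph()
with graph.as_default():
    # Rebuild the model with a single memory location. memory_words_num and
    # memory_word_size follow the options mentioned above; the remaining
    # constructor arguments are assumptions for illustration.
    ncomputer = DNC(
        FeedforwardController,
        input_size=2,             # 1 data channel + 1 delimiter channel (assumption)
        output_size=1,
        max_sequence_length=21,   # 10 input steps + delimiter + 10 recall steps
        memory_words_num=1,       # the single memory location we want to test
        memory_word_size=20,
        memory_read_heads=1,
        batch_size=1
    )
    outputs, _ = ncomputer.get_outputs()

    with tf.Session(graph=graph) as session:
        # Restore the weights trained with the 20-location memory. The number
        # of locations only changes the (non-trainable) memory state, not the
        # shapes of the trained variables, so the old weights should still fit.
        ncomputer.restore(session, './checkpoints', 'step-50000')  # hypothetical checkpoint

        # A toy 10x1 copy sample, laid out as (batch, time, channels) -- assumed layout.
        input_data = np.zeros((1, 21, 2), dtype=np.float32)
        input_data[0, :10, 0] = np.random.binomial(1, 0.5, 10)
        input_data[0, 10, 1] = 1.0  # delimiter flag

        result = session.run(outputs, feed_dict={
            ncomputer.input_data: input_data,
            ncomputer.sequence_length: 21
        })
        # If the model truly bypasses memory, this should still match the input bits.
        print(np.round(np.squeeze(result)))
```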

I'll keep this issue open for some time to understand what's happening!

linamede commented 7 years ago

Using a memory of size one resulted in an error during the visualization... In the end, to train the vector copy, I used train.py, which trains on random-length sequences, instead of train-series.py. This worked and produced the result below. Thank you!

[Attached plot: train-40000_length_1]
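
For context, here is a rough sketch of what a single copy-task sample for this setup might look like. It is an illustration only, not the repo's actual data generator; the delimiter channel and the exact layout are assumptions:

```python
import numpy as np

def generate_copy_sample(seq_len=10, size=1):
    """Rough sketch of a copy-task sample (not the repo's exact generator).

    Input:  seq_len random binary vectors, then a delimiter, then zeros.
    Target: zeros while the input is presented, then the same vectors echoed back.
    """
    sequence = np.random.binomial(1, 0.5, (seq_len, size)).astype(np.float32)

    input_seq = np.zeros((2 * seq_len + 1, size + 1), dtype=np.float32)
    target_seq = np.zeros((2 * seq_len + 1, size), dtype=np.float32)

    input_seq[:seq_len, :size] = sequence   # present the sequence
    input_seq[seq_len, size] = 1.0          # delimiter flag on its own channel
    target_seq[seq_len + 1:, :] = sequence  # expect the copy after the delimiter

    return input_seq, target_seq
```

A script that trains on random lengths would then simply draw seq_len at random each step instead of fixing it at 10.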

Mostafa-Samir commented 7 years ago

I believe that error in the visualization can be avoided by tweaking the visualize_op method a little. Anyway, I'm glad you got your results eventually!
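
For anyone who hits the same error: a minimal sketch of the kind of tweak I mean, assuming the failure is matplotlib's imshow receiving a 1-D array after the single-location read weightings get squeezed. fetched_read_weightings is a stand-in for whatever visualize_op actually fetches:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_read_weightings(fetched_read_weightings):
    """Guarded version of the imshow call inside a visualize_op-style method."""
    weightings = np.squeeze(fetched_read_weightings)
    # With memory_words_num == 1, the squeeze collapses the location axis,
    # leaving a 1-D array that imshow rejects; force it back to 2-D.
    weightings = np.atleast_2d(weightings)

    plt.imshow(weightings, interpolation='nearest', cmap='gray')
    plt.xlabel('time step')
    plt.ylabel('memory location')
    plt.show()
```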