google-deepmind / dnc

A TensorFlow implementation of the Differentiable Neural Computer.

write to memory with multiple write heads #10

Closed jingweiz closed 7 years ago

jingweiz commented 7 years ago

Hey, so when there are multiple write heads, writing to memory involves these variables:

write_weights: [batch_size x num_write_heads x memory_size]
erase_vectors: [batch_size x num_write_heads x word_size]
write_vectors: [batch_size x num_write_heads x word_size]
memory: [batch_size x memory_size x word_size]

the erase operation is computed as:

erase_gate =
    write_weights {reshaped to: [batch_size x num_write_heads x memory_size x 1]}
    x
    erase_vectors {reshaped to: [batch_size x num_write_heads x 1 x word_size]}
    -> shape: [batch_size x num_write_heads x memory_size x word_size]

and the 2nd (write-head) dimension is then reduced by taking a product over it. For the write operation that follows this erase, the same 2nd dimension is instead reduced directly by the matmul:

add_matrix =
    write_weights {reshaped to: [batch_size x memory_size x num_write_heads]}
    x
    write_vectors {shape: [batch_size x num_write_heads x word_size]}
    -> shape: [batch_size x memory_size x word_size]
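
Putting the shapes together, here is a minimal sketch of how I understand the two reductions (illustrative tensor names and random values, not necessarily the repo's exact code):

```python
import tensorflow as tf

batch_size, num_write_heads, memory_size, word_size = 2, 3, 16, 8

memory = tf.random.normal([batch_size, memory_size, word_size])
write_weights = tf.random.uniform([batch_size, num_write_heads, memory_size])
erase_vectors = tf.random.uniform([batch_size, num_write_heads, word_size])
write_vectors = tf.random.normal([batch_size, num_write_heads, word_size])

# Erase: per-head outer product, then a *product* over the head dimension.
weights_4d = tf.expand_dims(write_weights, 3)        # [B, H, M, 1]
erase_4d = tf.expand_dims(erase_vectors, 2)          # [B, H, 1, W]
erase_gate = weights_4d * erase_4d                   # [B, H, M, W]
retention = tf.reduce_prod(1 - erase_gate, axis=1)   # [B, M, W]

# Write: the batched matmul *sums* over the head dimension implicitly.
add_matrix = tf.matmul(
    tf.transpose(write_weights, [0, 2, 1]),          # [B, M, H]
    write_vectors)                                   # [B, H, W] -> [B, M, W]

memory = memory * retention + add_matrix             # erase, then add
```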

Is this correct? I couldn't work this part out from the paper and want to make sure I get it right. Thanks in advance!

dm-jrae commented 7 years ago

This is intended: the reduction is a product for the multiplicative erase and a summation for the additive write. The paper used only one write head, but this implementation is more general, to make it easy for people to experiment with multiple write heads in applications where that might be crucial.
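
In equation form, this is roughly the multi-head generalisation of the paper's single-head update, with $E$ the all-ones matrix, $H$ the number of write heads, and the product over heads taken elementwise:

$$M_t = M_{t-1} \circ \prod_{i=1}^{H} \left(E - w_t^{(i)} \bigl(e_t^{(i)}\bigr)^\top\right) + \sum_{i=1}^{H} w_t^{(i)} \bigl(v_t^{(i)}\bigr)^\top$$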

jingweiz commented 7 years ago

Thanks a lot! That's really helpful!