google-deepmind / dnc

A TensorFlow implementation of the Differentiable Neural Computer.
Apache License 2.0

Replacing sorted allocation with weighted softmax #11

Closed itamab closed 6 years ago

itamab commented 7 years ago

Hi, I was wondering whether the non-differentiable sorting part of the allocation mechanism is really necessary. At least on the RepeatCopy task, replacing it with a weighted softmax of strength 2 gives more stable and slightly better results, and makes the network fully differentiable. Learning the strength parameter may improve it further, as in the content-based addressing case.

I just commented out the line write_weights = tf.stop_gradient(write_weights) and replaced return batch_gather(sorted_allocation, inverse_indices) with return weighted_softmax(nonusage, 2.0, tf.nn.softplus).
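For concreteness, here is a minimal sketch (in NumPy rather than the repo's TensorFlow code) of the weighted-softmax allocation being described: the non-usage vector is sharpened by a softplus-transformed strength and then normalized, so the whole operation stays differentiable. The fixed strength of 2.0 matches the value mentioned above; the function names mirror the repo's weighted_softmax but this is an illustrative re-implementation, not the library code.

```python
import numpy as np

def softplus(x):
    # Smooth approximation to max(0, x); keeps the effective strength positive.
    return np.log1p(np.exp(x))

def weighted_softmax(activations, strength, strength_op=softplus):
    # Softmax over `activations` after sharpening by strength_op(strength).
    sharp = activations * strength_op(strength)
    # Numerically stable softmax over the memory-slot axis.
    e = np.exp(sharp - sharp.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# One batch element, three memory slots with different usages.
usage = np.array([[0.9, 0.1, 0.5]])
allocation = weighted_softmax(1.0 - usage, 2.0)
# The least-used slot (index 1) receives the largest allocation weight.
```

Unlike the sorted allocation in the paper, this produces soft weights over all slots rather than a near-one-hot pick of the freest slot, which is what makes gradients flow through the allocation step.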

thanks

Joshuaalbert commented 7 years ago

I accidentally made an issue requesting this in #21 and provided the code for it there if you want it.

itamab commented 7 years ago

Thanks, I actually published a workshop paper on it in ICML :) http://ttic.uchicago.edu/~klivescu/MLSLP2017/MLSLP2017_ben-ari.pdf


Best,

Itamar Ben-Ari

Joshuaalbert commented 7 years ago

Great to hear, and did you also make any modifications to the temporal linking?

itamab commented 6 years ago

I just added a new strength parameter 'allocation_strength' to the control vector and used it to softmax the (1 - usage) vector. Here are the main changes:

Add the parameter to the control vector (MemoryAccess._read_inputs):

allocation_strength = snt.Linear(1, name='allocation_strength')(inputs)

Softmax the (1 - usage) vector (Freeness._allocation):

nonusage_weighted_softmax = weighted_softmax(1 - usage, allocation_strength, tf.nn.softplus)
return tf.squeeze(nonusage_weighted_softmax, 1)
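Putting the two changes together, here is a sketch in NumPy of what the modified allocation computes. In the actual change the per-example strength comes from a snt.Linear(1) applied to the controller output and the ops are TensorFlow; the strength array below is a stand-in for that learned 'allocation_strength', assumed for illustration.

```python
import numpy as np

def softplus(x):
    # Keeps the learned strength positive and differentiable.
    return np.log1p(np.exp(x))

def allocation(usage, allocation_strength):
    # Weighted softmax over the (1 - usage) vector with one learned
    # strength per batch element, mirroring the described change to
    # Freeness._allocation.
    nonusage = 1.0 - usage                            # [batch, memory_size]
    sharp = nonusage * softplus(allocation_strength)  # broadcast [batch, 1]
    e = np.exp(sharp - sharp.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

usage = np.array([[0.2, 0.8, 0.5],
                  [0.9, 0.9, 0.1]])
# Stand-in for the snt.Linear(1, name='allocation_strength') output.
strength = np.array([[2.0], [0.5]])
weights = allocation(usage, strength)
# Each row sums to 1; the freest slot in each row gets the most weight.
```

A larger learned strength pushes the distribution closer to the original near-one-hot sorted allocation, while a small strength spreads writes across slots, so the network can interpolate between the two regimes.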

dm-jrae commented 6 years ago

This version of the code is feature-frozen to keep it clean and consistent with the paper, but it's great that better allocation mechanisms have been discovered --- thanks for the link to your paper also!