google-deepmind / dnc

A TensorFlow implementation of the Differentiable Neural Computer.
Apache License 2.0

Output vector inconsistent with the nature paper? #7

Closed jingweiz closed 7 years ago

jingweiz commented 7 years ago

Hey, in lines 118-121 of dnc.py, the final output is computed by passing the concatenation of controller_output and access_output through a Linear:

    output = tf.concat([controller_output, batch_flatten(access_output)], 1)
    output = snt.Linear(
        output_size=self._output_size.as_list()[0],
        name='output_linear')(output)

But according to the Nature paper, in the section just above "Interface parameters" in the left column of page 477, it is stated: Finally, the output vector y_t is defined by adding v_t to a vector obtained by passing the concatenation of the current read vectors through the RW×Y weight matrix W_r.

So should the output from the controller and the output from the read heads first be concatenated and then passed through a Linear layer, or should the output from the read heads first be passed through a Linear layer and then be concatenated with the output from the controller? Or am I misreading something here? Thanks in advance!

dm-jrae commented 7 years ago

It is actually consistent with the paper, but this may not be immediately clear.

The v_t defined in the paper is the linearly transformed controller output (two equations up from the one you reference), i.e. an embedding W_y is applied to the h_i. So in the code, 'controller_output' is actually [h^1_t, ..., h^L_t], and v_t is this vector passed through a linear to shape it to the output size. It is then added to the embedded read words.

Concatenating two vectors and passing them through a single linear (as in the code) is trivially equivalent to passing the two vectors through separate linears and adding the results (as in the paper). The only discrepancy is that the linears in the code include bias terms, whereas no biases are described in the paper.
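
For concreteness, here is a minimal NumPy sketch of that equivalence (the sizes and variable names below are made up purely for illustration):

    import numpy as np

    batch, hidden, reads, out = 2, 6, 4, 3
    h = np.random.randn(batch, hidden)  # controller output [h^1_t, ..., h^L_t]
    r = np.random.randn(batch, reads)   # flattened read vectors

    # One linear over the concatenation, as in the code (bias omitted):
    W = np.random.randn(hidden + reads, out)
    y_code = np.concatenate([h, r], axis=1) @ W

    # The same weights split into two linears, as in the paper:
    # y_t = v_t + [r^1_t; ...; r^R_t] W_r, with v_t = [h^1_t; ...; h^L_t] W_y.
    W_y, W_r = W[:hidden], W[hidden:]
    y_paper = h @ W_y + r @ W_r

    assert np.allclose(y_code, y_paper)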

Does that make sense?

jingweiz commented 7 years ago

Ah right exactly, thanks a lot for the reply :)

jingweiz commented 7 years ago

Oh, another thing: does the Neural Turing Machine use the same output mechanism as the DNC (linearly transforming the concatenation of the hidden state and the read vectors of timestep t, then clipping to (-20., 20.)), or is its output just a linearly transformed hidden state? This part is not so clear from the NTM paper. And just to make sure, for both the NTM and the DNC, is the input to the controller module always the input sequence together with the read vectors of timestep t-1? Thanks a lot!

dm-jrae commented 7 years ago

> And just to make sure, for both the NTM and the DNC, is the input to the controller module always the input sequence together with the read vectors of timestep t-1?

It is.
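
In code, that corresponds to something like the following sketch (the function and variable names here are illustrative, not the exact ones used in dnc.py):

    import tensorflow as tf

    def controller_input(inputs, prev_read_vectors):
      # inputs:            [batch, input_size] external input at timestep t
      # prev_read_vectors: [batch, num_reads, word_size] read words from t-1
      flat_reads = tf.reshape(prev_read_vectors,
                              [tf.shape(prev_read_vectors)[0], -1])
      # The controller sees the current input concatenated with the
      # previous timestep's read vectors.
      return tf.concat([inputs, flat_reads], axis=1)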

> Oh, another thing: does the Neural Turing Machine use the same output mechanism as the DNC (linearly transforming the concatenation of the hidden state and the read vectors of timestep t, then clipping to (-20., 20.)), or is its output just a linearly transformed hidden state? This part is not so clear from the NTM paper.

Both have the same output mechanism. The value clipping used here is an implementation detail, just to maintain numerical stability and avoid NaNs.
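
A clip like that can be implemented with tf.clip_by_value; a minimal sketch (the helper name and default threshold here are illustrative):

    import tensorflow as tf

    def clip_if_enabled(x, clip_value=20.0):
      # Purely a numerical-stability measure: keep values in
      # [-clip_value, clip_value] so they cannot blow up to NaN/Inf.
      if clip_value > 0:
        return tf.clip_by_value(x, -clip_value, clip_value)
      return x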