jingweiz closed this issue 7 years ago.
It is actually consistent with the paper, but this may not be immediately clear.
The v_t defined in the paper is the linearly transformed controller output (two equations above the one you reference), i.e. a weight matrix W_y is applied to the concatenated h_t^i. So in the code, `controller_output` is actually [h^1_t, ..., h^L_t], and v_t is this vector passed through a linear layer to map it to the output size. It is then added to the linearly transformed read vectors.
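Written out in the paper's notation as we read it (L controller layers, R read heads), the two equations in question are:

$$
v_t = W_y\,[h_t^1; \dots; h_t^L], \qquad
y_t = v_t + W_r\,[r_t^1; \dots; r_t^R]
$$

Here W_y maps the concatenated controller hidden states to the output size Y, and W_r is the RW × Y matrix applied to the concatenation of the R current read vectors (each of width W).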
Concatenating two vectors and passing them through a single linear layer (as done in the code) is trivially equivalent to passing each vector through its own linear layer and adding the results (as described in the paper). The only discrepancy is that the code's linear layers include bias terms, whereas the paper describes none.
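A quick numerical check of that equivalence, as a minimal sketch: the sizes and random weights below are arbitrary illustrative choices, not values from `dnc.py`. The point is that W_y and W_r are simply the column blocks of the single matrix W that acts on the concatenation.

```python
import numpy as np

# Arbitrary illustrative sizes (not taken from dnc.py).
rng = np.random.default_rng(0)
hidden_size, read_size, output_size = 6, 4, 3

h = rng.standard_normal(hidden_size)  # controller output [h^1_t; ...; h^L_t]
r = rng.standard_normal(read_size)    # concatenated read vectors [r^1_t; ...; r^R_t]

# Code-style: one linear layer (with bias) applied to the concatenation.
W = rng.standard_normal((output_size, hidden_size + read_size))
b = rng.standard_normal(output_size)
y_concat = W @ np.concatenate([h, r]) + b

# Paper-style: separate matrices W_y and W_r, results added.
# They are just the column blocks of W; the single bias b is the only extra term
# (the paper's formulation has no bias at all).
W_y, W_r = W[:, :hidden_size], W[:, hidden_size:]
v = W_y @ h
y_sum = v + W_r @ r + b

print(np.allclose(y_concat, y_sum))  # True
```

Splitting W column-wise is exactly why the two formulations learn the same family of functions; the concatenated form just does it in one matrix multiply.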
Does that make sense?
Ah right exactly, thanks a lot for the reply :)
Oh, another thing: is the output of the Neural Turing Machine produced by the same mechanism as the DNC's (a linear transform of the concatenated `hidden_state` and `read_vectors_of_timestep_t`, then clipping to (-20., 20.)), or is its output just a linearly transformed `hidden_state`? This part is not so clear from the NTM paper.
And just to make sure: for both the NTM and the DNC, is the input to the controller module always `input_sequence` and `read_vectors_of_timestep_t-1`?
Thanks a lot!
> And just to make sure: for both the NTM and the DNC, is the input to the controller module always `input_sequence` and `read_vectors_of_timestep_t-1`?
It is.
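As a rough sketch of what that controller input looks like at step t: the helper name `controller_input` and the array shapes are hypothetical choices for illustration. The external input x_t is concatenated with the flattened read vectors from step t-1.

```python
import numpy as np

def controller_input(x_t, prev_read_vectors):
    """Hypothetical helper: build the controller input at step t.

    x_t:               external input at step t, shape [batch, input_size]
    prev_read_vectors: read vectors from step t-1, shape [batch, num_reads, word_size]
    """
    batch = x_t.shape[0]
    flat_reads = prev_read_vectors.reshape(batch, -1)  # flatten the R read heads
    return np.concatenate([x_t, flat_reads], axis=1)

batch, input_size, num_reads, word_size = 2, 5, 3, 4
x_t = np.ones((batch, input_size))
r_prev = np.zeros((batch, num_reads, word_size))  # placeholder previous read vectors
print(controller_input(x_t, r_prev).shape)  # (2, 17)
```

The controller's input width is therefore input_size + num_reads * word_size (here 5 + 3 * 4 = 17).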
> Oh, another thing: is the output of the Neural Turing Machine produced by the same mechanism as the DNC's (a linear transform of the concatenated `hidden_state` and `read_vectors_of_timestep_t`, then clipping to (-20., 20.)), or is its output just a linearly transformed `hidden_state`? This part is not so clear from the NTM paper.
Both have the same output mechanism. The value clipping used here is just an implementation detail to maintain numerical stability and avoid NaNs.
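A minimal sketch of that shared output step, assuming plain numpy weights: `output_layer`, `W`, and `b` are hypothetical names, and the clip value of 20 mirrors the (-20., 20.) range mentioned in the question. It is a linear map of [h_t; r_t] followed by an elementwise clip.

```python
import numpy as np

CLIP_VALUE = 20.0  # mirrors the (-20., 20.) range discussed in this thread

def output_layer(h_t, read_vectors_t, W, b, clip_value=CLIP_VALUE):
    """Hypothetical sketch: linear map of [h_t; r_t], then elementwise clipping."""
    batch = h_t.shape[0]
    features = np.concatenate([h_t, read_vectors_t.reshape(batch, -1)], axis=1)
    y_t = features @ W + b
    # Clipping only bounds the values for numerical stability; it is not part of the paper's equations.
    return np.clip(y_t, -clip_value, clip_value)

batch, hidden_size, num_reads, word_size, output_size = 2, 4, 2, 3, 3
h_t = np.ones((batch, hidden_size))
r_t = np.ones((batch, num_reads, word_size))
W = np.full((hidden_size + num_reads * word_size, output_size), 10.0)  # deliberately large
b = np.zeros(output_size)
print(output_layer(h_t, r_t, W, b))  # every entry would be 100.0 before clipping; prints 20.0s
```

With these deliberately large weights each pre-clip value is 10 features * 10.0 = 100.0, so every output is clipped to 20.0.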
Hey, in lines 118-121 of `dnc.py`, the final `output` is computed by passing the concatenated `controller_output` and `access_output` through a `Linear`.
But according to the Nature paper, in the part above "Interface parameters" in the left column of page 477, it is stated: "Finally, the output vector y_t is defined by adding v_t to a vector obtained by passing the concatenation of the current read vectors through the RW × Y weight matrix W_r."
So should the output from the controller and the output from the read heads first be concatenated and then passed through a Linear layer, or should the output from the read heads first be passed through a Linear layer and then be added to the output from the controller? Or am I misreading something here? Thanks in advance!