There are a couple of different ways to implement sequence-to-sequence models; for an academic overview, see Cho et al., 2014 and Sutskever et al., 2014. Let me propose a simple starting point (more complicated examples would likely require hacking the current layers, which is doable but less straightforward).
The simple approach is to take your input sequence and pass it through an RNN stack, keeping only the last timestep. This last timestep is, in some sense, a vector encoding of the input sequence. We can then present this vector to the decoder RNNs at each timestep, asking them to produce the output sequence given the encoding vector.
model:

  # The input sequence. Let's assume your input sequences have 100 timesteps,
  # and are each a one-hot vector representation of a 29-word vocabulary.
  - input:
      shape: [100, 29]
    name: input_sequence

  # The encoder stack. We can have as many encoder layers as we want...
  - recurrent:
      size: 512

  # ... provided that our last encoder layer produces a single vector.
  - recurrent:
      size: 512
      sequence: no

  # Optionally, we can introduce another dense layer.
  - dense: 100
  - activation: relu

  # Now we will repeat this encoded vector and present it to the decoder layer
  # fifty times.
  - repeat: 50

  # The decoder stack.
  - recurrent:
      size: 512

  # Your last decoder layer could have `size` equal to the output vocabulary
  # size, or we can use a dense layer to reshape each timestep of the output
  # sequence.
  - parallel:
      apply:
        - dense: 29

  # Softmax so that we get one-hot outputs, suitable for categorical
  # cross-entropy loss.
  - activation: softmax

  # The final output sequence. Each output has 50 timesteps, and each timestep
  # is a one-hot encoded vector representation of a 29-word vocabulary.
  - output: output_sequence
(Note that the "repeat" layer was only recently added to Kur, so be sure to use the latest version from GitHub.)
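The model section above isn't a complete Kurfile on its own. As a rough sketch, the accompanying loss and train sections might look like the following; the file name is just a placeholder, and I'm assuming your data is stored as a pickled dictionary keyed by input_sequence and output_sequence (swap in whichever supplier actually fits your data):

loss:
  # Categorical cross-entropy against the one-hot output sequence.
  - target: output_sequence
    name: categorical_crossentropy

train:
  data:
    # Placeholder: a pickled dictionary containing "input_sequence" and
    # "output_sequence" arrays.
    - pickle: seq2seq_data.pkl
  epochs: 10
  log: seq2seq-log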
Sutskever et al., 2014 found that their models worked best when the input sequence was reversed. So instead of trying to train the mapping "A B C" -> "X Y Z", try mapping "C B A" -> "X Y Z". Thus, the recommendation for preprocessing your data would be to reverse each input sequence before training.
For variable-length inputs, you can use [null, 29] as the shape of the input layer, and use batches of size 1. You can also play with different "signal" words.
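In Kurfile terms, the variable-length setup would look something like this (a sketch of just the changed pieces; null leaves the timestep dimension unspecified, and the batch size goes under the provider section, as in the speech example):

model:
  # Variable-length input: "null" means the number of timesteps is not fixed.
  - input:
      shape: [null, 29]
    name: input_sequence
  # ... the rest of the encoder/decoder stack stays the same ...

train:
  # One sample per batch, since each sequence can have a different length.
  provider:
    batch_size: 1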
Thanks! I'm getting some good results with the repeat layer 👍
Hello, I am new to Kur and trying to make a sequence-to-sequence model similar to this for machine translation: tensorflow seq2seq model
I need to have variable-length input and variable-length output. I think for the output sequence I can use CTC like in the ASR example. To test, I tried setting the input to transcript, just like the output of the speech model. This is the error I get:
Here is my Kurfile:
Could you give me some advice on getting this network configured to use character text sequence input? Do I need to write a new supplier?