Hi Senade! There are a few things going on here.
Train with Optimizer

You're right that the optimizer should have been properly parsed. This is a bug that was fixed in 7e2d3f6, which you can get in a bleeding-edge git checkout of Kur. If you don't want to update, then a workaround is to replace this:

```yaml
optimizer: adam
```

with this:

```yaml
optimizer:
  name: adam
```
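For context, here's a minimal sketch of how that nested form sits inside the `train` section (the surrounding keys are placeholders, not taken from your Kurfile):

```yaml
train:
  # ... your existing data sources, epochs, etc. ...
  optimizer:
    name: adam
```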
Train without Optimizer
This looks like a shape issue with your data. Your inputs have shape (1115294, 100, 1); that is, you have just over a million samples, each of shape (100, 1). Perfect. That matches what you explicitly specified in your Kurfile. Now let's watch how the shape changes throughout the model. The first layer is the RNN, with size 256. This is going to turn each (100, 1) sample into a (100, 256) sample. Next, you pass it through a dropout layer, which doesn't affect shapes. Then you give it to a dense layer of size 65. That'll cast each (100, 256) sample into a (100, 65) sample. Finally, you have an activation layer that will not change shapes.

The problem? This output, (100, 65), does not match the shape of your output data, which is (39,).
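To make that walkthrough concrete, here is a sketch of the `model` section it describes, reconstructed from the shapes above (your actual Kurfile may differ in the details), with the running sample shape as comments:

```yaml
model:
  - input:
      shape: [100, 1]
    name: in
  # Sample shape is currently (100, 1)
  - recurrent:
      size: 256
      type: lstm
  # Sample shape is currently (100, 256): the RNN keeps the whole sequence
  - dropout: 0.2
  # Sample shape is currently (100, 256)
  - dense: 65
  # Sample shape is currently (100, 65)
  - activation: softmax
    name: out
  # Still (100, 65), which does not match the (39,) output data
```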
The Solution
Looking at your `prepare` script, you are probably trying to predict the next character given the preceding 100 characters. You probably wanted the RNN to spit out only the last vector of the sequence (equivalent to Keras' `return_sequences=False`). You probably also wanted the dense layer to map down to your 39-dimensional output, not 65. If that's true, then this is the `model` section you wanted:
```yaml
model:
  - input:
      shape: [100, 1]
    name: in
  # Sample shape is currently (100, 1)
  - recurrent:
      size: 256
      type: lstm
      sequence: no
  # Sample shape is currently (256,)
  - dropout: 0.2
  # Sample shape is currently (256,)
  - dense: 39
  # Sample shape is currently (39,)
  - activation: softmax
    name: out
```
There! Now everything is as you want it.
Even More

You mentioned that this wasn't the model you really wanted, so let me help you out by writing the Keras model as a Kur model:
```yaml
model:
  - input:
      shape: [100, 1]
    name: in
  - rnn:
      type: lstm
      size: 512
  - dropout: 0.2
  - recurrent:
      size: 512
      type: lstm
      sequence: no
  - dropout: 0.2
  - dense: 39
  - activation: softmax
  - output: out
```
There! Note that Kur can infer shapes, so you could have simplified the `input` section down to `- input: in` (dropping the `name` and `shape` entries).
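For example, a sketch of that simplified form (assuming your data supplier provides a source named `in`, so that Kur can infer its shape):

```yaml
model:
  - input: in
  - rnn:
      type: lstm
      size: 512
  - dropout: 0.2
  - recurrent:
      size: 512
      type: lstm
      sequence: no
  - dropout: 0.2
  - dense: 39
  - activation: softmax
  - output: out
```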
Thank you so much for the clear explanation. I'm a beginner and this makes much more sense now. I'll continue working with this over the weekend :)
Hi,
I'm new to Kur and I'm trying to use it for Sentence Generation. Here's my flow:
1. Input data
2. Kurfile
3. Train with optimizer: `kur train kurfile.yml`
4. Train without optimizer: `kur train kurfile.yml`
I'm unsure how to fix these errors. The above Kurfile is not the final model; ideally, I would like to implement the LSTM model below.
Thanks