deepgram / kur

Descriptive Deep Learning

Changing speech example to use LSTM instead of GRU predicts empty strings. #7

Closed: bharris47 closed this issue 7 years ago

bharris47 commented 7 years ago

I'm experimenting with different configurations of the speech.yml example and can't seem to change the RNN type from gru to lstm. Whenever I do, I get the following output at each validation step:

[INFO 2017-01-26 18:16:20,580 kur.model.executor:172] Validation loss: 834.754
Prediction: " "
Truth: "the great state of virginia mother of presidents went out of the union at last and north carolina tennessee and arkansas followed her but maryland kentucky and missouri still hung in the balance"

The prediction is always " " regardless of the truth value.
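For reference, the only change I'm making is the recurrent layer's type in speech.yml, roughly like this (sketching from memory, so the surrounding keys may not be exact):

- recurrent:
    size: "{{ rnn.size }}"
    sequence: yes
    type: lstm    # was gru (the default)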

Any ideas?

scottstephenson commented 7 years ago

@bharris47: Sorry we've taken so long to get to this. We'll look into it in the next couple of days!

scottstephenson commented 7 years ago

I've replicated the problem; still looking into it.

ajsyp commented 7 years ago

My intuition is that the problem is in the forward pass through the network. Perhaps the LSTM activations are blowing up, resulting in arbitrarily large outputs. To combat this, we need to cap the LSTM outputs using a different activation function (one that is bounded, unlike ReLU, which is the default activation used internally by the RNN). Please try the following (modifying the part of your model definition where you define the RNN layers):

- recurrent:
    size: "{{ rnn.size }}"            # number of hidden units
    sequence: yes                     # return the full output sequence, not just the last step
    type: lstm                        # use an LSTM instead of the default GRU
    outer_activation: hard_sigmoid    # bounded activation, replacing the unbounded ReLU

Here, we switch from a GRU (the default) to an LSTM, but then we also change the activation function from ReLU to a hard sigmoid (sigmoid because it is bounded, hard because it is efficient). This should solve your problem.
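To make the bounded-vs-unbounded point concrete, here is a small standalone numpy sketch (illustration only, not Kur code). Keras' hard sigmoid is a piecewise-linear approximation of the logistic function, clipped to [0, 1], so no matter how large the pre-activation grows, the output stays in range; ReLU, by contrast, passes large values straight through:

import numpy as np

def relu(x):
    # Unbounded above: large pre-activations pass straight through,
    # so activations can grow without limit across timesteps.
    return np.maximum(0.0, x)

def hard_sigmoid(x):
    # Keras-style hard sigmoid: clip 0.2 * x + 0.5 to [0, 1].
    # Bounded like a sigmoid, but cheaper to compute.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

for x in (-5.0, 0.0, 5.0, 500.0):
    print("x = %7.1f   relu = %7.1f   hard_sigmoid = %.2f" % (x, relu(x), hard_sigmoid(x)))

Under the hood, Kur builds on Keras, so outer_activation presumably ends up as the activation argument on the generated Keras LSTM layer; the important part is simply that it is bounded.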