CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

dimension mismatch with BLSTM duration model #245

Closed · khusainovaidar closed this issue 6 years ago

khusainovaidar commented 7 years ago

I've downloaded the latest Merlin commits and now have a problem with duration model training. When I set the architecture like this:

hidden_layer_size : [1024, 1024, 1024, 1024, 384]
hidden_layer_type : ['TANH', 'TANH', 'TANH', 'TANH', 'BLSTM']

it raises a dimension mismatch exception:

ValueError: dimension mismatch in args to gemm (10950,384)x(768,1)->(10950,1)
Apply node that caused the error: GpuDot22(GpuReshape{2}.0, W)
Toposort index: 628
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(10950, 384), (768, 1)]
Inputs strides: [(384, 1), (1, 0)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuDot22.0, MakeVector{dtype='int64'}.0)]]

When I switch to LSTM instead of BLSTM, everything works fine. Could you please help me with this issue? Maybe I need to change something in the config files.
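
To make the mismatch concrete, here is a minimal numpy sketch (not Merlin code; the shapes are copied from the error above). The output layer's weight is built for 768 = 2 x 384 features, i.e. the concatenated forward and backward BLSTM outputs, but only 384 features arrive:

import numpy as np

blstm_out = np.zeros((10950, 384))  # what actually reaches the output layer
output_w = np.zeros((768, 1))       # output weight sized for 2 * 384 = 768 features

try:
    np.dot(blstm_out, output_w)     # the same gemm the Theano graph attempts
except ValueError as err:
    print(err)                      # shapes (10950,384) and (768,1) not aligned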

khusainovaidar commented 7 years ago

Is there a chance that this is a bug in one of the recent updates? If not, and someone has already built BLSTM models using the latest commit, could you share your configuration files, labels, or anything else that might help?

I've tested again with previous commits and everything works. I've also tried the latest version and the arctic demo with BLSTM, and got a similar exception.

ronanki commented 7 years ago

It seems there's an issue with BLSTM after the batch training update -- we'll fix it soon. Meanwhile, you can fall back to a batch size of 1, but training will take much longer. Alternatively, you can use the Keras backend by enabling the switch_to_keras option.
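
For reference, the two workarounds would look roughly like this in the duration model's configuration (the batch_size key name and its exact section are assumptions here; switch_to_keras is the option named above -- check the demo recipes for the exact spellings):

hidden_layer_size : [1024, 1024, 1024, 1024, 384]
hidden_layer_type : ['TANH', 'TANH', 'TANH', 'TANH', 'BLSTM']
# workaround 1: fall back to sequence-by-sequence training (slower)
batch_size : 1
# workaround 2 (instead of the above): train with the Keras backend
# switch_to_keras : True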

khusainovaidar commented 7 years ago

I'll switch to Keras, thank you for your advice!

khusainovaidar commented 7 years ago

For some reason, Keras training ended up with much worse results, so I switched back and used batch size 1. I actually haven't even noticed a longer processing time.

BCredeagle commented 7 years ago

Using the old version of train_DNN may temporarily solve your issue; see the attachment if you need it: train_DNN_old_version.txt

ronanki commented 7 years ago

@BCredeagle If you set the batch size to 1, it automatically uses the old code. But thank you for the reminder about this bug -- we'll fix it soon.

Howliang commented 6 years ago

Hi all, I recently got the same error message as @khusainovaidar with the latest commits. "batch size = 1" works for me, but the training is really slow. I am wondering whether this is a remaining bug, or whether I need to switch to Keras or TensorFlow instead of Theano for training LSTM-based RNN models.

theabc123 commented 6 years ago

Hi,

Is there any news concerning BLSTM? I tried with batch size 1 and it is not working either. With LSTM it works fine.

Thanks

gillesdegottex commented 6 years ago

Until the PR is merged, you can replace src/gating.py:855 with:

self.output = T.concatenate([fwd.output, bwd.output[::-1]], axis=-1)

(note the '-' sign at the end)

That should do it.

(For context: with batches, the shapes are now (batch_size, length, features), whereas they used to be (length, features). The concatenation axis therefore has to be dimension 2 now, whereas it was dimension 1 previously. Using the last dimension (-1) does the trick in both cases.)
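
To illustrate the axis argument with plain numpy (illustrative shapes only, not Merlin code; the reversal of the backward pass is left out since only the concatenation axis matters here):

import numpy as np

T, F = 5, 384                                            # frames, per-direction features
fwd2d, bwd2d = np.zeros((T, F)), np.zeros((T, F))        # old unbatched shapes: (length, features)
fwd3d, bwd3d = np.zeros((2, T, F)), np.zeros((2, T, F))  # new batched shapes: (batch_size, length, features)

print(np.concatenate([fwd2d, bwd2d], axis=-1).shape)  # (5, 768)     -- features doubled, correct
print(np.concatenate([fwd3d, bwd3d], axis=1).shape)   # (2, 10, 384) -- time doubled, triggers the gemm error
print(np.concatenate([fwd3d, bwd3d], axis=-1).shape)  # (2, 5, 768)  -- features doubled again, correct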