not compatible with neon v2?

dimatter commented 7 years ago

(.venv2) dimatter@qb:~/neonSpeech/deepspeech/speech$ python train.py --manifest train:/home/dimatter/neonSpeech/deepspeech/librispeech/training.csv --manifest val:/home/dimatter/neonSpeech/deepspeech/librispeech/validation.csv -e 1  -z 10 -s ../new.prm --model_file ../librispeech_16_epochs.prm
2017-07-02 00:06:36,410 - neon.util.argparser - WARNING - No schedule given for model serialization, using default 1
2017-07-02 00:06:38,040 - neon.backends - WARNING - deterministic_update and deterministic args are deprecated in favor of specifying random seed
10
Traceback (most recent call last):
  File "train.py", line 187, in <module>
    cost=cost, callbacks=callbacks)
  File "/home/dimatter/neonSpeech/neon/neon/models/model.py", line 175, in fit
    self.initialize(dataset, cost)
  File "/home/dimatter/neonSpeech/neon/neon/models/model.py", line 122, in initialize
    prev_input = self.layers.configure(prev_input)
  File "/home/dimatter/neonSpeech/neon/neon/layers/container.py", line 296, in configure
    in_obj = l.configure(in_obj)
  File "/home/dimatter/neonSpeech/neon/neon/layers/layer.py", line 852, in configure
    self.nglayer = self.be.conv_layer(self.be.default_dtype, **self.convparams)
  File "/home/dimatter/neonSpeech/neon/neon/backends/nervanagpu.py", line 1943, in conv_layer
    dil_d, dil_h, dil_w)
  File "/home/dimatter/neonSpeech/neon/neon/backends/layer_gpu.py", line 470, in __init__
    self.fprop_kernels = convolution.FpropDirect(*args)
  File "/home/dimatter/neonSpeech/neon/neon/backends/convolution.py", line 584, in __init__
    dil_d, dil_h, dil_w)
  File "/home/dimatter/neonSpeech/neon/neon/backends/convolution.py", line 361, in __init__
    self.init_smallN(op)
  File "/home/dimatter/neonSpeech/neon/neon/backends/convolution.py", line 426, in init_smallN
    assert N % 4 == 0 or N in (1,2), "N dim must be multiple of 4 or equal to 1 or 2"
AssertionError: N dim must be multiple of 4 or equal to 1 or 2

dimatter commented 7 years ago

neon 1.7.0 crashes like so:

(.venv2) dimatter@qb:~/neonSpeech/deepspeech/speech$ python train.py --manifest train:/home/dimatter/neonSpeech/deepspeech/librispeech/training.csv --manifest val:/home/dimatter/neonSpeech/deepspeech/librispeech/validation.csv -e 1  -z 10 -s ../new.prm --model_file ../librispeech_16_epochs.prm
2017-07-02 00:52:58,059 - neon.util.argparser - WARNING - No schedule given for model serialization, using default 1
2017-07-02 00:52:59,704 - neon.backends - WARNING - deterministic_update and deterministic args are deprecated in favor of specifying random seed
Traceback (most recent call last):
  File "train.py", line 176, in <module>
    nesterov=True)
TypeError: __init__() got an unexpected keyword argument 'nesterov'

neon 1.8 and 1.9 crash with the same assertion failure as in my first comment.

dimatter commented 7 years ago

When trying to continue training the pre-trained model on CPU/MKL:

2017-07-02 05:57:39,616 - neon.models.model - WARNING - Problems restoring existing RNG state: algorithm must be 'MT19937'

Starting a clean model on MKL:

[src/conv.c:151] err (-127)
[src/conv.c:152] err (-127)
[src/conv.c:153] err (-127)
[src/conv.c:172] err (-1)
[src/conv.c:173] err (-1)
[src/conv.c:174] err (-1)
[src/conv.c:175] err (-1)
[src/conv.c:176] err (-1)
[src/conv.c:177] err (-1)
[src/conv.c:178] err (-1)
[src/conv.c:179] err (-1)
[src/conv.c:180] err (-1)
[src/conv.c:204] err (-1)
wrong input for try_convert!
Segmentation fault (core dumped)

dimatter commented 7 years ago

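I figured out that the assertion was complaining about my batch size (-z 10), which is neither a multiple of 4 nor equal to 1 or 2. With -z 16 on the GPU backend it gets past that point: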

python train.py --manifest train:/home/dimatter/neonSpeech/deepspeech/librispeech/training.csv --manifest val:/home/dimatter/neonSpeech/deepspeech/librispeech/validation.csv -e 1  -z 16 -s ../new.prm --model_file ../librispeech_16_epochs.prm -b gpu
2017-07-02 06:49:00,051 - neon.util.argparser - WARNING - No schedule given for model serialization, using default 1
2017-07-02 06:49:01,669 - neon.backends - WARNING - deterministic_update and deterministic args are deprecated in favor of specifying random seed

It prints only these two warnings and then silently crashes.
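
For anyone else hitting the same assertion: the condition in the traceback (N % 4 == 0 or N in (1, 2)) can be checked before launching a run. The following is only a sketch of that check. The helper names are hypothetical and not part of neon, and it assumes N is the batch size passed with -z.

```python
# Hypothetical helpers (not part of neon) that mirror the condition in the
# assertion message: N % 4 == 0 or N in (1, 2).


def is_valid_batch_size(n):
    """Return True if n would pass the check shown in the traceback."""
    return n > 0 and (n % 4 == 0 or n in (1, 2))


def next_valid_batch_size(n):
    """Return the smallest accepted batch size that is >= n."""
    if n <= 1:
        return 1
    if n == 2:
        return 2
    # Round up to the next multiple of 4.
    return ((n + 3) // 4) * 4


if __name__ == "__main__":
    for n in (10, 16):
        print(n, is_valid_batch_size(n), next_valid_batch_size(n))
```

Under these assumptions a batch size of 10 is rejected and 12 is the nearest accepted value above it, while 16 passes.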

tyler-nervana commented 7 years ago

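The "silent crashing" on GPU is actually caused by the model finishing training. The epoch index is stored with the model file, and -e sets the total number of epochs rather than the number of additional ones. The pre-trained model has already completed 16 epochs, so in order to do one more epoch of training, you have to use -e 17.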

We don't yet have support for Deepspeech2 on MKL, unfortunately.
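
The epoch behavior described above can be illustrated with a small sketch. It is not neon's actual code: the function name is hypothetical, and it assumes the training loop simply runs while the saved epoch index is below the value given with -e.

```python
# Hypothetical sketch (not neon's implementation): training resumes from the
# epoch index stored in the model file and stops once it reaches the total
# number of epochs requested with -e.


def epochs_to_run(saved_epoch_index, num_epochs):
    """Return the list of epoch indices that would still be trained."""
    ran = []
    epoch_index = saved_epoch_index
    while epoch_index < num_epochs:
        ran.append(epoch_index)
        epoch_index += 1
    return ran


if __name__ == "__main__":
    # Model saved after 16 epochs, resumed with -e 1: nothing left to train.
    print(epochs_to_run(16, 1))
    # Same model resumed with -e 17: exactly one more epoch.
    print(epochs_to_run(16, 17))
```

Under that assumption, resuming a 16-epoch model with -e 1 returns immediately without training, which looks like a silent exit, while -e 17 trains one additional epoch.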

NervanaSystems / deepspeech

not compatible with neon v2? #39