deepgram / kur

Descriptive Deep Learning
Apache License 2.0

Training on TIMIT #12

Closed by kentsommer 7 years ago

kentsommer commented 7 years ago

Hey guys,

Super new to kur (literally looked at it for the first time today), so perhaps I'm missing something simple, but here is my issue:

I've created a conversion script for the TIMIT dataset to get the dataset to match what kur is expecting: https://github.com/kentsommer/TIMIT-to-Kur/blob/master/to_kur_dataset.py

Running the above script on the TIMIT dataset produces an audio folder and a jsonl file with the labels. The jsonl file can be found here: https://github.com/kentsommer/TIMIT-to-Kur/releases/download/v0.1/timit_train.jsonl

However, when I run $ kur train speech.yml, I get the following (note that training on the standard lsc100 works fine):

Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 316M/316M [01:11<00:00, 4.41Mbytes/s]
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 115M/115M [00:29<00:00, 3.87Mbytes/s]
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 1 num_classes: 29 labels: 26,25,0,13,14,24,0,23,10,24,20,17,26,25,14,20,19,0,13,6,23,9,17
[ERROR 2017-02-21 19:53:11,741 kur.model.executor:224] Exception raised during training.
Traceback (most recent call last):
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 1 num_classes: 29 labels: 26,25,0,13,14,24,0,23,10,24,20,17,26,25,14,20,19,0,13,6,23,9,17
     [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Log/_401, ToInt64/_403, GatherNd, Squeeze_2/_405)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 221, in train
    **kwargs
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 537, in wrapped_train
    self.compile('train', with_provider=provider)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 107, in compile
    **kwargs
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/backend/keras_backend.py", line 639, in compile
    self.wait_for_compile(model, key)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/backend/keras_backend.py", line 668, in wait_for_compile
    self.run_batch(model, batch, key)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/backend/keras_backend.py", line 708, in run_batch
    outputs = compiled['func'](inputs)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/keras/backend/tensorflow_backend.py", line 1943, in __call__
    feed_dict=feed_dict)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 1 num_classes: 29 labels: 26,25,0,13,14,24,0,23,10,24,20,17,26,25,14,20,19,0,13,6,23,9,17
     [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Log/_401, ToInt64/_403, GatherNd, Squeeze_2/_405)]]

Caused by op 'CTCLoss', defined at:
  File "/home/kent/.virtualenvs/kur/bin/kur", line 11, in <module>
    load_entry_point('kur==0.3.0', 'console_scripts', 'kur')()
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/__main__.py", line 382, in main
    sys.exit(args.func(args) or 0)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/__main__.py", line 62, in train
    func(step=args.step)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/kurfile.py", line 371, in func
    return trainer.train(**defaults)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 221, in train
    **kwargs
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 537, in wrapped_train
    self.compile('train', with_provider=provider)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/model/executor.py", line 107, in compile
    **kwargs
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/backend/keras_backend.py", line 581, in compile
    self.process_loss(model, loss)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/backend/keras_backend.py", line 500, in process_loss
    self.find_compiled_layer_by_name(model, target)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/kur/loss/ctc.py", line 232, in get_loss
    transcript_length
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/keras/backend/tensorflow_backend.py", line 3042, in ctc_batch_cost
    sequence_length=input_length), 1)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/ops/ctc_ops.py", line 145, in ctc_loss
    ctc_merge_repeated=ctc_merge_repeated)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 164, in _ctc_loss
    name=name)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/kent/.virtualenvs/kur/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 1 num_classes: 29 labels: 26,25,0,13,14,24,0,23,10,24,20,17,26,25,14,20,19,0,13,6,23,9,17
     [[Node: CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Log/_401, ToInt64/_403, GatherNd, Squeeze_2/_405)]]

Any ideas?

kentsommer commented 7 years ago

Got it working!

After stripping a few more punctuation characters in the conversion script, it is training and validating perfectly :+1:
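For anyone hitting the same CTCLoss error, here is a minimal sketch of that kind of cleanup. It is not the actual code from the conversion script. It assumes the model's vocabulary is lowercase a-z, space, and apostrophe (28 symbols, which together with the CTC blank would match the num_classes: 29 in the error). The names VOCAB and normalize_transcript are made up for illustration.

```python
import re
import string

# Assumed vocabulary: lowercase a-z, space, and apostrophe (28 symbols).
# This character set is an assumption for illustration; check it against
# the vocabulary your own speech.yml actually uses.
VOCAB = set(string.ascii_lowercase + " '")


def normalize_transcript(text):
    """Lowercase the text and replace every character outside VOCAB with a space."""
    lowered = text.lower()
    kept = "".join(ch if ch in VOCAB else " " for ch in lowered)
    # Collapse the runs of spaces left behind by removed punctuation.
    return re.sub(r" +", " ", kept).strip()


if __name__ == "__main__":
    print(normalize_transcript("She had your dark suit, in greasy wash-water all year."))
```

Replacing punctuation with a space (rather than deleting it) keeps hyphenated words like "wash-water" from being glued together, which seemed like the safer default for transcripts.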

Note to anyone that wants to train on the TIMIT dataset:

Cheers :beer:

ajsyp commented 7 years ago

Awesome! Thanks for sharing your work with us. We need to add better error messages related to the vocabulary.
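Until then, a quick pre-flight check on the jsonl file can surface stray characters before training starts. The sketch below is only an illustration and not part of kur: it assumes each jsonl line is a JSON object with the transcript under a "text" key, and that the allowed characters are lowercase a-z, space, and apostrophe. The function name find_bad_characters is hypothetical.

```python
import json
import string

# Assumed vocabulary (not read from kur): lowercase a-z, space, apostrophe.
ALLOWED = set(string.ascii_lowercase + " '")


def find_bad_characters(jsonl_lines, text_key="text"):
    """Yield (line_number, sorted unexpected characters) for each offending entry.

    `text_key` is an assumption about where the transcript lives in each record.
    """
    for line_number, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        unexpected = set(entry[text_key]) - ALLOWED
        if unexpected:
            yield line_number, sorted(unexpected)


if __name__ == "__main__":
    sample = ['{"text": "dark suit"}', '{"text": "wash-water, all year."}']
    for number, chars in find_bad_characters(sample):
        print("line", number, "has unexpected characters:", chars)
```

To check a real file, pass an open file handle instead of the sample list, for example find_bad_characters(open("timit_train.jsonl")). The check is case-sensitive, so uppercase letters are reported as unexpected too.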

akademi4eg commented 7 years ago

@kentsommer Could you share what results you've got on TIMIT? Like final validation loss and maybe a couple of predictions? TIMIT is rather small (5.4 hours with only 3.14 hours in train set), so I wonder how well it would generalize.

kentsommer commented 7 years ago

@akademi4eg

I can do a full training run and post the results if you would like. I ended up stopping it since the validation loss stayed pretty high while the training loss dropped like a rock, indicating some pretty horrible overfitting. I'm fairly certain the TIMIT dataset is simply much too small for the DS model.

akademi4eg commented 7 years ago

@kentsommer Yeah, I think you are right, TIMIT is too small. No need to do a full train. :)