hisoyeah closed this issue 5 years ago
Here are the links to my labels (train and test): train : https://framadrop.org/r/x7PT4ZWWDa#qScNABNoCAOPMwcaG66Z2ntcXSHzYPq+V7SOIZRoy3c=
test : https://framadrop.org/r/NrsDfJLonQ#PZ9WcSCgKd354wfmXL+HwX4gruo5yT5aflwNeoeQ4cA=
I am having the same issue with Kur. In particular, I exported the SCOTUS-speech corpus to Kur format, and I get this error when I try to train with the standard speech.yml file.
Here is a preview of my corpus directory:
$ ls scotusspeech-wav/audio/
0002dc2e-fb42-4055-8716-30884da750bf.wav 4e0c359b-bda0-4ac5-8ba9-2cb191026871.wav ae446f59-a989-4159-8fae-725834c90007.wav
05c966f8-cf4d-4b35-878e-e9318950d0f3.wav 505d045c-20b5-41dd-8658-b1beed223688.wav ...
$ head scotusspeech-wav/scotusspeech-test.jsonl
{"uuid": "0e4f2165-19e3-4488-98c3-54c6c9d6e77a", "duration_s": 6.52, "text": "we will hear argument first this morning in case 105400 tapia v"}
{"uuid": "384c69c7-bebf-479c-a447-8f123419e633", "duration_s": 2.28, "text": "united states mr cahn"}
{"uuid": "ad923318-f48a-41bd-8a6d-8169c2f5c7c8", "duration_s": 22.92, "text": "mr chief justice and may it please the court when it instructed courts to recognize that imprisonment is not an appropriate means of promoting correction and rehabilitation congress intended to end the practice of sending defendants to prison so that they might get treatment"}
{"uuid": "13cb75be-4f23-472f-a264-ef3cd582eda8", "duration_s": 9.76, "text": "the commands of 3582 are clear on this point do not imprison and do not lengthen prison sentences for the purposes of rehabilitation"}
{"uuid": "fe815d5d-52a7-4b6b-8cab-baf73f717a8b", "duration_s": 4.52, "text": "this plain meaning is confirmed by the structure of the statute"}
{"uuid": "e5e8594d-1589-49e4-a86d-6e1e97040822", "duration_s": 6.36, "text": "under the statute judges have the power to sentence defendants to prison but not to prison programs"}
{"uuid": "35ae41ef-b799-4f48-afbe-75d2017b4a67", "duration_s": 7.6, "text": "judges once had that power under the youth corrections act and under the narcotic addicts rehabilitation act"}
{"uuid": "43fc133f-1eab-4beb-9c9a-2bba41c39481", "duration_s": 3.64, "text": "with the sentencing reform act congress took that power away"}
{"uuid": "4d663259-4098-4195-bbdf-099f1df3d1c2", "duration_s": 9.08, "text": "that structure makes sense only because congress intended that defendants should no longer be sent to prison for purposes of rehabilitation"}
{"uuid": "562baf9b-67c4-4d42-9651-2831026f049d", "duration_s": 2.72, "text": "you have in effect a oneway ratchet"}
Noahs-MacBook:speechrec noaj$ soxi scotusspeech-wav/audio/0e4f2165-19e3-4488-98c3-54c6c9d6e77a.wav
Input File : 'scotusspeech-wav/audio/0e4f2165-19e3-4488-98c3-54c6c9d6e77a.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:06.52 = 104320 samples ~ 489 CDDA sectors
File Size : 209k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
Here is my entire Kurfile:
---
###############################################################################
# Pro-tips:
#
# - Use YAML's anchors! This lets you define a value (even a dictionary) and
#   reuse, or even change, parts of it later. YAML anchors are incredible for
#   defining constant values that you want to reuse all over the place. You can
#   define an anchor like this:
#     KEY: &my_anchor
#   Note that the value of KEY can be anything. All of these are allowed:
#     KEY: &my_anchor "my value"
#     KEY: &my_anchor
#       A: B
#       C: D
#     KEY: &my_anchor [1, 2, 3]
#   You can then refer back to your anchors like this:
#     ANOTHER_KEY: *my_anchor
#   This sets the value of ANOTHER_KEY to be the same thing as the original
#   KEY. Now let's say that your anchor is a dictionary, but you want to refer
#   to it with modified values later. Try this:
#     KEY: &my_anchor
#       FIRST: VALUE_1
#       SECOND: VALUE_2
#     ANOTHER_KEY:
#       <<: *my_anchor
#       SECOND: VALUE_2_NEW
#       THIRD: VALUE_3
#     MORE_KEY: *my_anchor
#   These are 100% equivalent to this more verbose structure:
#     KEY:
#       FIRST: VALUE_1
#       SECOND: VALUE_2
#     ANOTHER_KEY:
#       FIRST: VALUE_1
#       SECOND: VALUE_2_NEW
#       THIRD: VALUE_3
#     MORE_KEY:
#       FIRST: VALUE_1
#       SECOND: VALUE_2
#
# - Use the Jinja2 engine! It is really powerful, and it is most appropriately
#   used to do on-the-fly interpretation/evaluation of values in the "model"
#   section of the Kurfile.
#
# - So how do you know when to use YAML anchors as opposed to Jinja2
#   expressions? Here are some tips.
#
#   YAML anchors only work within a single YAML file, and are evaluated the
#   moment the file is loaded. This means you can't use YAML anchors from a
#   JSON Kurfile, and you can't reference anchors in other Kurfiles.
#
#   Jinja2 is interpreted after all Kurfiles are loaded, which means that
#   many different Kurfiles can share variables via Jinja2. Jinja2
#   expressions can also be used in JSON Kurfiles.
#
#   It's almost like YAML anchors are "compile-time constants" while Jinja2
#   expressions are interpreted at run-time. As a result, the value of a
#   Jinja2 expression can be different at different points in the Kurfile
#   (e.g., if you use Jinja2 to reference the previous layer in a model, the
#   interpretation/value of "previous layer" obviously resolves to something
#   different for the second layer in the model as compared to the fifth
#   layer in the model).
###############################################################################
settings:

  # Deep learning model
  cnn:
    kernels: 1000
    size: 11
    stride: 2
  rnn:
    size: 1000
    depth: 3
  vocab:
    # Needed for CTC
    size: 28

  # Setting up the backend.
  backend:
    name: keras
    backend: tensorflow

  # Batch sizes
  provider: &provider
    batch_size: 16
    force_batch_size: yes

  # Where to get the data.
  data: &data
    #path: "~/projects/speechrec/lsdc-test/"
    path: "../../scotusspeech-wav/"
    type: spec
    max_duration: 50
    max_frequency: 8000
    normalization: norm.yml

  # Where to put the weights
  weights: &weights weights

###############################################################################
model:

  # This is Baidu's DeepSpeech model:
  #   https://arxiv.org/abs/1412.5567
  # Kur makes prototyping different versions of it incredibly easy.

  # The model input is audio data (called utterances).
  - input: utterance

  # One-dimensional, variable-size convolutional layers to extract a more
  # efficient representation of the data.
  - convolution:
      kernels: "{{ cnn.kernels }}"
      size: "{{ cnn.size }}"
      strides: "{{ cnn.stride }}"
      border: valid
  - activation: relu
  - batch_normalization

  # A series of recurrent layers to learn temporal sequences.
  - for:
      range: "{{ rnn.depth }}"
      iterate:
        - recurrent:
            size: "{{ rnn.size }}"
            sequence: yes
        - batch_normalization

  # A dense layer to get everything into the right output shape.
  - parallel:
      apply:
        - dense: "{{ vocab.size + 1 }}"
        - activation: softmax

  # The output is the transcription.
  - output: asr

###############################################################################
train:
  data:
    # A "speech_recognition" data supplier will create these data sources:
    #   utterance, utterance_length, transcript, transcript_length, duration
    - speech_recognition:
        <<: *data
        # url: "https://kur.deepgram.com/data/lsdc-train.tar.gz"
        # checksum: >-
        #   fc414bccf4de3964f895eaa9d0e245ea28810a94be3079b55505cf0eb1644f94

  weights: *weights
  provider:
    <<: *provider
    sortagrad: duration

  log: log

  optimizer:
    name: sgd
    nesterov: yes
    learning_rate: 2e-4
    momentum: 0.9
    clip:
      norm: 100

###############################################################################
validate: &validate
  data:
    - speech_recognition:
        <<: *data
        # path: "~/projects/speechrec/"
        # url: "https://kur.deepgram.com/data/lsdc-test.tar.gz"
        # checksum: >-
        #   e1c8cf9cd57e8c1ae952b6e4e40dcb5c8e3932c81ecd52c090e4a05c8ebbea2b
  weights: *weights
  provider: *provider
  hooks:
    - transcript

###############################################################################
test: *validate

###############################################################################
evaluate:
  <<: *validate
  provider:
    <<: *provider
    force_batch_size: no

###############################################################################
loss:
  - name: ctc
    # The model's output (its best-guess transcript).
    target: asr
    # How long the corresponding audio utterance is.
    input_length: utterance_length
    relative_to: utterance
    # How long the ground-truth transcript is.
    output_length: transcript_length
    # The ground-truth transcript itself.
    output: transcript
...
Here is the output I get:
[WARNING 2019-01-06 14:22:03,840 kur.supplier.speechrec:465] Inferring vocabulary from data set.
[WARNING 2019-01-06 14:22:31,127 kur.supplier.speechrec:465] Inferring vocabulary from data set.
2019-01-06 14:24:09.074572: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at ctc_loss_op.cc:168 : Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 29 labels:
[ERROR 2019-01-06 14:24:11,046 kur.model.executor:352] Exception raised during training.
Traceback (most recent call last):
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 349, in train
**kwargs
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 784, in wrapped_train
self.compile('train', with_provider=provider)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 117, in compile
**kwargs
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 693, in compile
self.wait_for_compile(model, key)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 723, in wait_for_compile
self.run_batch(model, batch, key, False)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 766, in run_batch
outputs = compiled['func'](inputs)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 29 labels:
[[{{node CTCLoss}} = CTCLoss[_class=["loc:@gradients/CTCLoss_grad/mul"], ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Log, ToInt64, GatherNd, Squeeze_1)]]
Traceback (most recent call last):
File "/Users/noaj/.virtualenvs/kur/bin/kur", line 11, in <module>
sys.exit(main())
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/__main__.py", line 494, in main
sys.exit(args.func(args) or 0)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/__main__.py", line 65, in train
func(step=args.step)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/kurfile.py", line 434, in func
return trainer.train(**defaults)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 349, in train
**kwargs
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 784, in wrapped_train
self.compile('train', with_provider=provider)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/model/executor.py", line 117, in compile
**kwargs
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 693, in compile
self.wait_for_compile(model, key)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 723, in wait_for_compile
self.run_batch(model, batch, key, False)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/kur/backend/keras_backend.py", line 766, in run_batch
outputs = compiled['func'](inputs)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/Users/noaj/.virtualenvs/kur/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 29 labels:
[[{{node CTCLoss}} = CTCLoss[_class=["loc:@gradients/CTCLoss_grad/mul"], ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Log, ToInt64, GatherNd, Squeeze_1)]]
If I change settings.data.path to "../../lsdc-test/" and train, everything works fine:
$ kur train speech.yml
[WARNING 2019-01-06 14:29:24,288 kur.supplier.speechrec:465] Inferring vocabulary from data set.
[WARNING 2019-01-06 14:29:52,396 kur.supplier.speechrec:465] Inferring vocabulary from data set.
Epoch 1/inf, loss=270.119: 12%|██████████▍ | 32/271 [00:35<04:23, 1.10s/samples]
Looking at the corpus that does work (lsdc-test), I see no difference from my own:
Noahs-MacBook:speechrec noaj$ ls lsdc-test/
audio lsdc-test.jsonl
Noahs-MacBook:speechrec noaj$ ls lsdc-test/audio/ | head -n10
013af52d-321a-44b3-a649-0930abb41f4a.wav
01d0aed1-264a-423a-912c-ef6471b7d16d.wav
0226bf2d-0cbb-4e9e-9e29-8b92c2dd9d85.wav
0248f382-153f-4844-b82b-5af6f872b4ee.wav
029dff52-c8e9-42b6-8783-cd1bffd77249.wav
05ffd75b-d2bd-46b3-a32c-463ced7147d4.wav
095109db-4f10-4988-8287-1a22c79dbdd5.wav
097ef85f-d377-4d37-b60d-f0e5bf7c666c.wav
0b611f3f-c6b1-48f0-85cc-a506a7d10022.wav
0b6fa32d-375c-45bb-9f87-a76d7af09190.wav
Noahs-MacBook:speechrec noaj$ head lsdc-test/lsdc-test.jsonl
{"text": "the place seemed fragrant with all the riches of greek thought and song since the days when ptolemy philadelphus walked there with euclid and theocritus callimachus and lycophron", "duration_s": 11.72, "uuid": "e6d892b1-20f3-4cb9-ba62-f8d60302d78a"}
{"text": "the room had neither carpet nor fireplace and the only movables in it were a sofa bed a table and an arm chair all of such delicate and graceful forms as may be seen on ancient vases of a far earlier period than that whereof we write", "duration_s": 16.915, "uuid": "1540c191-da5d-4f60-8120-de5dc0277218"}
{"text": "but most probably had any of us entered that room that morning we should not have been able to spare a look either for the furniture or the general effect or the museum gardens or the sparkling mediterranean beyond but we should have agreed that the room was quite rich enough for human eyes for the sake of one treasure which it possessed and beside which nothing was worth a moment's glance", "duration_s": 24.395, "uuid": "99ca045e-fcd1-43b5-aa8e-f6f2903e5f9d"}
{"text": "she has lifted her eyes off her manuscript she is looking out with kindling countenance over the gardens of the museum her ripe curling greek lips such as we never see now even among her own wives and sisters open", "duration_s": 14.475, "uuid": "73624397-ca1c-420d-9b38-3538190e735e"}
{"text": "if they have ceased to guide nations they have not ceased to speak to their own elect", "duration_s": 5.63, "uuid": "40a83e85-93b2-4f82-800d-1a8cda921837"}
{"text": "if they have cast off the vulgar herd they have not cast off hypatia", "duration_s": 5.21, "uuid": "10fa7792-80be-4d44-9296-974553de5bdf"}
{"text": "to be welcomed into the celestial ranks of the heroic to rise to the immortal gods to the ineffable powers onward upward ever through ages and through eternities till i find my home at last and vanish in the glory of the nameless and the absolute one", "duration_s": 18.345, "uuid": "45dc87b1-4608-475c-b5b4-66c732364d13"}
{"text": "i to believe against the authority of porphyry himself too in evil eyes and magic", "duration_s": 5.97, "uuid": "e9a5ef19-fa45-40a3-a69c-53ddf7c69b1d"}
{"text": "what do i care for food", "duration_s": 2.155, "uuid": "108ad3a0-8acf-456e-831e-f51784ffa0fe"}
{"text": "how can he whose sphere lies above the stars stoop every moment to earth", "duration_s": 5.415, "uuid": "bc27f8b2-170e-4b99-96cb-d0c2f3c93b91"}
Noahs-MacBook:speechrec noaj$ soxi lsdc/audio/013af52d-321a-44b3-a649-0930abb41f4a.wav
soxi FAIL formats: can't open input file `lsdc/audio/013af52d-321a-44b3-a649-0930abb41f4a.wav': No such file or directory
Noahs-MacBook:speechrec noaj$ soxi lsdc-test/audio/013af52d-321a-44b3-a649-0930abb41f4a.wav
Input File : 'lsdc-test/audio/013af52d-321a-44b3-a649-0930abb41f4a.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:03.41 = 54560 samples ~ 255.75 CDDA sectors
File Size : 109k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
This is probably a vocabulary error. The vocab should be lowercase a-z, space, and apostrophe. If your dataset contains any characters that are not in that set but you supply the same vocab file as lsdc-test, then you'll have a bad time.
Make your dataset conform to a-z, space, and apostrophe, or delete the vocab file and run train again to have Kur generate it for you.
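A quick way to verify this diagnosis is to scan the manifest for characters outside the expected vocabulary. This is a minimal sketch (the manifest path is my assumption from the listing earlier in the thread); note that the SCOTUS transcripts shown above contain digits like "105400" and "3582", which would land outside an a-z/space/apostrophe vocab:

```python
import json
from collections import Counter

# Characters in the standard lsdc vocab: lowercase a-z, space, apostrophe.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz '")

def out_of_vocab_chars(manifest_path):
    """Count every transcript character that falls outside ALLOWED."""
    bad = Counter()
    with open(manifest_path) as f:
        for line in f:
            text = json.loads(line)["text"]
            bad.update(ch for ch in text if ch not in ALLOWED)
    return bad

if __name__ == "__main__":
    # Path assumed from the corpus layout shown earlier.
    counts = out_of_vocab_chars("scotusspeech-wav/scotusspeech-test.jsonl")
    for ch, n in counts.most_common():
        print(repr(ch), n)
```

Any non-empty output here means the labels can index past num_classes - 1, which is exactly what the CTCLoss error complains about.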
Thanks @scottstephenson this completely solved my problem! For now I will filter out numbers from my kur export but longer term I'm curious how I might modify the kurfile to work with more (~ 39) characters.
You can use numbers and all that. It's not a problem, though it might be harder to train. Just delete the vocab file, let Kur generate it, and see if it works out. It'll do fine, but maybe not better than omitting them.
There is no vocab file, this is character level output
There is a character vocabulary; look for a vocab.json. The JSON array ["a", "b", "c", ..., "z", " ", "'"] is the standard vocab, and it's all characters. You can include capitals, "1", "2", etc., as well as "?", "!" and others. The vocab will be auto-generated if you delete the vocab file and feed in a dataset that includes different characters.
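If you do want to keep the extra characters, the auto-generated vocabulary is essentially the sorted set of every character seen across the transcripts. Here is a sketch of what that generation amounts to (the paths are hypothetical, and this mimics the idea, not Kur's actual implementation):

```python
import json

def build_vocab(manifest_path):
    """Collect the set of characters used across all transcripts,
    approximating what Kur does when it infers the vocabulary."""
    chars = set()
    with open(manifest_path) as f:
        for line in f:
            chars.update(json.loads(line)["text"])
    return sorted(chars)

if __name__ == "__main__":
    # Hypothetical path, following the corpus layout in this thread.
    vocab = build_vocab("scotusspeech-wav/scotusspeech-train.jsonl")
    with open("vocab.json", "w") as f:
        json.dump(vocab, f)
    print(len(vocab), vocab)
```

If you go this route, remember that settings.vocab.size in the Kurfile must match the new vocabulary length, since the dense layer is sized as vocab.size + 1 (the extra class being the CTC blank label).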
Hello,
I'm trying to train on my custom data, but I'm getting a weird issue:
2018-08-30 17:25:46.106572: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at ctc_loss_op.cc:166 : Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 29 labels:
[ERROR 2018-08-30 17:25:46,434 kur.model.executor:352] Exception raised during training.
You will find attached my file with labels. My speech.yml is the same as the one in the examples.
Thanks a lot for helping