Closed michaelcapizzi closed 7 years ago
Hi @michaelcapizzi!
With `data.type=spec`, Kur is creating spectrograms of your audio data. For 16kHz audio and a 10ms timestep in the STFT, you end up with 161 frequency bins. The fact that you have 81 in your error suggests that you're using 8kHz audio. [...] `input` layer in the model has a `shape` explicitly specified).

Do you have one or two files that you can share? Even if you just generated two files, each with 5 seconds of white noise, for example, then I may be able to help debug.
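
The jump from 161 to 81 bins follows from how a one-sided spectrum is sized. As a rough sketch, assuming a 20 ms analysis window with the FFT size equal to the window length in samples (an assumption that happens to reproduce the numbers above; the thread does not state the window length Kur actually uses), the bin count is `n_fft // 2 + 1`:

```python
def spectrogram_bins(sample_rate_hz, window_ms=20):
    """Number of one-sided FFT bins for a window of `window_ms` milliseconds.

    Assumes the FFT size equals the window length in samples. The 20 ms
    default is an assumption chosen to match the numbers in this thread,
    not a documented Kur setting.
    """
    n_fft = sample_rate_hz * window_ms // 1000  # window length in samples
    return n_fft // 2 + 1                       # bins from 0 Hz up to Nyquist


for rate in (16000, 8000):
    print(rate, spectrogram_bins(rate))
# 16000 161
# 8000 81
```

Under that assumption, halving the sample rate halves the window length in samples, which is why 8kHz data shows up with 81 features instead of 161.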
Thank you for the quick reply, @ajsyp.
You are correct that I'm using 8kHz data.
So a few follow-up questions:

    - input: utterance
        shape: 81

I got an error:

    yaml.scanner.ScannerError: mapping values are not allowed here
      in "<unicode string>", line 115, column 12:
        shape: 81
I've also attached two audio samples. Thanks again, and any further guidance is greatly appreciated.
You can upsample with ffmpeg. This command converts 8k to 16k (mono, 16bit, 16kHz):
ffmpeg -i thefile_8k.wav -acodec pcm_s16le -ac 1 -ar 16000 thefile_16k.wav
Upsampling will take a little longer to train but it's a good way to mix 8k and 16k audio training data.
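
Before resampling, it can help to confirm which files are actually 8kHz. A minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files; the helper name `wav_sample_rate` is illustrative and not part of Kur or ffmpeg:

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


# Example usage (file name taken from the ffmpeg command above):
# print(wav_sample_rate("thefile_8k.wav"))
```

Running this over a dataset before training is one way to spot a mix of 8kHz and 16kHz files that would otherwise show up later as a shape mismatch.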
Two comments:
Are you sure there isn't a rogue `norm.yml` file lying around? The normalization features (stored in that file) might suggest to Kur that the audio is 16kHz. If you just delete/rename that file, it should regenerate a new one for your 8k data.
To specify an explicit shape, do it like this:

    - input:
        shape: [null, 81]
      name: utterance

`shape` is a parameter of the `input` layer, and is therefore indented underneath. The `[null, 81]` means "variable length utterances (in the time domain), but 81 frequency features."
That must be it, @ajsyp!
I didn't realize that the `norm.yml` file held information like that. It was lying around from when I ran your original example. I've since removed that and the model now properly "infers" a dimension of `81`. Thank you.
And thanks for the bit of `.yml` syntax help as well.
When trying to use my own data for a speech example, I get this issue very early on:
I looked through the log, and I see that the model inferred an input dimension of `161`. And so it's clear that when it goes to load a batch of data with a different dimension (in this case `155`), it fails.

So I have two questions:
- [...] `data.type=spec`)?
- [...] `161`?