Closed akademi4eg closed 7 years ago
There are a couple different ways to use your model on other data.
Just use the current data format. Have you taken a look at the data format of the current speech examples? It is just a JSONL file next to an audio directory. Put the audio files in the audio directory and edit the JSONL file (you can set the text field to an empty string, and leave duration at zero if all you are doing is applying the model to unknown data). Then you can set the path of the speech_recognition data supplier to point at the directory containing the JSONL file and audio directory.
You can use the Kur Python API and write your own inference script. This isn't very hard to do, and I can probably create an example of this that we can check into the repo.
I am working on a plug-in system for Kur that would make this sort of thing pretty easy. You'll have to stay posted for this one :)
Also, kur build just tries to assemble/compile the model, but does not do any training/testing/inference. It's mostly useful as a debugging tool to see if your model is going to work.
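For the first option above, the JSONL file can be generated rather than edited by hand. A minimal sketch, assuming one JSON object per line and the field names uuid, text, and duration_s. Those field names are an assumption here, so compare them against the JSONL file that ships with the speech example before relying on them. build_manifest is a hypothetical helper, not part of Kur.

```python
import json
import os


def build_manifest(audio_dir, jsonl_path):
    """Write one JSONL entry per .wav file found in audio_dir.

    The field names are assumptions; match them to the example dataset.
    Returns the number of entries written.
    """
    entries = []
    for filename in sorted(os.listdir(audio_dir)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() != '.wav':
            continue
        entries.append({
            'uuid': stem,      # assumed: identifier matching the audio file name
            'text': '',        # unknown transcript, left empty
            'duration_s': 0,   # unknown duration, left at zero
        })
    with open(jsonl_path, 'w') as handle:
        for entry in entries:
            handle.write(json.dumps(entry) + '\n')
    return len(entries)
```

Calling build_manifest('/path/to/audio', '/path/to/data.jsonl') would then produce the file that the path setting of the data supplier needs to sit next to.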
Thanks for the quick and clear response! An inference example would be a good thing to have.
I think it is simpler than it seems. All you need to do is write an evaluate section that looks like:
evaluate: &evaluate
  data:
    - speech_recognition:
        <<: *data
        path: "/path/to/directory/containing/your/custom/audiofilesandjsonl/"
  weights: *weights
  provider: *provider
  destination: speech-new.inference.pkl
  hooks:
    - transcript
That ^^^ is basically the same as the validate section in the speech example, but edited to point to your custom directory.
To run it (and get inference), you would just do:
kur evaluate speech-new.yml
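Once that run finishes, the file named in destination can be opened from Python. A minimal sketch, assuming the file is an ordinary pickle. The layout of the object inside is not described in this thread, so describe (a hypothetical helper) only reports what it finds instead of assuming particular keys.

```python
import pickle


def describe(obj, depth=0, max_depth=2):
    """Return a list of lines summarizing nested dicts and array-like values."""
    pad = '  ' * depth
    if isinstance(obj, dict) and depth < max_depth:
        lines = []
        for key, value in obj.items():
            lines.append('{}{}:'.format(pad, key))
            lines.extend(describe(value, depth + 1, max_depth))
        return lines
    shape = getattr(obj, 'shape', None)  # numpy arrays expose .shape
    if shape is not None:
        return ['{}{} shape={}'.format(pad, type(obj).__name__, shape)]
    return ['{}{}'.format(pad, type(obj).__name__)]


def load_results(path):
    """Unpickle the evaluation output. Only unpickle files you trust."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)
```

For example, print('\n'.join(describe(load_results('speech-new.inference.pkl')))) lists the top-level keys and the array shapes beneath them.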
Yeah, but this way I'll have to wait for the model to load on each call. I wanted to load it once via the Python API, then accept file paths from some stream and return transcriptions or raw outputs to another stream.
So far I'm at this point:
def coerce_shape(data, shape, name):
    if data.ndim < len(shape):
        return numpy.expand_dims(data, -1)
    else:
        return data

spec = Kurfile('speech.yml', JinjaEngine())
spec.parse()
result = 'test'
provider = spec.get_provider(result)  # probably this shouldn't be necessary
weights = spec.data[result].get('weights')
model = spec.get_model(provider)
model.restore(weights)
target = spec.get_trainer(with_optimizer=False)
target.compile(result, with_provider=provider)
for batch in provider:  # just to get some valid batch
    break
inputs = [
    coerce_shape(
        batch[model.get_data_name_by_layer_name(batch, name)],
        shape, name
    )
    for shape, name in zip(
        model.compiled['test']['shapes']['input'],
        model.compiled['test']['names']['input']
    )
] + [False]
out = model.compiled['test']['func'](inputs)
Yet I get KeyError: 'ctc_scaled_utterance_length'
You made a good start, but it's much, much simpler than that :)
Try playing with code like this:
from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
import numpy

kurfile = Kurfile('Kurfile.yml', JinjaEngine())
kurfile.parse()
model = kurfile.get_model()
model.backend.compile(model)
model.restore('weights.kur')  # If necessary
pdf, metrics = model.backend.evaluate(model, data={'input': numpy.array(...)})
where "input" in the last line corresponds to the name of the "input" of your model, and the numpy array in that line is an array of samples (it is the batch itself).
Now you can just call evaluate over and over.
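That "load once, call repeatedly" pattern fits the streaming use case raised earlier in the thread. A minimal sketch: serve is a hypothetical helper that reads one file path per line from an input stream and writes one result per line to an output stream. The transcribe callable is a stand-in for whatever wraps feature extraction and model.backend.evaluate; nothing here calls Kur itself.

```python
import sys


def serve(transcribe, in_stream=sys.stdin, out_stream=sys.stdout):
    """Read file paths line by line and write one result per path.

    `transcribe` is any callable taking a path and returning a string.
    The model is expected to be loaded once, before this loop starts.
    Returns the number of paths handled.
    """
    handled = 0
    for line in in_stream:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        try:
            result = transcribe(path)
        except OSError as error:
            result = 'ERROR: {}'.format(error)  # unreadable file; keep serving
        out_stream.write('{}\t{}\n'.format(path, result))
        out_stream.flush()
        handled += 1
    return handled
```

Because the streams are plain file-like objects, the same loop works with stdin/stdout, a pipe, or a socket file, and the model load cost is paid only once.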
The model = kurfile.get_model() call throws an error: kur.containers.parsing_error.ParsingError: Placeholder "utterance" requires a shape.
This sample worked for me:
import numpy
from kur.kurfile import Kurfile
from kur.engine import JinjaEngine

def coerce_shape(data, shape, name):
    # Add a trailing axis when the data has fewer dimensions than the model expects.
    if data.ndim < len(shape):
        return numpy.expand_dims(data, -1)
    else:
        return data

spec = Kurfile('speech.yml', JinjaEngine())
spec.parse()
result = 'test'
provider = spec.get_provider(result)
weights = spec.data[result].get('weights')
model = spec.get_model(provider)
model.restore(weights)
target = spec.get_trainer(with_optimizer=False)
target.compile(result, with_provider=provider)
for batch in provider:  # take the first batch only
    break
inputs = [
    coerce_shape(
        batch[model.get_data_name_by_layer_name(batch, name)],
        shape, name
    )
    for shape, name in zip(
        model.compiled['test']['shapes']['input'],
        model.compiled['test']['names']['input']
    )
] + [False]
outputs = model.compiled['test']['func'](inputs)
I also planned to use the unprocessed outputs of the model as features for other (non-ASR) models, so it was important to have a way to run the model when all you have is a plain wav file, without text or other supplementary data.
Glad you have it working! Given data, Kur can automatically infer shapes for the model. But when building entirely off the Python API, this information isn't always available. The easiest fix for errors like "Placeholder "utterance" requires a shape." is to explicitly tell the model the shape of your data. For example, change this:
model:
  - input: utterance
  # ...
to this:
model:
  - input:
      shape: [null, 161]
    name: utterance
  # ...
More generally, your shapes can be of the form [x, y, ...], where x, y, ... are the sizes of the dimensions. Use null if one of the dimensions is variable between batches. So for the speech example, we have a variable number of timesteps, but 161 frequency components, so we have shape: [null, 161]. For 320x240 RGB images, it would be shape: [320, 240, 3].
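To make the convention concrete, here is a small NumPy sketch. It assumes, as in the evaluate call earlier in the thread, that the array handed to the model is a batch, so it carries one extra leading dimension in front of the per-sample shape. YAML's null becomes Python's None. matches_shape is a hypothetical helper for checking this, not part of Kur.

```python
import numpy as np


def matches_shape(batch, shape):
    """Check a batch array against a per-sample shape; None means 'any size'."""
    sample_dims = batch.shape[1:]  # drop the leading batch dimension
    if len(sample_dims) != len(shape):
        return False
    return all(want is None or want == have
               for want, have in zip(shape, sample_dims))


# One utterance with 500 timesteps and 161 frequency components.
utterances = np.zeros((1, 500, 161))
# Four 320x240 RGB images.
images = np.zeros((4, 320, 240, 3))

print(matches_shape(utterances, [None, 161]))  # True
print(matches_shape(images, [320, 240, 3]))    # True
print(matches_shape(images, [None, 161]))      # False
```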
Thanks a lot! It works perfectly now!
Hey akademi4eg, I am trying to do the same thing: I want to pass a .wav file to the model and read the recognized string into a variable. Can you please share your code/solution showing how you got this working? Is there a way to point to the .wav file in the Python program as the input, instead of a numpy array?
I ended up with the following script (I removed some additional functionality, like saving data to mat-files; I hope there are no typos). It takes a directory and processes all wav files in it. The script works, but it is a quick-and-dirty solution, so you'll need to modify it heavily to use it in any non-research context.
from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
from kur.model.hooks import TranscriptHook
from kur.utils import Normalize, get_audio_features
import numpy as np
import sys
import os
import matplotlib.pyplot as plt
from tqdm import tqdm

def main():
    # Load the model once, then process every wav file under the given directory.
    model, norm, trans, rev, blank = load()
    for root, dirs, files in os.walk(sys.argv[1]):
        for filename in tqdm(files):
            if not filename.endswith('.wav'):
                continue
            file_path = os.path.join(root, filename)
            try:
                outputs, feats = get_output(file_path, norm, model)
            except OSError:
                continue  # skip unreadable files
            text = trans.argmax_decode(outputs, rev, blank)
            print('====={}:\n{}\n'.format(filename, text))
            plot(outputs, rev, blank)

def load():
    spec_file = 'speech.yml'
    w_file = 'weights'
    spec = Kurfile(spec_file, JinjaEngine())
    spec.parse()
    model = spec.get_model()
    model.backend.compile(model)
    model.restore(w_file)
    norm = Normalize(center=True, scale=True, rotate=True)
    norm.restore('norm.yml')
    trans = TranscriptHook()
    # Maps output indices to characters; the last index is the blank symbol.
    rev = {0: ' ', 1: "'", 2: 'a', 3: 'b', 4: 'c', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'h', 10: 'i', 11: 'j', 12: 'k', 13: 'l', 14: 'm', 15: 'n', 16: 'o', 17: 'p', 18: 'q', 19: 'r', 20: 's', 21: 't', 22: 'u', 23: 'v', 24: 'w', 25: 'x', 26: 'y', 27: 'z'}
    blank = 28
    return model, norm, trans, rev, blank

def plot(outputs, rev, blank, title=''):
    # Show the per-timestep output distribution, one row per symbol.
    fig, ax = plt.subplots()
    plt.imshow(outputs.T, aspect='auto')
    ax.set_yticks(list(rev.keys()) + [blank])
    ax.set_yticklabels(list(rev.values()) + ['null'])
    plt.grid(True)
    plt.title(title)
    plt.show()

def get_output(file_path, norm, model):
    feats = get_audio_features(file_path, 'spec', high_freq=8000)
    inputs = norm.apply(feats)
    # Add a leading batch dimension: the model expects a batch of samples.
    pdf, _ = model.backend.evaluate(model, data={'utterance': inputs[np.newaxis, :, :]})
    return pdf['asr'].squeeze(), feats

if __name__ == "__main__":
    main()
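The argmax_decode step turns the per-timestep output matrix into text. As an illustration of the general idea, and not Kur's actual implementation, standard greedy CTC decoding takes the most likely symbol at each timestep, collapses consecutive repeats, and then drops the blank symbol. The sketch below assumes outputs has shape (timesteps, symbols) and reuses the rev / blank convention from the script above; greedy_ctc_decode is a hypothetical name.

```python
import numpy as np


def greedy_ctc_decode(outputs, rev, blank):
    """Greedy CTC decoding: argmax, collapse repeats, remove blanks."""
    best = np.argmax(outputs, axis=1)  # most likely symbol per timestep
    decoded = []
    previous = None
    for index in best:
        index = int(index)
        if index != previous and index != blank:
            decoded.append(rev[index])
        previous = index
    return ''.join(decoded)


# Tiny vocabulary for illustration: 0 -> 'a', 1 -> 'b', blank is 2.
rev = {0: 'a', 1: 'b'}
frames = [0, 0, 2, 0, 1, 1, 2]   # a a _ a b b _
outputs = np.eye(3)[frames]      # one-hot rows, shape (7, 3)
print(greedy_ctc_decode(outputs, rev, blank=2))  # aab
```

The blank between the second and third frames is what lets the two separate "a" characters survive the repeat-collapsing step.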
Thanks a lot, Tkanov :) Let me give this a try; hopefully I can get this working for our app.
@sawantilak Did you succeed in incorporating this model into your app? I am trying to do the same for my Android app. Do you think it's possible?
Nilesh,
Unfortunately, I did not get much time to spend on this. I thought I would get back to it later, but we happened to try Mozilla DeepSpeech, which works out better for us. Support is also a little faster for that framework. If you are giving Mozilla DeepSpeech a try, let me know and I can help out with whatever I already know.
https://hacks.mozilla.org/2017/11/a-journey-to-10-word-error-rate/ https://github.com/mozilla/DeepSpeech
Sawan.
@sawantilak Thank you. I will let you know.
Let's consider the speech example. I've trained a model and want to try it on some wav files I've recorded. Is there any way to run the model on a specified file? Either a command-line call or a Python snippet would be highly appreciated. Or maybe there is a way to export the model as a py-file and a TensorFlow model file, so it can be served over REST or via RabbitMQ.
Also, a side question: what does the "kur build" action do?