deepgram / kur

Descriptive Deep Learning
Apache License 2.0

How to use models on other data #23

Closed akademi4eg closed 7 years ago

akademi4eg commented 7 years ago

Let's consider the speech example. I've trained a model and want to try it on some wav files I've recorded. Is there any way to run the model on a specified file? Either a command-line call or a Python snippet would be highly appreciated. Or maybe there is a way to export the model as a py-file plus a TensorFlow model file, so it can be used for serving over REST or via RabbitMQ.

Also, a side question: what does the "kur build" action do?

ajsyp commented 7 years ago

There are a couple different ways to use your model on other data.

  1. Just use the current data format. Have you taken a look at the data format of the current speech examples? It is just a JSONL file next to an audio directory. Put your audio files in the audio directory and edit the JSONL file (you can set the text field to an empty string and leave duration at zero if all you are doing is applying the model to unknown data); see the sketch after this list. Then set the path of the speech_recognition data supplier to point at the directory containing the JSONL file and audio directory.

  2. You can use the Kur Python API and write your own inference script. This isn't very hard to do, and I can probably create an example of this that we can check into the repo.

  3. I am working on a plug-in system for Kur that would make this sort of thing pretty easy. You'll have to stay posted for this one :)
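
For reference, one line of that JSONL file (option 1 above) might look roughly like the sketch below. The text and duration fields are the ones mentioned above; the uuid field identifying the matching audio file is an assumption, so check the example data that Kur downloads for the exact schema:

{"text": "", "duration": 0.0, "uuid": "my_recording_001"}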

ajsyp commented 7 years ago

Also, kur build just tries to assemble/compile the model, but does not do any training/testing/inference. It's mostly useful as a debugging tool to see if your model is going to work.
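
For example, run it against the speech example's Kurfile:

kur build speech.yml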

akademi4eg commented 7 years ago

Thanks for the quick and clear response! An example of inference would be a good thing to have.

scottstephenson commented 7 years ago

I think it is simpler than it seems. All you need to do is write an evaluate section that looks like this:

evaluate: &evaluate
  data:
    - speech_recognition:
        <<: *data
        path: "/path/to/directory/containing/your/custom/audiofilesandjsonl/"
  weights: *weights
  provider: *provider

  destination: speech-new.inference.pkl

  hooks:
    - transcript

That ^^^ is basically the same as the validate section in the speech example, but edited to point to your custom directory.

To run it (and get inference), you would just do:

kur evaluate speech-new.yml

akademi4eg commented 7 years ago

Yeah, but that way I'll have to wait for the model to load on each call. I wanted to load it once via the Python API, then accept file paths from some stream and return transcriptions or raw outputs to another stream.

So far I'm at this point:

from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
import numpy

def coerce_shape(data, shape, name):
    if data.ndim < len(shape):
        return numpy.expand_dims(data, -1)
    else:
        return data

spec = Kurfile('speech.yml', JinjaEngine())
spec.parse()

result = 'test'
provider = spec.get_provider(result) # probably this shouldn't be necessary
weights = spec.data[result].get('weights')
model = spec.get_model(provider)
model.restore(weights)
target = spec.get_trainer(with_optimizer=False)
target.compile(result, with_provider=provider)
for batch in provider: # just to get some valid batch
    break

inputs = [
    coerce_shape(
        batch[model.get_data_name_by_layer_name(batch, name)],
        shape, name
    )
    for shape, name in zip(
        model.compiled['test']['shapes']['input'],
        model.compiled['test']['names']['input']
    )
] + [False]

out = model.compiled['test']['func'](inputs)

Yet I get KeyError: 'ctc_scaled_utterance_length'

ajsyp commented 7 years ago

You made a good start, but it's much, much simpler than that :)

Try playing with code like this:

from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
import numpy

kurfile = Kurfile('Kurfile.yml', JinjaEngine())
kurfile.parse()
model = kurfile.get_model()
model.backend.compile(model)
model.restore('weights.kur') # If necessary
pdf, metrics = model.backend.evaluate(model, data={'input' : numpy.array(...)})

where "input" in the last line corresponds to the name of the "input" of your model, and the numpy array in that line is an array of samples (it is the batch itself).

Now you can just call evaluate over and over.
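
For instance, here is a rough sketch of a loop that reuses the compiled model; the get_features helper and the file list are placeholders, and 'utterance' is the input name used in the speech example:

import numpy

# Compile and restore the model once (as above), then evaluate many times.
for wav_path in ['first.wav', 'second.wav']:   # hypothetical list of files
    features = get_features(wav_path)          # assumed feature extractor, shape (timesteps, 161)
    batch = numpy.expand_dims(features, 0)     # add the batch dimension
    pdf, metrics = model.backend.evaluate(model, data={'utterance': batch})
    print(pdf)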

akademi4eg commented 7 years ago

The model = kurfile.get_model() call throws an error: kur.containers.parsing_error.ParsingError: Placeholder "utterance" requires a shape.

akademi4eg commented 7 years ago

This sample worked for me:

from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
import numpy

def coerce_shape(data, shape, name):
    if data.ndim < len(shape):
        return numpy.expand_dims(data, -1)
    else:
        return data

spec = Kurfile('speech.yml', JinjaEngine())
spec.parse()

result = 'test'
provider = spec.get_provider(result)
weights = spec.data[result].get('weights')
model = spec.get_model(provider)
model.restore(weights)
target = spec.get_trainer(with_optimizer=False)
target.compile(result, with_provider=provider)
for batch in provider:
    break

inputs = [
    coerce_shape(
        batch[model.get_data_name_by_layer_name(batch, name)],
        shape, name
    )
    for shape, name in zip(
        model.compiled['test']['shapes']['input'],
        model.compiled['test']['names']['input']
    )
] + [False]

outputs = model.compiled['test']['func'](inputs)

akademi4eg commented 7 years ago

I also planned to use the unprocessed outputs of the model as features for other (non-ASR) models. Thus it was important to have a way to run a model when all you have is a plain wav file, without a transcript or other supplementary data.

ajsyp commented 7 years ago

Glad you have it working! Given data, Kur can automatically infer shapes for the model. But when building entirely off the Python API, this information isn't always available. The easiest fix for errors like "Placeholder "utterance" requires a shape." is to explicitly tell the model the shape of your data. For example, change this:

model:
  - input: utterance
  # ...

to this:

model:
  - input:
      shape: [null, 161]
    name: utterance
  # ...

More generally, your shapes can be of the form [x, y, ...], where x, y, ... are the dimensions. Use null if one of the dimensions is variable between batches. So for the speech example, we have a variable number of timesteps but 161 frequency components, so we use shape: [null, 161]. For 320x240 RGB images, it would be shape: [320, 240, 3].
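
So an image model, for instance, might declare its input like this (the images name here is just illustrative):

model:
  - input:
      shape: [320, 240, 3]
    name: images
  # ...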

akademi4eg commented 7 years ago

Thanks a lot! It works perfectly now!

sawantilak commented 7 years ago

Hey akademi4eg, I am trying to do the same thing, where I want to pass in a .wav file to the model and read the recognized string into a variable. Can you please share your code/solution on how you were able to get this working? Is there a way we can point to the .wav file in the Python program as an input instead of a numpy array?

akademi4eg commented 7 years ago

I ended up with the following script (I removed some additional information, like saving data to mat-files; hope there are no typos). It takes a directory and processes all wav files in it. The script works, but it is a quick-and-dirty solution, so you'll need to modify it heavily to use it in any non-research context.


from kur.kurfile import Kurfile
from kur.engine import JinjaEngine
from kur.model.hooks import TranscriptHook
from kur.utils import Normalize, get_audio_features
import numpy as np
import sys
import os
import matplotlib.pyplot as plt
from tqdm import tqdm

def main():
    model, norm, trans, rev, blank = load()

    for root, dirs, files in os.walk(sys.argv[1]):
        for filename in tqdm(files):
            if not filename.endswith('.wav'):
                continue
            file_path = os.path.join(root, filename)
            try:
                outputs, feats = get_output(file_path, norm, model)
            except OSError:
                continue
            text = trans.argmax_decode(outputs, rev, blank)
            print('====={}:\n{}\n'.format(filename, text))
            plot(outputs, rev, blank)

def load():
    spec_file = 'speech.yml' 
    w_file = 'weights'
    spec = Kurfile(spec_file, JinjaEngine())
    spec.parse()

    model = spec.get_model()
    model.backend.compile(model)
    model.restore(w_file)

    norm = Normalize(center=True, scale=True, rotate=True)
    norm.restore('norm.yml')

    trans = TranscriptHook()
    rev = {0: ' ', 1: "'", 2: 'a', 3: 'b', 4: 'c', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'h', 10: 'i', 11: 'j', 12: 'k', 13: 'l', 14: 'm', 15: 'n', 16: 'o', 17: 'p', 18: 'q', 19: 'r', 20: 's', 21: 't', 22: 'u', 23: 'v', 24: 'w', 25: 'x', 26: 'y', 27: 'z'}
    blank = 28
    return model, norm, trans, rev, blank

def plot(outputs, rev, blank, title=''):
    fig, ax = plt.subplots()
    plt.imshow(outputs.T, aspect='auto')
    ax.set_yticks(list(rev.keys()) + [blank])
    ax.set_yticklabels(list(rev.values()) + ['null'])
    plt.grid(True)
    plt.title(title)
    plt.show()

def get_output(file_path, norm, model):
    feats = get_audio_features(file_path, 'spec', high_freq=8000)
    inputs = norm.apply(feats)
    pdf, _ = model.backend.evaluate(model, data={'utterance': inputs[np.newaxis, :, :]})
    return pdf['asr'].squeeze(), feats

if __name__ == "__main__":
    main()
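
Assuming the script is saved as, say, transcribe_dir.py (the name is arbitrary), you run it with the directory containing your wav files as its only argument:

python transcribe_dir.py /path/to/wav/files/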

sawantilak commented 7 years ago

Thanks a lot, Tkanov :). Let me give this a try; hopefully I can get this working for our app.

nilesh02 commented 6 years ago

@sawantilak Did you succeed in incorporating this model in your app? I am trying to do the same for my Android app. Do you think it's possible?

sawantilak commented 6 years ago

Nilesh,

Unfortunately I did not get much time to spend on this. I thought I would get back to it later, but we happened to try Mozilla DeepSpeech, which works out better for us. Also, the support is a little faster on that framework. If you are giving Mozilla DeepSpeech a try, let me know and I can help out with whatever I already know.

https://hacks.mozilla.org/2017/11/a-journey-to-10-word-error-rate/ https://github.com/mozilla/DeepSpeech

Sawan.

nilesh02 commented 6 years ago

@sawantilak Thank you. I will let you know.