dscripka / openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
Apache License 2.0

How was the testing done for the Alexa model (alexa.onnx)? #134

Open sanjuktasr opened 4 months ago

sanjuktasr commented 4 months ago

Only 7 of 328 audio clips were detected using the alexa_v0.1.onnx model. The 328 clips are from https://github.com/Picovoice/wake-word-benchmark/tree/master/audio/alexa

eugene-orlov-sm commented 3 months ago

That is strange. I had much better results for the test keywords when using just voice.

sanjuktasr commented 3 months ago

import collections
import os
from pathlib import Path

import openwakeword
from tqdm import tqdm

def test_models():
    # Load model with defaults
    owwModel = openwakeword.Model(wakeword_models=[
        os.path.join("/NAS1/sanjukta_repo_falcon2/wakeword/openwakeword/openwakeword", "resources", "models", "alexa_v0.1.onnx")
        ], inference_framework="onnx")

    # Get clips for each model (assumes that test clips will have the model name in the filename)
    test_dict = {}
    all_clips = [str(i) for i in Path(os.path.join("/NAS1/sanjukta_repo_falcon2/wakeword","wake-word-benchmark/audio/alexa/")).glob("*.wav")]
    test_dict['alexa'] = [i for i in all_clips if 'alexa' in i]

    c = 0  # number of clips with at least one frame scoring above 0.5

    # Predict
    for model, clips in test_dict.items():
        for clip in tqdm(clips):
            # Get predictions for each frame in the clip
            predictions = owwModel.predict_clip(clip)
            owwModel.reset()  # reset after each clip to ensure independent results

            # Make predictions dictionary flatter: model name -> list of per-frame scores
            predictions_flat = collections.defaultdict(list)
            for frame in predictions:
                for key, score in frame.items():
                    predictions_flat[key].append(score)

            # Count the clip as detected if its highest frame score exceeds 0.5
            best = max(predictions_flat['alexa_v0.1'])
            if best > 0.5:
                print(clip, " : ", best)
                c = c + 1

        print(c)

@eugene-orlov-sm this is the test function I used. The dataset consists of 328 audio clips.
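A count at a single 0.5 threshold hides whether the scores are all near zero or just below the cutoff. As a rough sketch, assuming the highest per-clip score from the function above is collected into a list, the detection rate can be computed at several thresholds. The helper names `detection_rate` and `sweep_thresholds` and the threshold values are hypothetical and not part of openWakeWord.

```python
def detection_rate(max_scores, threshold=0.5):
    """Fraction of clips whose highest frame score exceeds the threshold."""
    if not max_scores:
        return 0.0
    hits = sum(1 for score in max_scores if score > threshold)
    return hits / len(max_scores)


def sweep_thresholds(max_scores, thresholds=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Map each threshold to the detection rate at that threshold.

    The threshold values are illustrative assumptions, not recommended settings.
    """
    return {t: detection_rate(max_scores, t) for t in thresholds}
```

With 7 detections out of 328 clips, `detection_rate` returns about 0.021. If the rate stays that low even at a threshold of 0.1, the model is likely not seeing usable audio at all, which points at the input files rather than the threshold.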

dscripka commented 3 months ago

@sanjuktasr, see this notebook for an example of how the testing of the alexa model was done.

You should be getting many more than 7 activations from the Picovoice benchmark dataset. There may be an issue with the audio files (make sure they are 16-bit PCM, 16 kHz WAV files).
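One way to check the file format mentioned above is to read each WAV header with Python's standard `wave` module. This is a minimal sketch, not part of openWakeWord: the function name `check_wav_format` is hypothetical, and the single-channel check is an added assumption, since the comment above only names the bit depth and sample rate.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sampwidth=2,
                     expected_channels=1):
    """Return a list of format problems for one WAV file (empty if none).

    expected_sampwidth is in bytes, so 2 means 16-bit samples.
    expected_channels=1 (mono) is an assumption, not stated in the thread.
    """
    try:
        with wave.open(path, "rb") as wf:
            rate = wf.getframerate()
            sampwidth = wf.getsampwidth()
            channels = wf.getnchannels()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects files it cannot parse as PCM WAV
        return [f"not readable as PCM WAV: {exc}"]

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if sampwidth != expected_sampwidth:
        problems.append(
            f"sample width is {8 * sampwidth} bits, expected {8 * expected_sampwidth} bits"
        )
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems
```

Calling `check_wav_format` on every file in the benchmark directory and printing only the non-empty results gives a quick list of clips that would need resampling or conversion before testing.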