Picovoice / eagle

On-device speaker recognition engine powered by deep learning
Apache License 2.0
29 stars 3 forks source link
speaker-embedding speaker-identification speaker-recognition

Eagle

GitHub release GitHub

Maven Central npm npm CocoaPods PyPI

Made in Vancouver, Canada by Picovoice

Twitter URL YouTube Channel Views

Eagle is an on-device speaker recognition engine. Eagle is:

Table of Contents

Overview

Eagle consists of two distinct steps: Enrollment and Recognition. In the enrollment step, Eagle analyzes a series of utterances from a particular speaker to learn their unique voiceprint. This step results in a Profile, which can be stored and utilized in the next step. During the Recognition step, Eagle registers speakers using the Profiles generated in the enrollment phase. Then, Eagle compares the incoming frames of audio to the voiceprints of all enrolled speakers in real-time to determine the similarity between them.

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including Eagle. Anyone who is using Picovoice needs to have a valid AccessKey. You must keep your AccessKey secret. You would need internet connectivity to validate your AccessKey with Picovoice license servers even though the speaker recognition is running 100% offline.

AccessKey also verifies that your usage is within the limits of your account. Everyone who signs up for Picovoice Console receives the Free Tier usage rights described here. If you wish to increase your limits, you can purchase a subscription plan.

Demos

Python Demos

Install the demo package:

pip3 install pveagledemo

Speaker Enrollment

Create a new speaker profile:

eagle_demo_mic enroll --access_key ${ACCESS_KEY} --output_profile_path ${OUTPUT_PROFILE_PATH}

or

eagle_demo_file enroll \
    --access_key ${ACCESS_KEY} \
    --enroll_audio_paths ${ENROLL_AUDIO_PATHS}
    --output_profile_path ${OUTPUT_PROFILE_PATH}

Speaker Recognition

Test the speaker recognition engine:

eagle_demo_mic test \
    --access_key ${ACCESS_KEY} \
    --input_profile_paths ${INPUT_PROFILE_PATH}

or

eagle_demo_file test \
    --access_key ${ACCESS_KEY} \
    --input_profile_paths ${INPUT_PROFILE_PATH}
    --test_audio_paths ${TEST_AUDIO_PATHS}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

For more information about Python demos go to demo/python.

Android Demo

Using Android Studio, open demo/android/EagleDemo as an Android project and then run the application.

Open the file MainActivity.java and replace "${YOUR_ACCESS_KEY_HERE}" in with your AccessKey.

iOS Demo

To run the demo, go to demo/ios/EagleDemo and run:

pod install

Replace let accessKey = "${YOUR_ACCESS_KEY_HERE}" in the file ViewModel.swift with your AccessKey.

Then, using Xcode, open the generated EagleDemo.xcworkspace and run the application.

C Demos

Build the demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build --target eagle_demo_mic

To list the available audio input devices:

./demo/c/build/eagle_demo_mic -s

Speaker Enrollment

To enroll a new speaker:

./demo/c/build/eagle_demo_mic -l ${LIBRARY_PATH} -m ${MODEL_PATH} -a ${ACCESS_KEY} -e ${OUTPUT_PROFILE_PATH}

Speaker Recognition

To test the speaker recognition engine:

./demo/c/build/eagle_demo_mic -l ${LIBRARY_PATH} -m ${MODEL_PATH} -a ${ACCESS_KEY} -i ${INPUT_PROFILE_PATH}

Replace ${LIBRARY_PATH} with path to appropriate library available under lib, ${MODEL_PATH} with path to the model file available under lib/common, ${ACCESS_KEY} with AccessKey obtained from Picovoice Console. ${OUTPUT_PROFILE_PATH} in the enrollment step is the path to the generated speaker profile. ${INPUT_PROFILE_PATH} in the recognition step is the path to the generated speaker profile to be tested.

For more information about C demos go to demo/c.

Web Demo

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

Node.js Demos

Install the demo package:

npm install -g @picovoice/eagle-node-demo

Speaker Enrollment

Create a new speaker profile:

eagle-mic-demo --enroll \
    --access_key ${ACCESS_KEY} \
    --output_profile_path ${OUTPUT_PROFILE_PATH}

or

eagle-file-demo --enroll \
    --access_key ${ACCESS_KEY} \
    --enroll_audio_paths ${ENROLL_AUDIO_PATH_1 ...} \
    --output_profile_path ${OUTPUT_PROFILE_PATH}

Speaker Recognition

Test the speaker recognition engine:

eagle-mic-demo --test \
    --access_key ${ACCESS_KEY} \
    --input_profile_paths ${INPUT_PROFILE_PATH_1 ...}

or

eagle-file-demo --test \
    --access_key ${ACCESS_KEY} \
    --test_audio_path ${TEST_AUDIO_PATH} \
    --input_profile_paths ${INPUT_PROFILE_PATH_1 ...}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

For more information about Node.js demos go to demo/nodejs.

SDKs

Python

Install the Python SDK:

pip3 install pveagle

Speaker Enrollment

Create an instance of the profiler:

import pveagle

# AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
access_key = "${ACCESS_KEY}"
eagle_profiler = pveagle.create_profiler(access_key)

Create a new speaker profile:

def get_next_enroll_audio_data():
    pass

percentage = 0.0
while percentage < 100.0:
    percentage, error = eagle_profiler.enroll(get_next_enroll_audio_data())

Export the speaker profile once enrollment is complete:

speaker_profile = eagle_profiler.export()

Release the resources acquired by the profiler:

eagle_profiler.delete()

Speaker Recognition

Create an instance of the engine using the speaker profile exported before:

eagle = pveagle.create_recognizer(access_key, speaker_profile)

Process incoming audio frames:

def get_next_audio_frame():
    pass

while True:
    score = eagle.process(get_next_audio_frame())

Finally, when done be sure to explicitly release the resources:

eagle.delete()

Android

To include the package in your Android project, ensure you have included mavenCentral() in your top-level build.gradle file and then add the following to your app's build.gradle:

dependencies {
    implementation 'ai.picovoice:eagle-android:${LATEST_VERSION}'
}

Speaker Enrollment

Create an instance of the profiler:

import ai.picovoice.eagle.*;

final String accessKey = "${ACCESS_KEY}";

try {
    EagleProfiler eagleProfiler = new EagleProfiler.Builder()
            .setAccessKey(accessKey)
            .build();
} catch (EagleException e) { }

Create a new speaker profile:

public short[] getNextEnrollAudioData() {
    // get audio data
}

EagleProfilerEnrollResult result = null;
try {
    while (result != null && result.getPercentage() < 100.0) {
        result = eagleProfiler.enroll(getNextEnrollAudioData());
    }
} catch (EagleException e) { }

Export the speaker profile once enrollment is complete:

try {
    EagleProfile speakerProfile = eagleProfiler.export();
} catch (EagleException e) { }

Release the resources acquired by the profiler:

eagleProfiler.delete();

Speaker Recognition

Create an instance of the engine using the speaker profile exported before:

import ai.picovoice.eagle.*;

final String accessKey = "${ACCESS_KEY}";

try {
    Eagle eagle = new Eagle.Builder()
        .setAccessKey(accessKey)
        .setSpeakerProfile(speakerProfile)
        .build();
} catch (EagleException e) { }

Process incoming audio frames:

public short[] getNextAudioFrame() {
    // get audio frame
}

try {
    while (true) {
        float[] scores = eagle.process(getNextAudioFrame());
    }
} catch (EagleException e) { }

Finally, when done be sure to explicitly release the resources:

eagle.delete()

iOS

The Eagle iOS binding is available via CocoaPods. To import it into your iOS project, add the following line to your Podfile and run pod install:

pod 'Eagle-iOS'

Speaker Enrollment

Create an instance of the profiler:

import pveagle

let accessKey : String = // .. AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
let eagleProfiler = try EagleProfiler(accessKey: accessKey)

Create a new speaker profile:

func get_next_enroll_audio_data(numSamples: Int) -> [Int16] {
    // ...
}

do {
    let numSamples = eagleProfiler.minEnrollSamples()

    var percentage = 0.0
    var feedback: EagleProfilerEnrollFeedback?

    while (percentage < 100.0) {
        (percentage, feedback) = try eagleProfiler.enroll(pcm: get_next_enroll_audio_data(numSamples: numSamples))
    }
} catch { }

Export the speaker profile once enrollment is complete:

let speakerProfile = try eagleProfiler.export()

Release the resources acquired by the profiler:

eagleProfiler.delete()

Speaker Recognition

Create an instance of the engine using the speaker profile exported before:

let eagle = Eagle(accessKey: accessKey, speakerProfiles: [speakerProfile])

Process incoming audio frames:

func get_next_audio_frame() -> [Int16] {
    // ...
}

do {
    let profileScores = try eagle.process(pcm: get_next_audio_frame())
} catch { }

Finally, when done be sure to explicitly release the resources:

eagle.delete()

C

include/pv_eagle.h header file contains relevant information.

Speaker Enrollment

Build an instance of the profiler:

const char *access_key = "${ACCESS_KEY}";
const char *model_path = "${MODEL_PATH}";

pv_eagle_profiler_t *eagle_profiler = NULL;
pv_status_t status = pv_eagle_profiler_init(
            access_key,
            model_path,
            &eagle_profiler);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Replace ${ACCESS_KEY} with the AccessKey obtained from Picovoice Console, and ${MODEL_PATH} with the path to the model file available under lib/common.

Use eagle_profiler to create a new speaker profile:

extern const int16_t *get_next_enroll_audio_frame(void);
extern const int32_t get_next_enroll_audio_num_samples(void);

float enroll_percentage = 0.0f;
pv_eagle_profiler_enroll_feedback_t feedback = PV_EAGLE_PROFILER_ENROLLMENT_ERROR_AUDIO_OK;

while (enroll_percentage < 100.0f) {
  status = pv_eagle_profiler_enroll(
          eagle_profiler,
          get_next_enroll_audio_frame(),
          get_next_enroll_audio_num_samples(),
          &feedback,
          &enroll_percentage);
  if (status != PV_STATUS_SUCCESS) {
      // error handling logic
  }
}

int32_t profile_size_bytes = 0;
status = pv_eagle_profiler_export_size(eagle_profiler, &profile_size_bytes);
void *speaker_profile = malloc(profile_size_bytes);
status = pv_eagle_profiler_export(
        eagle_profiler,
        speaker_profile);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Once the speaker profile is exported, the resources acquired by the profiler can be released:

pv_eagle_profiler_delete(eagle_profiler);

Speaker Recognition

Create an instance of the engine using the speaker profile exported before:

pv_eagle_t *eagle = NULL;
pv_status_t status = pv_eagle_init(
        access_key,
        model_path,
        1,
        (const void *const *) &speaker_profile,
        &eagle);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Now the eagle can be used to process incoming audio frames:

extern const int16_t *get_next_audio_frame(void);
const int32_t frame_length = pv_eagle_frame_length();

float score = 0.f;
while (true) {
    const int16_t *pcm = get_next_audio_frame();
    const pv_status_t status = pv_eagle_process(eagle, pcm, &score);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }
}

Finally, when done be sure to release the acquired resources:

pv_eagle_delete(handle);

Web

Install the Eagle package with yarn (or npm):

yarn add @picovoice/eagle-web

Speaker Enrollment

Create an instance of the EagleProfiler:

const eagleModel = {
  publicPath: ${MODEL_RELATIVE_PATH},
  // or
  base64: ${MODEL_BASE64_STRING},
}

const eagleProfiler = await EagleProfiler.create(
        ${ACCESS_KEY},
        eagleModel);

Replace ${ACCESS_KEY} with the AccessKey obtained from Picovoice Console, and the model options with the path to the model file available under lib/common or a base64 string of it.

Use EagleProfiler to create a new speaker profile:

function getAudioData(numSamples): Int16Array {
  // get audio frame of size `numSamples`
}

let percentage = 0;
while (percentage < 100) {
  const audioData = getAudioData(eagleProfiler.minEnrollSamples);

  const result: EagleProfilerEnrollResult = await eagleProfiler.enroll(audioData);
  if (result.feedback === EagleProfilerEnrollFeedback.AUDIO_OK) {
      // audio is good!
  } else {
      // feedback code will tell you why audio was not used in enrollment
  }
  percentage = result.percentage;
}

// export speaker profile
const speakerProfile: EagleProfile = eagleProfiler.export();

Speaker Recognition

Create an instance of the engine with one or more speaker profiles created by the EagleProfiler:

const eagle = await Eagle.create(
        ${ACCESS_KEY},
        eagleModel,
        speakerProfile);

Process audio frames and get speaker scores (i.e. likelihood they are speaking) in real-time:

function getAudioData(numSamples): Int16Array {
  // get audio frame of size `numSamples`
}

while (true) {
  const audioData = getAudioData(eagle.frameLength);
  const scores: number[] = await eagle.process(audioData);
}

Node.js

Install Node.js SDK:

yarn add @picovoice/eagle-node

Speaker Enrollment

Create an instance of the profiler:

const { EagleProfiler } = require("@picovoice/eagle-node");

const accessKey = "${ACCESS_KEY}"; // Obtained from the Picovoice Console (https://console.picovoice.ai/)
const eagleProfiler = new EagleProfiler(accessKey);

Create a new speaker profile:

const { EnrollProgress } = require("@picovoice/eagle-node");

function getAudioData(numSamples): Int16Array {
  // get audio frame of size `numSamples`
}

let percentage = 0;
while (percentage < 100) {
  const audioData = getAudioData(eagleProfiler.minEnrollSamples);

  const result: EnrollProgress = await eagleProfiler.enroll(audioData);
  if (result.feedback === EagleProfilerEnrollFeedback.NONE) {
      // audio is good!
  } else {
      // feedback code will tell you why audio was not used in enrollment
  }
  percentage = result.percentage;
}

Export the speaker profile once enrollment is complete:

const speakerProfile: Uint8Array = eagleProfiler.export();

Release the resources acquired by the profiler:

eagleProfiler.release();

Speaker Recognition

Create an instance of the engine using the speaker profile exported before:

const { Eagle } = require("@picovoice/eagle-node");

const accessKey = "${ACCESS_KEY}"; // Obtained from the Picovoice Console (https://console.picovoice.ai/)
const eagle = new Eagle(accessKey, speakerProfile);

Process incoming audio frames:

function getAudioData(numSamples): Int16Array {
  // get audio frame of size `numSamples`
}

while (true) {
  const audioData = getAudioData(eagle.frameLength);
  const scores: number[] = eagle.process(audioData);
}

Finally, when done be sure to explicitly release the resources:

eagle.release()

Releases

v1.0.0 - January 24th, 2024

v0.2.0 - November 24th, 2023

v0.1.0 - May 29, 2023

FAQ

You can find the FAQ here.