Add Local Voice Functionality

Added Cloning Text to Speech

To clone a voice, take an audio clip of the speaker that is at least 5 to 10 seconds long, save it as a wav file, and put it in the voices folder.
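Before dropping a clip into the voices folder, it can help to confirm that it meets the suggested minimum length. The following is a minimal sketch using only the standard library; the helper names and the 5 second threshold constant are illustrative assumptions, not part of this PR, and the `wave` module only reads uncompressed PCM wav files.

```python
import wave

# Assumption: use the lower bound of the 5-10 second guidance as the minimum.
MIN_SECONDS = 5.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Check whether the clip is at least min_seconds long."""
    return clip_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("voices/my_voice.wav")` would report whether a hypothetical clip named my_voice.wav is long enough to try.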

Usage example:

import requests

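# Generate speech for the given text using the "default" voice.
response = requests.post(
    "http://localhost:8091/v1/audio/generation",
    json={
        "text": "I'm sorry Dave, I'm afraid I can't do that.",
        "voice": "default",
        "language": "en",
    },
)
response.raise_for_status()
# The later examples read the base64 encoded audio from voice_response["data"].
voice_response = response.json()
print(voice_response)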
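To listen to the generated speech, the base64 string can be decoded and written to disk. This is a minimal sketch that assumes, as the later examples suggest, that the audio is a base64 encoded wav under the "data" key; the helper name `save_base64_audio` is hypothetical.

```python
import base64
from pathlib import Path


def save_base64_audio(audio_b64: str, path: str) -> int:
    """Decode base64 audio, write it to path, and return the byte count."""
    audio_bytes = base64.b64decode(audio_b64)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

With the response from the example above, this would be called as `save_base64_audio(voice_response["data"], "output.wav")`.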

Added Speech to Text functionality

Adds speech-to-text transcription using Whisper, exposed through an API endpoint.

Usage example:

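import requests

# voice_response comes from the text to speech example above;
# "file" takes the base64 encoded audio to transcribe.
transcription = requests.post(
    "http://localhost:8091/v1/audio/transcriptions",
    json={
        "file": voice_response["data"],
        "audio_format": "wav",
        "model": "base.en",
    },
)
print(transcription.json())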
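The example above reuses generated audio, but the same request shape should work for a recording on disk. The sketch below builds the JSON payload from a local file; the helper name `build_transcription_payload` is hypothetical, and deriving `audio_format` from the file extension is an assumption made for convenience.

```python
import base64
from pathlib import Path


def build_transcription_payload(path: str, model: str = "base.en") -> dict:
    """Build the JSON body for the transcription endpoint from a local file."""
    audio_path = Path(path)
    return {
        # The endpoint expects the audio as a base64 encoded string.
        "file": base64.b64encode(audio_path.read_bytes()).decode("ascii"),
        # Assumption: the file extension matches the actual audio format.
        "audio_format": audio_path.suffix.lstrip(".").lower(),
        "model": model,
    }
```

The result can be passed as the `json` argument of `requests.post` in place of the inline dictionary.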

Updates to chat completions and completions endpoints

For the completions and chat completions endpoints, additional parameters are passed in extra_body.

If audio_format is present, the prompt is transcribed to text. The prompt is assumed to be base64 encoded audio in the specified audio_format. Currently accepted formats are wav, m4a, and webm; other formats that ffmpeg can convert may also work.
If system_message is present, it is used as the system message for the completion.
If voice is present, the completion is converted to audio using the specified voice.
- If not streaming, the audio is returned in the response under the audio key, beside the text or content key.
- If streaming, the audio is streamed in the response in audio/wav format.

Usage example:

import openai

openai.base_url = "http://localhost:8091/v1/"
openai.api_key = "Your LOCAL_LLM_API_KEY from your .env file"

completion = openai.completions.create(
    model="phi-2-dpo",
    prompt=voice_response["data"],
    temperature=0.3,
    max_tokens=1024,
    top_p=0.90,
    n=1,
    stream=False,
    extra_body={"system_message": "You are a creative assistant.", "audio_format": "wav", "voice": "default"},
)
print(completion.choices[0].text)
# Base64 encoded audio; decode it and save it to a wav file to play it.
# The audio field is not part of the standard response schema, so read it as an extra attribute.
audio_response = completion.choices[0].audio
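Since the audio is returned beside either the text key (completions) or the content key (chat completions), a small helper can handle both response shapes. This is a sketch under stated assumptions: the helper name is hypothetical, it works on a choice converted to a plain dictionary, and for chat completions it assumes the audio key sits inside the message next to content.

```python
import base64
from typing import Optional, Tuple


def extract_text_and_audio(choice: dict) -> Tuple[str, Optional[bytes]]:
    """Return (text, decoded audio bytes or None) from one response choice."""
    # Completions carry "text" directly on the choice; chat completions carry
    # "content" inside "message". Assumption: "audio" sits in the same
    # dictionary as whichever of those keys is present.
    container = choice if "text" in choice else choice.get("message", {})
    text = container.get("text") or container.get("content") or ""
    audio_b64 = container.get("audio")
    audio = base64.b64decode(audio_b64) if audio_b64 else None
    return text, audio
```

With the openai client, a choice can be converted first, for example `extract_text_and_audio(completion.choices[0].model_dump())`, and the returned bytes written to a wav file.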

DevXT-LLC / ezlocalai

Add Local Voice Functionality #12
