DevXT-LLC / ezlocalai

ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.
MIT License
72 stars 13 forks source link

Add Local Voice Functionality #12

Closed Josh-XT closed 8 months ago

Josh-XT commented 8 months ago

Added Cloning Text to Speech

Usage example:

import requests

voices = requests.post(
    "http://localhost:8091/v1/audio/generation",
    json={
        "text": "I'm sorry Dave, I'm afraid I can't do that.",
        "voice": "default",
        "language": "en",
    },
)
voice_response = voices.json()
print(f"{voice_response}")

Added Speech to Text functionality

Usage example:

import requests

transcription = requests.post(
    "http://localhost:8091/v1/audio/transcriptions",
    json={
        "file": voice_response["data"],
        "audio_format": "wav",
        "model": "base.en",
    },
)
print(transcription.json())

Updates to chat completions and completions endpoints

For the completions and chat completions endpoints, we use extra_body for additional parameters.

Usage example:

import openai

openai.base_url = "http://localhost:8091/v1/"
openai.api_key = "Your LOCAL_LLM_API_KEY from your .env file"

completion = openai.completions.create(
    model="phi-2-dpo",
    prompt=voice_response["data"],
    temperature=0.3,
    max_tokens=1024,
    top_p=0.90,
    n=1,
    stream=False,
    extra_body={"system_message": "You are a creative assistant.", "audio_format": "wav", "voice": "default"},
)
print(completion.choices[0].text)
# Base64 audio that you can save to a wav file to play, or play through other means.
audio_response = completion.choices[0]["audio"]