PantelisDeveloping / openspeech-tts

Text-to-Speech API compatible with OpenAI's API Endpoint
https://github.com/PantelisDeveloping/openspeech-tts
3 stars 0 forks

Connect Dify to OpenSpeech #1

Closed taowang1993 closed 2 weeks ago

taowang1993 commented 2 weeks ago

Hi. I modified the api.py file to read the API key and port from environment variables.

from flask import Flask, request, send_file, jsonify
from gevent.pywsgi import WSGIServer
from functools import wraps
from art import tprint
import edge_tts
import asyncio
import tempfile
import os

app = Flask(__name__)

# Use environment variables for API_KEY and PORT
API_KEY = os.environ.get('API_KEY')
PORT = int(os.environ.get('PORT', 5000))

tprint("OPEN SPEECH")
print("OpenSource TTS API Compatible with OpenAI API")
print(" ")
print("   ---------------------------------------------------------------- ")
print(" * Serving OpenVoice API")
print(f" * Server running on http://localhost:{PORT}")
print(f" * Voice Endpoint Generated: http://localhost:{PORT}/v1/audio/speech")
print(" ")
print("Press CTRL+C to quit")

def require_api_key(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        auth_header = request.headers.get('Authorization')
        if not auth_header or not auth_header.startswith('Bearer '):
            return jsonify({"error": "Missing or invalid API key"}), 401
        token = auth_header.split('Bearer ')[1]
        if token != API_KEY:
            return jsonify({"error": "Invalid API key"}), 401
        return f(*args, **kwargs)
    return decorated_function

@app.route('/v1/audio/speech', methods=['POST'])
@require_api_key
def text_to_speech():
    data = request.json
    if not data or 'input' not in data:
        return jsonify({"error": "Missing 'input' in request body"}), 400

    text = data['input']
    model = data.get('model', 'tts-1') # We will ignore this input
    voice = data.get('voice', 'en-US-AriaNeural')

    # Map OpenAI voices to edge-tts voices (this is a simple mapping, you might want to expand it)
    voice_mapping = {
        'alloy': 'en-US-AriaNeural',
        'echo': 'en-US-GuyNeural',
        'fable': 'en-GB-SoniaNeural',
        'onyx': 'en-US-ChristopherNeural',
        'nova': 'en-AU-NatashaNeural',
        'shimmer': 'en-US-JennyNeural'
    }

    edge_tts_voice = voice_mapping.get(voice, voice)

    # delete=False keeps the file on disk after the handle closes; close the
    # handle here so edge-tts can open the path for writing (required on
    # Windows). Note the file is never cleaned up afterwards, so stale .mp3
    # files accumulate in the temp directory.
    output_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
    output_file.close()

    async def generate_speech():
        communicate = edge_tts.Communicate(text, edge_tts_voice)
        await communicate.save(output_file.name)

    asyncio.run(generate_speech())

    return send_file(output_file.name, mimetype="audio/mpeg", as_attachment=True, download_name="speech.mp3")

@app.route('/v1/models', methods=['GET'])
@require_api_key
def list_models():
    # For simplicity, we're returning a fixed list of "models"
    models = [
        {"id": "tts-1", "name": "Text-to-speech v1"},
        {"id": "tts-1-hd", "name": "Text-to-speech v1 HD"}
    ]
    return jsonify({"data": models})

@app.route('/v1/voices', methods=['GET'])
@require_api_key
def list_voices():
    # edge_tts.list_voices() is a coroutine and must be run in an event loop
    voices = asyncio.run(edge_tts.list_voices())
    # Transform the voice data to match OpenAI's format
    formatted_voices = [{"name": v['ShortName'], "language": v['Locale']} for v in voices]
    return jsonify({"voices": formatted_voices})

if __name__ == '__main__':
    if not API_KEY:
        print("Warning: API_KEY environment variable is not set.")
    http_server = WSGIServer(('0.0.0.0', PORT), app)
    http_server.serve_forever()
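For reference, a client call against this server might be sketched as follows using only the standard library. The base URL, API key, and output filename are placeholders; the request body mirrors the fields that text_to_speech() reads above (model, input, voice):

```python
import json
import urllib.request

def build_speech_request(base_url, api_key, text, voice="alloy"):
    """Build a POST request for the /v1/audio/speech endpoint."""
    payload = json.dumps({"model": "tts-1", "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # checked by require_api_key
            "Content-Type": "application/json",
        },
        method="POST",
    )

def fetch_speech(base_url, api_key, text, out_path="speech.mp3", voice="alloy"):
    """POST the request and save the returned MP3 (needs a running server)."""
    req = build_speech_request(base_url, api_key, text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Calling fetch_speech("http://localhost:5000", "my-secret-key", "Hello world") should write speech.mp3, assuming the server above is running with that API_KEY.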

Dockerfile

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define environment variables
ENV PYTHONUNBUFFERED=1
ENV PORT=5000
# Note: API_KEY should be set when running the container

# Run api.py when the container launches
CMD ["python", "main/api.py"]

I am not sure if my implementation is correct.

I am hosting OpenSpeech on Hugging Face Spaces.

However, I could not connect Dify to OpenSpeech.

I tried the following URLs:

https://taowang1993-openspeech.hf.space
https://taowang1993-openspeech.hf.space/v1
https://taowang1993-openspeech.hf.space/v1/audio/speech
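One thing worth noting about these three URLs: the Flask app above only registers POST /v1/audio/speech (plus the two GET helper routes), so opening the base URL or /v1 in a browser returns 404 by design. If a client expects a base URL, a small normalization helper (hypothetical, not part of the repo) shows how any of the three forms maps to the one real endpoint:

```python
def speech_endpoint(configured_url):
    """Normalize a configured URL to the full /v1/audio/speech endpoint.

    Accepts the bare host, host + /v1, or the already-complete path, so any
    of the three URLs tried above resolves to the same endpoint.
    """
    url = configured_url.rstrip("/")
    suffix = "/v1/audio/speech"
    if url.endswith(suffix):
        return url
    if url.endswith("/v1"):
        return url + "/audio/speech"
    return url + suffix
```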
PantelisDeveloping commented 2 weeks ago

Hello dear taowang1993, thank you for your support and for providing the edited script with the environment variable implementation. It should be useful for people who want to set up their .env file. As for the Dockerfile, I haven't tried it. Regarding Dify, kindly note that this repo is designed to follow OpenAI's API so that it is compatible with OpenAI clients. Dify has its own API implementation guide:

https://docs.dify.ai/guides/extension/api-based-extension

Additionally, as per the "Local debugging" paragraph, you may need to expose your application publicly instead of on localhost; you could use ngrok to make your local endpoint available anywhere.

I have made some changes in the branch feature/dify-support: I have uploaded dify.py and a sample curl command. Kindly check it out and let me know if it works for you now.

Best regards, Pantelis