RaSan147 / VoiceAI-Asuna

If you're familiar with the anime Sword Art Online, you know it! This project is a virtual assistant for multiple OSes.
https://ai-asuna.onrender.com
Apache License 2.0
27 stars 5 forks

Feature request: please add voice in/out #4

Open develperbayman opened 1 year ago

develperbayman commented 1 year ago

Title covers it: it would be sweet to talk and get replies in audio.

RaSan147 commented 1 year ago

The web UI is under development. Try this branch if you want voice: https://github.com/RaSan147/VoiceAI-Asuna/tree/kivy-gui

I'm trying hard to replicate a cute voice, but online services require payment or an API, and OS-based voices feel kind of off. So I'm trying to learn alternative ways now.

And as for the sweet talk part, I just jumped from Python-based Kivy to a server-based (self-made) web UI. There is a ton of groundwork to be done before adding more commands, so things are getting a bit slow.

Sorry

RaSan147 commented 1 year ago

@develperbayman if you know any website that lets users produce voice WAV files via a free API, that would be a life-saving help 🛐🙇‍♂️

RaSan147 commented 1 year ago

@develperbayman could you check whether there's any voice at https://www.voicerss.org/api/demo.aspx that you like (one that goes well with the character)? I may be able to adjust pitch and speed to mimic some expression.

develperbayman commented 1 year ago

Wow, I totally did not realize you replied. It's probably a little late (my apologies, I haven't been very active for a bit), but perhaps you would be more interested in a TTS engine and an STT engine to accomplish this. I am using one for Python in the AI script I'm working on; take a peek.

develperbayman commented 1 year ago

import threading
import time
import sys
import chat_commands
from gtts import gTTS
import os
import tkinter as tk
from tkinter import filedialog, messagebox
import speech_recognition as sr
import webbrowser
import re
import subprocess
import openai

doListenToCommand = True
listening = False

# List with common departures to end the while loop
despedida = ["Goodbye", "goodbye", "bye", "Bye", "See you later", "see you later"]

# Create the GUI window
window = tk.Tk()
window.title("Computer: AI")
window.geometry("400x400")

# Create the text entry box
text_entry = tk.Entry(window, width=50)
text_entry.pack(side=tk.BOTTOM)

# Create the submit button
submit_button = tk.Button(window, text="Submit", command=lambda: submit())
submit_button.pack(side=tk.BOTTOM)

# Create the text output box
text_output = tk.Text(window, height=300, width=300)
text_output.pack(side=tk.BOTTOM)

# Set your OpenAI API key here
openai.api_key = "your_api_key_here"

def submit(event=None, text_input=None):
    global doListenToCommand
    global listening

    # Get the user input and check if the input matches the list of goodbyes
    if text_input is not None and text_input != "":
        usuario = text_input
    else:
        usuario = text_entry.get()

    if usuario in despedida:
        on_closing()
        return  # nothing to send to the API when the user said goodbye

    prompt = f"You are ChatGPT and answer my following message: {usuario}"

    # Getting responses using the OpenAI API
    # (legacy Completion endpoint of the pre-1.0 openai package)
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=2049
    )

    respuesta = response["choices"][0]["text"]

    # Converting text to audio
    texto = str(respuesta)
    tts = gTTS(texto, lang='en', tld='ie')
    tts.save("audio.mp3")

    # Displaying the answer on the screen
    text_output.insert(tk.END, "ChatGPT: " + respuesta + "\n")

    # Clear the input text
    text_entry.delete(0, tk.END)

    # Playing the audio (the microphone is ignored while the reply plays)
    doListenToCommand = False
    time.sleep(1)
    os.system("play audio.mp3")
    doListenToCommand = True

    # Call function to listen to the user
    if not listening:
        listen_to_command()

# Bind the Enter key to the submit function
window.bind("<Return>", submit)

def load_core_principles(file_path):
    with open(file_path, 'r') as file:
        principles = file.readlines()
    return principles

def listen_to_command():
    global doListenToCommand
    global listening

    # If we are not to be listening then exit the function.
    if not doListenToCommand:
        return

    # Initialize the recognizer
    r = sr.Recognizer()

    # Use the default microphone as the audio source
    with sr.Microphone() as source:
        print("Listening...")
        listening = True
        audio = r.listen(source)
        listening = False

    try:
        # Use speech recognition to convert speech to text
        command = r.recognize_google(audio)
        print("You said:", command)
        text_output.insert(tk.END, "You: " + command + "\n")
        text_entry.delete(0, tk.END)

        # Process the commands
        # Prepare object to be passed.
        class PassedCommands:
            tk = tk
            text_output = text_output
            submit = submit

        chat_commands.process_commands(PassedCommands, command)

    except sr.UnknownValueError:
        print("Speech recognition could not understand audio.")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service:", str(e))

    listening = False
    listen_to_command()

def on_closing():
    if messagebox.askokcancel("Quit", "Do you want to quit?"):
        window.destroy()

window.protocol("WM_DELETE_WINDOW", on_closing)

if __name__ == "__main__":
    # Create the menu bar
    menu_bar = tk.Menu(window)

    # Create the "File" menu
    file_menu = tk.Menu(menu_bar, tearoff=0)
    file_menu.add_command(label="Open LLM", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save LLM", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    file_menu.add_command(label="Exit", command=on_closing)
    menu_bar.add_cascade(label="File", menu=file_menu)

    # Create the "Run" menu
    # (run_as_normal_app and run_on_flask are not defined in this snippet)
    run_menu = tk.Menu(menu_bar, tearoff=0)
    run_menu.add_command(label="Run as normal app", command=lambda: threading.Thread(target=run_as_normal_app).start())
    run_menu.add_command(label="Run on Flask", command=lambda: threading.Thread(target=run_on_flask).start())
    menu_bar.add_cascade(label="Run", menu=run_menu)

    # Set the menu bar
    window.config(menu=menu_bar)

    # Start the main program loop
    start_listening_thread = threading.Thread(target=listen_to_command)
    start_listening_thread.daemon = True
    start_listening_thread.start()
    window.mainloop()

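A note on the listener in the script above: listen_to_command calls itself after every utterance, so the call stack grows for as long as the program runs. A loop-based variant is one possible alternative. The sketch below is only an illustration, not the author's code: listen_loop, recognize_once, handle, stop and paused are hypothetical names, and recognize_once stands in for the sr.Recognizer / sr.Microphone steps so the loop can be tried without a microphone.

```python
import threading
from typing import Callable, Optional


def listen_loop(recognize_once: Callable[[], Optional[str]],
                handle: Callable[[str], None],
                stop: threading.Event,
                paused: threading.Event) -> None:
    """Keep listening until `stop` is set, without recursion.

    recognize_once: hypothetical callable that records one utterance and
        returns its text, or None when nothing was understood.
    handle: called with every recognized command.
    paused: plays the role of the doListenToCommand flag; while it is set
        (for example during audio playback) the microphone is skipped.
    """
    while not stop.is_set():
        if paused.is_set():
            # Sleep briefly, but wake up at once if `stop` gets set.
            stop.wait(0.1)
            continue
        command = recognize_once()
        if command:
            handle(command)
```

Such a loop could be started the same way as the original listener, for instance with threading.Thread(target=listen_loop, args=(...), daemon=True), and ended from the GUI by calling stop.set().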
develperbayman commented 1 year ago

I hate markup, it never works for me. But yeah, it generates the MP3 automatically. This example uses OpenAI. The script is complete as it is; however, it uses another Python script to supply any extra commands.
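For readers who only want the MP3 generation step without the OpenAI and Tkinter parts, it can be isolated roughly as follows. This is a minimal sketch, assuming the gtts package is installed and the machine has network access; speak_to_file is a hypothetical helper name, and the lang/tld defaults simply mirror the values used in the script above.

```python
from pathlib import Path


def speak_to_file(text: str, out_path: str = "audio.mp3",
                  lang: str = "en", tld: str = "ie") -> Path:
    """Synthesize `text` to an MP3 file with gTTS and return its path."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("nothing to speak")

    # Imported lazily so the module loads even where gtts is not installed.
    # gTTS sends the text to Google's TTS service, so it needs network access.
    from gtts import gTTS

    tts = gTTS(cleaned, lang=lang, tld=tld)
    tts.save(out_path)
    return Path(out_path)
```

Calling speak_to_file("Hello there") would write audio.mp3 to the current directory, which can then be played with any audio player.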

develperbayman commented 1 year ago

import subprocess
import webbrowser
import re
import validators
import sys

def process_commands(passed_commands, command):
    if "computer" in command.lower():
        print("Activated Command: Computer")
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Activated Command: Computer" + "\n")
        passed_commands.submit(text_input=command)

    # Open a website
    #if command.lower().startswith("open website"):
    if "open website" in command.lower():
        # Extract the website URL from the command
        #url = command.replace("open website", "")
        url = command.partition("open website")
        # access third tuple element
        url = url[2]
        url = url.strip() # Strip whitespace on both ends. Not working? As there is a space in the leading part of the URL variable after this.
        # Test for http:// or https:// and add http:// to the URL if missing.
        if not url.startswith("http://") and not url.startswith("https://"):
            url = "http://" + url

        print("Trying to open website: " + url)

        # Validating if the URL is correct
        if validators.url(url):
            webbrowser.open(url, new=0, autoraise=True)

            passed_commands.text_output.insert(
                passed_commands.tk.END, "Opening website: " + url + "\n")
        else:
            print("Invalid URL command. URL: " + url)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Invalid URL command. URL: " + url + "\n")

        return

    # Open an application
    if "run program" in command.lower():
        # Extract the application name from the command
        app_name = command.partition("run program")[2]
        app_name = app_name.strip()

        print("Trying to open program: " + app_name)

        try:
            subprocess.Popen(app_name)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Opening program: " + app_name + "\n")
        except FileNotFoundError:
            print("Program not found: " + app_name)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Program not found: " + app_name + "\n")

        return

    # Testing
    # Stop listening to the microphone
    if command.lower() == "stop listening":
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Stopping the microphone." + "\n")
        # What goes here?

        return

    # Testing
    # Allow program exit via voice.
    if command.lower() == "stop program":
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Stopping the program." + "\n")

        sys.exit()

    # Reached when none of the commands above returned
    print("Invalid command")
    passed_commands.text_output.insert(
        passed_commands.tk.END, "Invalid command" + "\n")
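One detail in the website command may explain odd URLs: the check lowercases the command, but command.partition("open website") is case-sensitive, so a transcript such as "Open Website example.com" passes the check while the partition finds nothing and leaves an empty URL. A case-insensitive extraction could look like the sketch below. It is only an assumption about what the command is meant to do; extract_url and _OPEN_SITE are hypothetical names, and the URL is taken to be the first whitespace-free token after the phrase.

```python
import re
from typing import Optional

# Match "open website <something>" regardless of letter case.
_OPEN_SITE = re.compile(r"open website\s+(\S+)", re.IGNORECASE)


def extract_url(command: str) -> Optional[str]:
    """Return the URL named in an 'open website' command, or None."""
    match = _OPEN_SITE.search(command)
    if match is None:
        return None
    url = match.group(1)
    # Add a scheme when the speaker did not say one.
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url
    return url
```

The result could still be passed through validators.url before opening it with webbrowser.open, as the script already does.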
develperbayman commented 1 year ago

Again, sorry for the very late reply, but this should get you started. Please let me know if it helps or if you do anything cool with it.

develperbayman commented 1 year ago

Next I'm working on a Hugging Face Transformers version so you can self-host your own model, but dear god, the hardware needed for that is insane.

RaSan147 commented 1 year ago

next im working on a huggingface transformers version to self host your own model but dear god the hardware needed for that is insane

That's why I dropped all hope of running AI just for TTS. I'll use edge_tts for speech output (halfway done). Voice recognition will run on the client side, so your OpenAI solution is no help here; I'll use JS speech recognition for voice-to-text (need to start working on it).

EDGE_TTS has a really good collection of voices. Thank you, Microsoft.
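For anyone following along, speech output with the edge-tts Python package can be sketched roughly like this. It is not the project's actual implementation: synthesize and prosody are hypothetical helper names, the voice "en-US-AriaNeural" is just an example choice, and the rate adjustment is an assumption about how speed might be tuned to mimic expression.

```python
import asyncio


def prosody(percent: int) -> str:
    """Format a signed percentage string such as '+10%' or '-5%'."""
    return f"{percent:+d}%"


async def synthesize(text: str, out_path: str = "reply.mp3",
                     voice: str = "en-US-AriaNeural",
                     rate_percent: int = 0) -> None:
    """Write `text` as speech to `out_path` using Microsoft Edge's online TTS."""
    # Imported lazily so the helper above works without the package installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice, rate=prosody(rate_percent))
    await communicate.save(out_path)


# Example call (needs network access):
# asyncio.run(synthesize("Welcome back!", rate_percent=10))
```

Because the synthesis is a coroutine, a server-based web UI could await it inside a request handler instead of calling asyncio.run.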

develperbayman commented 1 year ago

Maybe I'll switch to Edge. I'm very interested in better-sounding voice output.