JarikDem-Bot / ai-waifu

AI VTuber Waifu and voice assistant
MIT License

AI Waifu (VTuber)

![GitHub](https://img.shields.io/github/license/JarikDem-Bot/ai-waifu-test) ![GitHub top language](https://img.shields.io/github/languages/top/JarikDem-Bot/ai-waifu-test) ![Static Badge](https://img.shields.io/badge/Anime%20-%20AI%20waifu%20-%20lightpink)

Anime AI Waifu is an AI-powered voice assistant with a VTuber model that combines the charm of anime characters with cutting-edge technologies. The project aims to create an engaging experience in which you can interact with your chosen character in real time, without powerful hardware.

Features

Showcase

Video demonstration

*The demonstration is recorded in real time, without cuts or speed-up. The delay before each answer is the real delay.

Installation

To run this project, follow these steps:

  1. Install Python 3.10.5 if you don't already have it installed.

  2. Clone the repository by running `git clone https://github.com/JarikDem-Bot/ai-waifu.git`

  3. Install the required Python packages by running `pip install -r requirements.txt` in the project directory.

  4. Create a `.env` file inside the project directory and enter your API keys.

    `.env` template:

    ```shell
    OPENAI_API_KEY='YOUR_OPEN_AI_KEY'
    ELEVENLABS_API_KEY='YOUR_ELEVENLABS_KEY'
    ```

  5. Install VB-Cable

  6. Install and set up VTube Studio

    Settings:

    - Select `CABLE Output` as the microphone. Enable `Preview microphone audio` to hear the waifu's answers.
    - Select the input and output for `Mouth Open`. Optionally, set "breathing" to get idle movements.

  7. Choose your settings in the `waifu.initialize` call in `main.py`

    Arguments:

    - `user_input_service` (str) - the way to interact with Waifu.
        - `"whisper"` - OpenAI's Whisper speech-to-text service; paid, requires an OpenAI API key.
        - `"google"` - free Google speech-to-text service.
        - `"console"` - type your prompt as text in the console (absolutely free).
        - `None` or unspecified - the default value `"whisper"` is used.
    - `stt_duration` (float) - the maximum number of seconds during which the recognizer dynamically adjusts its threshold to the ambient noise before returning. This value should be at least 0.5 in order to get a representative sample of the ambient noise. Default value is `0.5`.
    - `mic_index` (int) - index of the device to use for audio input. If `None` or unspecified, the default microphone is used.
    - `chatbot_service` (str) - service that generates responses.
        - `"openai"` - OpenAI text generation service; paid, requires an OpenAI API key.
        - `"test"` - returns a prewritten message; used as dummy text during development to reduce the time and cost of testing.
        - `None` or unspecified - the default value `"openai"` is used.
    - `chatbot_model` (str) - model used for text generation. The list of available models can be found [here](https://platform.openai.com/docs/models/overview). Default value is `"gpt-3.5-turbo"`.
    - `chatbot_temperature` (float) - determines the creativity of the generated text. A higher value leads to more creative results; a lower value leads to less creative, more similar results. Default value is `0.5`.
    - `personality_file` (str) - relative path to a txt file with the waifu's description. Default value is `"personality.txt"`.
    - `tts_service` (str) - service that "reads" Waifu's responses.
        - `"google"` - free Google TTS; the voice feels very "robotic".
        - `"elevenlabs"` - ElevenLabs TTS with good quality; paid, requires an ElevenLabs API key.
        - `"console"` - output is printed in the console (free).
        - `None` or unspecified - the default value `"google"` is used.
    - `output_device` - (int) output device ID or (str) output device name substring. If VB-Cable is used, find the device whose name starts with `CABLE Input (VB-Audio Virtual` using the `sd.query_devices()` command.
    - `tts_voice` (str) - ElevenLabs voice name. Default value is `"Elli"`.
    - `tts_model` (str) - ElevenLabs model. Recommended values are `"eleven_monolingual_v1"` and `"eleven_multilingual_v1"`. Default value is `"eleven_monolingual_v1"`.

  8. Run the project by executing `python main.py` in the project directory.
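
Before the first run, it can help to confirm that the `.env` file from step 4 contains both keys. The following is a minimal, stdlib-only sketch of such a check. `read_env_file` and `missing_keys` are hypothetical helpers written for this illustration; they are not part of the project, which may load the file in a different way.

```python
from pathlib import Path

# Key names taken from the .env template in step 4.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def read_env_file(path):
    """Parse simple KEY='value' lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still template placeholders."""
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("YOUR_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current directory.")
    else:
        missing = missing_keys(read_env_file(env_path))
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project directory; a key that still holds the `YOUR_...` placeholder from the template is reported as missing.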

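The device lookup and settings from steps 6 and 7 can be sketched as below. This is an illustration under assumptions, not the project's code: `find_output_device` and `build_settings` are hypothetical helpers, the `sounddevice` import is inferred from the `sd.query_devices()` mention, and the exact way `main.py` calls `waifu.initialize` should be checked in the file itself. The settings use only the options described above as free.

```python
CABLE_PREFIX = "CABLE Input (VB-Audio Virtual"


def find_output_device(devices, prefix=CABLE_PREFIX):
    """Return the index of the first output device whose name starts with prefix.

    `devices` is a sequence of dicts shaped like the entries returned by
    sounddevice.query_devices(). Returns None when nothing matches.
    """
    for index, device in enumerate(devices):
        is_output = device.get("max_output_channels", 0) > 0
        if is_output and device["name"].startswith(prefix):
            return index
    return None


def build_settings(output_device=None):
    """Build keyword arguments for waifu.initialize using only free services."""
    settings = {
        "user_input_service": "console",  # type prompts instead of speaking
        "chatbot_service": "test",        # prewritten reply, no API key needed
        "tts_service": "google",          # free Google TTS
    }
    # Only pass output_device when a VB-Cable device was actually found.
    if output_device is not None:
        settings["output_device"] = output_device
    return settings


if __name__ == "__main__":
    try:
        import sounddevice as sd  # assumed to be installed via requirements.txt
        devices = list(sd.query_devices())
    except Exception:
        # sounddevice missing or no audio backend available: fall back to no device.
        devices = []
    settings = build_settings(find_output_device(devices))
    print(settings)
    # In main.py the dict could then be passed on, for example:
    # waifu.initialize(**settings)
```

Switching `chatbot_service` to `"openai"` or `tts_service` to `"elevenlabs"` requires the matching API key in `.env`.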

Warning
Depending on the selected input, chatbot, and TTS services, the program may send all recorded audio and other data to third parties such as Google (STT, TTS), OpenAI (STT, text generation), and ElevenLabs (TTS).
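
To see which third parties a given configuration involves, the mapping in this warning can be written down as a small lookup. The sketch below is based only on the options listed in this README and has not been checked against the project's code; `third_parties` is a hypothetical helper, and its defaults mirror the documented default services.

```python
# Provider contacted by each documented option; None means nothing is sent out.
INPUT_PROVIDERS = {"whisper": "OpenAI", "google": "Google", "console": None}
CHATBOT_PROVIDERS = {"openai": "OpenAI", "test": None}
TTS_PROVIDERS = {"google": "Google", "elevenlabs": "ElevenLabs", "console": None}


def third_parties(user_input_service="whisper",
                  chatbot_service="openai",
                  tts_service="google"):
    """Return a sorted list of external providers used by the chosen services."""
    providers = {
        INPUT_PROVIDERS[user_input_service],
        CHATBOT_PROVIDERS[chatbot_service],
        TTS_PROVIDERS[tts_service],
    }
    providers.discard(None)  # local-only options contribute no provider
    return sorted(providers)


if __name__ == "__main__":
    print(third_parties())                             # documented defaults
    print(third_parties("console", "test", "console"))  # fully local options
```

With the documented defaults the helper reports Google and OpenAI; with `"console"` input, the `"test"` chatbot, and `"console"` output it reports no external provider.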

License

MIT