Sock is an AI-controlled puppet for which you can create your own custom avatar, capable of interacting with you via text or voice. It uses OpenAI's Whisper model for transcription, the ChatGPT chat completion API, and speech synthesis through either the browser's Web Speech API or Coqui-AI's TTS. Sock is designed to act as an AI co-host for Twitch streaming, or for any application where you want to converse with and receive spoken responses from a large language model (LLM).
Sock operates through a Next.js application running in your web browser, which communicates with a Python backend. The backend manages the API calls to OpenAI and runs the Whisper transcription and Coqui-AI text-to-speech models.
If you have questions, thoughts, or points for discussion, we encourage you to use our repo's Issue Tracker and Discussions board. Help shared publicly there is valuable for the entire user base!
(Are there issues with this guide? Please let us know by opening an issue!)
Make sure you have the required dependencies installed: Node.js with Yarn for the frontend, and Python 3 for the backend. If you are using Coqui-AI for speech synthesis with GPU support, you will also need an NVIDIA GPU with CUDA set up.
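You can quickly confirm the core tools are available from your terminal (any reasonably recent versions should work):

```bash
node --version     # Node.js, required by the Next.js frontend
yarn --version     # Yarn, used for the install and run scripts
python3 --version  # Python 3, required by the backend
nvidia-smi         # optional: confirms an NVIDIA GPU/driver for CUDA
```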
To install the frontend, open a terminal and run:

```bash
yarn install
```
To install the backend, run:

```bash
cd backend
python3 -m venv venv
venv\Scripts\activate
pip install wheel
pip install -r requirements.txt
```

Note: `venv\Scripts\activate` is the Windows activation command; on macOS or Linux, use `source venv/bin/activate` instead.
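Since the backend runs both the Whisper and Coqui-AI models, a quick smoke test is to confirm their modules import cleanly (this assumes the requirements pull in the `openai-whisper` and Coqui `TTS` packages, which expose the `whisper` and `TTS` modules):

```bash
python -c "import whisper, TTS; print('backend dependencies OK')"
```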
eSpeak is necessary for some text-to-speech (TTS) models. How you set it up depends on your platform.
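A typical installation from standard package sources looks like this (package names may vary by distribution):

```bash
# Debian/Ubuntu
sudo apt-get install espeak-ng

# macOS (Homebrew)
brew install espeak
```

On Windows, install eSpeak NG using an installer from its GitHub releases page.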
Create a `.env` file in the `backend` directory with the following contents:
```
OPENAI_API_KEY={your openai api key}
OPENAI_CHAT_MODEL="gpt-3.5-turbo-0301"
TTS_USE_GPU="True"
```
`OPENAI_API_KEY`

To get an OpenAI API key:

1. Create or sign in to an account at https://platform.openai.com.
2. Open the API keys page from your account settings.
3. Create a new secret key and copy it into your `.env` file.
`OPENAI_CHAT_MODEL`
Sock lets you use different chat models based on your preference. If you wish to use an alternative model, replace the default one in the `.env` file.
To see a list of models compatible with `/v1/chat/completions`, refer to OpenAI's model compatibility documentation.
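For example, to use GPT-4 instead (assuming your OpenAI account has API access to it), change the line in `.env`:

```
OPENAI_CHAT_MODEL="gpt-4"
```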
`TTS_USE_GPU`
When using Coqui-AI for Sock's text-to-speech (TTS), you can set `TTS_USE_GPU="True"` to use NVIDIA CUDA for GPU-based processing. If your system does not support CUDA, set `TTS_USE_GPU` to `"False"`. This will run TTS on the CPU, which results in slower response times. Alternatively, you can use Web Speech for faster execution, albeit with more robotic voices.
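If you are unsure whether your system supports CUDA, you can check from within the backend's virtual environment (this assumes PyTorch is installed, which Coqui TTS depends on):

```bash
python -c "import torch; print(torch.cuda.is_available())"
```

If this prints `False`, set `TTS_USE_GPU="False"`.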
Open two terminal windows. In the first, start the backend:

```bash
yarn backend
```

In the second, start the frontend:

```bash
yarn frontend
```
Finally, open a browser and navigate to http://localhost:3000.
You're good to go! 🎉
If you get an error like `TypeError: argument of type 'NoneType' is not iterable` when you run `yarn backend`, you may need to forcibly reinstall Whisper. Run the following in your terminal (with the backend's virtual environment activated), which should fix it:

```bash
cd backend
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
```
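To confirm the reinstall took effect, you can inspect the installed package (the package installed from that repository is named `openai-whisper`):

```bash
pip show openai-whisper
```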