🎙️🤖 Create, customize, and talk to your AI character/companion in realtime (all in one codebase!). Have a natural, seamless conversation with AI everywhere (mobile, web, and terminal) using LLMs (OpenAI GPT-3.5/4, Anthropic Claude 2), Chroma Vector DB, Whisper speech-to-text, and ElevenLabs text-to-speech 🎙️🤖
Added whisperX, reducing speech-to-text latency from ~0.5 s to ~0.13 s.
Added the diarization feature enabled by whisperX, so transcriptions can now include speaker IDs.
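A diarized transcription is typically a list of segments annotated with speaker labels. As a rough sketch (the field names `speaker` and `text` here are assumptions in the spirit of whisperX's aligned output, not this project's actual schema), grouping such segments by speaker might look like:

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Group transcription segments by their speaker label.

    `segments` is assumed to be a list of dicts with "speaker" and
    "text" keys, similar in spirit to whisperX's diarized output.
    """
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg.get("speaker", "UNKNOWN")].append(seg["text"])
    return dict(grouped)

# Hypothetical output of a diarized transcription:
segments = [
    {"speaker": "SPEAKER_00", "text": "Hi there!"},
    {"speaker": "SPEAKER_01", "text": "Hello."},
    {"speaker": "SPEAKER_00", "text": "How are you?"},
]
print(group_by_speaker(segments))
# → {'SPEAKER_00': ['Hi there!', 'How are you?'], 'SPEAKER_01': ['Hello.']}
```

Segments missing a label fall back to `"UNKNOWN"`, which keeps the grouping total even when diarization fails on a short utterance.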
Switched from pydub to torchaudio for loading audio streams, cutting transcode time from ~95 ms to ~9 ms. Combined, the transcription process takes about 0.2 s with whisperX, down from ~0.6 s with faster-whisper.
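The speed-up comes from decoding the audio stream directly into arrays in-process, rather than round-tripping through pydub's file-oriented pipeline. A minimal stdlib-only sketch of the idea, using Python's `wave` module as a stand-in for `torchaudio.load` (which does the same job much faster, into tensors):

```python
import io
import struct
import wave

def load_pcm(wav_bytes):
    """Decode a mono 16-bit WAV byte stream straight into a list of
    float samples in [-1, 1], with no temp files or transcoding step."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]

# Build a tiny in-memory WAV and decode it directly from the stream.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)      # 16-bit PCM
    wf.setframerate(16000)  # 16 kHz, the usual rate for Whisper input
    wf.writeframes(struct.pack("<4h", 0, 16384, -16384, 0))
samples = load_pcm(buf.getvalue())
print(samples)  # → [0.0, 0.5, -0.5, 0.0]
```

Everything stays in memory from the network buffer to the model input, which is where the 95 ms → 9 ms win comes from.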
Added a warm-up run before the first sentence, avoiding the ~2-4 s overhead of the first round (24/7 servers don't care about the first round anyway, since they only pay it once at startup).
To Do
Support pre-tokens and suppress tokens.
Test in non-web environments.
Think about the diarization API, including persisting speaker identities throughout a conversation.
Test and optimize VRAM usage, especially during diarization.