ai-bot-pro / achatbot

An open-source chat bot architecture for voice/vision (and multimodal) assistants that can run locally or remotely
BSD 3-Clause "New" or "Revised" License

feat: add daily room audio stream and processors #30

Closed weedge closed 1 month ago

weedge commented 1 month ago

feat:

  1. ui:

    • add the ui/web-client-ui git submodule and a demo Daily room
  2. processors (using the async pipeline https://github.com/weedge/pipeline-py):

    • add types/frames
    • add vad_analyzer: DailyWebRTCVADAnalyzer and SileroVADAnalyzer
    • add daily processors: daily_input_transport_processor, daily_output_transport_processor, audio_input_processor, audio_camera_output_processor
  3. daily room audio stream:

    • add a demo Daily chat bot with PyAudio echo
    • add demo.daily.chat_bot_daily_echo and the join agents chat_bot_daily_echo and chat_bot_pyaudio_echo
    • add a tag list that EngineFactory uses to find engines
    • add test/modules/speech/audio_stream/test_stream.py
    • add Daily room in/out audio streams for chatting from the front end
    • add test_record, test_vad+record, and test_stream_player with Daily room in/out audio streams
    • change the object returned by the audio stream's get_stream_info and add recorder open
    • add the TestAudioInOutStream test case, covering these audio stream input/output combinations:
      1. pyaudio_in_stream -> pyaudio_out_stream
      2. pyaudio_in_stream -> daily_room_audio_out_stream
      3. daily_room_audio_in_stream -> pyaudio_out_stream
      4. daily_room_audio_in_stream -> daily_room_audio_out_stream
        
        # 1. pyaudio_in_stream -> pyaudio_out_stream
        AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
        AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream
        AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
        IS_CALLBACK=1 \
        AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream

        # 2. pyaudio_in_stream -> daily_room_audio_out_stream
        AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream
        AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
        IS_CALLBACK=1 \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream

        # 3. daily_room_audio_in_stream -> pyaudio_out_stream
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        IS_CALLBACK=1 \
        AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream

        # 4. daily_room_audio_in_stream -> daily_room_audio_out_stream
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        IS_CALLBACK=1 \
        MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        TTS_TAG=tts_16k_speaker \
        python -m unittest test.modules.speech.audio_stream.test_stream.TestAudioInOutStream
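
The four input/output combinations, each run with and without `IS_CALLBACK=1`, form a small test matrix. A minimal sketch of how that matrix could be driven from Python, assuming only the tag names and test path used in the commands in this description; `build_env`, `matrix`, and `run_matrix` are hypothetical helper names and are not part of the repository:

```python
import itertools
import os
import subprocess
import sys

IN_TAGS = ["pyaudio_in_stream", "daily_room_audio_in_stream"]
OUT_TAGS = ["pyaudio_out_stream", "daily_room_audio_out_stream"]
TEST_ID = "test.modules.speech.audio_stream.test_stream.TestAudioInOutStream"


def build_env(in_tag, out_tag, callback, room_url):
    """Return only the variables that one test run adds to the environment."""
    env = {
        "AUDIO_IN_STREAM_TAG": in_tag,
        "AUDIO_OUT_STREAM_TAG": out_tag,
        "TTS_TAG": "tts_16k_speaker",
    }
    if callback:
        env["IS_CALLBACK"] = "1"
    # The commands set MEETING_ROOM_URL only when a Daily room stream is involved.
    if "daily_room" in in_tag or "daily_room" in out_tag:
        env["MEETING_ROOM_URL"] = room_url
    return env


def matrix(room_url):
    """All 4 stream pairs x 2 callback modes, in the order of cases 1 to 4."""
    return [
        build_env(in_tag, out_tag, callback, room_url)
        for in_tag, out_tag in itertools.product(IN_TAGS, OUT_TAGS)
        for callback in (False, True)
    ]


def run_matrix(room_url):
    """Run the unittest once per combination (needs the repo and audio devices)."""
    for extra in matrix(room_url):
        subprocess.run(
            [sys.executable, "-m", "unittest", TEST_ID],
            env={**os.environ, **extra},  # inherit the shell env, then override
            check=True,
        )
```

Calling `run_matrix("https://weedge.daily.co/chat-bot")` from the repository root would be roughly equivalent to running the eight commands one after another.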


4. env to YAML config for in/out audio streams:

        CONF_ENV=local \
        AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        python -m src.cmd.init -o env2yaml

        CONF_ENV=local \
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
        python -m src.cmd.init -o env2yaml

        CONF_ENV=local \
        AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
        AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
        python -m src.cmd.init -o env2yaml
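
What `src.cmd.init -o env2yaml` actually writes is not shown in this description. Purely as an illustration of the general idea (snapshotting selected environment variables into a YAML file), here is a sketch that assumes a flat key/value layout; the `_TAG` suffix filter, the lower-cased keys, and the function name `env_to_yaml` are all assumptions and are not taken from the repository:

```python
import os


def env_to_yaml(env, suffix="_TAG"):
    """Render variables whose name ends with `suffix` as flat YAML lines."""
    lines = []
    for key in sorted(env):
        if key.endswith(suffix):
            # Double-quote and escape values so YAML reads them back as strings.
            value = env[key].replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{key.lower()}: "{value}"\n')
    return "".join(lines)


if __name__ == "__main__":
    # Print the snapshot for the current process environment.
    print(env_to_yaml(dict(os.environ)), end="")
```

Emitting the lines by hand keeps the sketch free of third-party dependencies; a real implementation would more likely use a YAML library and a nested layout.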

------
local-terminal-chat.generate_audio2audio 

    TQDM_DISABLE=True \
    AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
    AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
    VAD_DETECTOR_TAG=webrtc_silero_vad \
    RECORDER_TAG=vad_recorder \
    ASR_TAG=sense_voice_asr \
    ASR_LANG=zn \
    ASR_MODEL_NAME_OR_PATH=./models/FunAudioLLM/SenseVoiceSmall \
    LLM_TAG=llm_personalai_proxy \
    API_URL=https://personal-ai-ts.weedge.workers.dev/ \
    LLM_MODEL_NAME=llama-3.1-70b-versatile \
    CHAT_TYPE=chat_with_functions \
    TTS_TAG=tts_edge \
    python -m src.cmd.local-terminal-chat.generate_audio2audio > ./log/std_out.log

    TQDM_DISABLE=True \
    AUDIO_IN_STREAM_TAG=pyaudio_in_stream \
    AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
    MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
    VAD_DETECTOR_TAG=webrtc_silero_vad \
    RECORDER_TAG=vad_recorder \
    ASR_TAG=sense_voice_asr \
    ASR_LANG=zn \
    ASR_MODEL_NAME_OR_PATH=./models/FunAudioLLM/SenseVoiceSmall \
    LLM_TAG=llm_personalai_proxy \
    API_URL=https://personal-ai-ts.weedge.workers.dev/ \
    LLM_MODEL_NAME=llama-3.1-70b-versatile \
    CHAT_TYPE=chat_with_functions \
    TTS_TAG=tts_edge \
    python -m src.cmd.local-terminal-chat.generate_audio2audio > ./log/std_out.log

    TQDM_DISABLE=True \
    AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
    AUDIO_OUT_STREAM_TAG=pyaudio_out_stream \
    MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
    VAD_DETECTOR_TAG=webrtc_silero_vad \
    RECORDER_TAG=vad_recorder \
    ASR_TAG=sense_voice_asr \
    ASR_LANG=zn \
    ASR_MODEL_NAME_OR_PATH=./models/FunAudioLLM/SenseVoiceSmall \
    LLM_TAG=llm_personalai_proxy \
    API_URL=https://personal-ai-ts.weedge.workers.dev/ \
    LLM_MODEL_NAME=llama-3.1-70b-versatile \
    CHAT_TYPE=chat_with_functions \
    TTS_TAG=tts_edge \
    python -m src.cmd.local-terminal-chat.generate_audio2audio > ./log/std_out.log

    TQDM_DISABLE=True \
    AUDIO_IN_STREAM_TAG=daily_room_audio_in_stream \
    AUDIO_OUT_STREAM_TAG=daily_room_audio_out_stream \
    MEETING_ROOM_URL=https://weedge.daily.co/chat-bot \
    VAD_DETECTOR_TAG=webrtc_silero_vad \
    RECORDER_TAG=vad_recorder \
    ASR_TAG=sense_voice_asr \
    ASR_LANG=zn \
    ASR_MODEL_NAME_OR_PATH=./models/FunAudioLLM/SenseVoiceSmall \
    LLM_TAG=llm_personalai_proxy \
    API_URL=https://personal-ai-ts.weedge.workers.dev/ \
    LLM_MODEL_NAME=llama-3.1-70b-versatile \
    CHAT_TYPE=chat_with_functions \
    TTS_TAG=tts_edge \
    python -m src.cmd.local-terminal-chat.generate_audio2audio > ./log/std_out.log
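
The four `generate_audio2audio` invocations share every setting except the stream tags and the optional `MEETING_ROOM_URL`. A sketch that factors the shared part into one dictionary and renders a command line from it; the values are copied from the commands in this description, while `BASE` and `render_command` are hypothetical names that do not exist in the repository:

```python
import shlex

# Settings that are identical in all four commands.
BASE = {
    "TQDM_DISABLE": "True",
    "VAD_DETECTOR_TAG": "webrtc_silero_vad",
    "RECORDER_TAG": "vad_recorder",
    "ASR_TAG": "sense_voice_asr",
    "ASR_LANG": "zn",
    "ASR_MODEL_NAME_OR_PATH": "./models/FunAudioLLM/SenseVoiceSmall",
    "LLM_TAG": "llm_personalai_proxy",
    "API_URL": "https://personal-ai-ts.weedge.workers.dev/",
    "LLM_MODEL_NAME": "llama-3.1-70b-versatile",
    "CHAT_TYPE": "chat_with_functions",
    "TTS_TAG": "tts_edge",
}
MODULE = "src.cmd.local-terminal-chat.generate_audio2audio"


def render_command(in_tag, out_tag, room_url=None):
    """Build one shell command line; room_url is needed only for Daily streams."""
    env = {"AUDIO_IN_STREAM_TAG": in_tag, "AUDIO_OUT_STREAM_TAG": out_tag, **BASE}
    if room_url is not None:
        env["MEETING_ROOM_URL"] = room_url
    # shlex.quote leaves safe values untouched and quotes anything else.
    assignments = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{assignments} python -m {MODULE} > ./log/std_out.log"


if __name__ == "__main__":
    print(render_command("pyaudio_in_stream", "pyaudio_out_stream"))
```

Keeping the shared settings in one place makes it easier to see that switching between local PyAudio and a Daily room only changes two or three variables.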