bolna-ai / bolna

End-to-end platform for building voice-first multimodal agents
https://playground.bolna.dev
MIT License

whisper-melo-llama3 not receiving my voice #307

Open acastry opened 1 week ago

acastry commented 1 week ago

Hi, I am trying to deploy whisper-melo-llama3.

I created an agent using my ngrok address, which comes from my ngrok token:

```shell
curl --location 'https://XXXXXXXXXXXX.ngrok-free.app/agent' \
--header 'Content-Type: application/json' \
--data '{
  "agent_config": {
    "agent_name": "Alfred",
    "agent_type": "other",
    "tasks": [
      {
        "task_type": "conversation",
        "tools_config": {
          "llm_agent": {
            "model": "deepinfra/meta-llama/Meta-Llama-3-70B-Instruct",
            "max_tokens": 123,
            "agent_flow_type": "streaming",
            "use_fallback": true,
            "family": "llama",
            "temperature": 0.1,
            "request_json": true,
            "provider": "deepinfra"
          },
          "synthesizer": {
            "provider": "melotts",
            "provider_config": {
              "voice": "Casey",
              "sample_rate": 8000,
              "sdp_ratio": 0.2,
              "noise_scale": 0.6,
              "noise_scale_w": 0.8,
              "speed": 1.0
            },
            "stream": true,
            "buffer_size": 123,
            "audio_format": "wav"
          },
          "transcriber": {
            "encoding": "linear16",
            "language": "en",
            "model": "whisper",
            "stream": true,
            "task": "transcribe"
          },
          "input": {
            "provider": "twilio",
            "format": "wav"
          },
          "output": {
            "provider": "twilio",
            "format": "wav"
          }
        },
        "toolchain": {
          "execution": "parallel",
          "pipelines": [
            ["transcriber", "llm", "synthesizer"]
          ]
        }
      }
    ]
  },
  "agent_prompts": {
    "task_1": {
      "system_prompt": "What is the Ultimate Question of Life, the Universe, and Everything?"
    }
  }
}'
```

It returns `{"agent_id":"*************-3409-4f09-a1a7-582b12232444","state":"created"}`.

Then I try:

```shell
curl --location 'https://XXXXXXXXXXXX.ngrok-free.app/call' \
--header 'Content-Type: application/json' \
--data '{
  "agent_id": "*************-3409-4f09-a1a7-582b12232444",
  "recipient_phone_number": "+590690320620"
}'
```

which returns `{"detail":"Not Found"}`.

So instead I run:

```shell
curl --location 'http://0.0.0.0:8001/call' \
--header 'Content-Type: application/json' \
--data '{
  "agent_id": "*************-3409-4f09-a1a7-582b12232444",
  "recipient_phone_number": "+590690320620"
}'
```

That gets it working, though I don't know why.
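To compare the two `/call` requests outside of curl, a minimal sketch using only the Python standard library could look like the following. It assumes the endpoint accepts the same JSON body as the curl commands above; `build_call_request` and `probe_call_endpoint` are hypothetical helper names, not part of bolna.

```python
import json
import urllib.error
import urllib.request

# Hypothetical helpers (not part of bolna) that mirror the curl commands above.


def build_call_request(base_url: str, agent_id: str, recipient_phone_number: str) -> urllib.request.Request:
    """Build (but do not send) a POST <base_url>/call request with a JSON body."""
    payload = {"agent_id": agent_id, "recipient_phone_number": recipient_phone_number}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/call",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe_call_endpoint(base_url: str, agent_id: str, recipient_phone_number: str,
                        timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code.

    Warning: if the endpoint exists, this really places a call.
    Unreachable hosts raise urllib.error.URLError, which is not caught here.
    """
    request = build_call_request(base_url, agent_id, recipient_phone_number)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example a "Not Found") arrive as HTTPError with a status code.
        return err.code
```

Calling `probe_call_endpoint` once with the ngrok base URL and once with `http://0.0.0.0:8001` would show which server actually answers `/call`; the `{"detail":"Not Found"}` body is presumably returned with a 404 status.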

It calls me, but the system doesn't hear my voice. Do I have to enter an endpoint in Twilio? Please help, @prateeksachan.

```text
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 15}
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {telephony} [handle] Sending message 4096 linear16
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {task_manager} [__process_output_loop] Duration of the byte 0.256
2024-07-03 01:23:25 2024-07-03 05:23:25.753 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.256 to maintain quueue on our side 8000
2024-07-03 01:23:25 2024-07-03 05:23:25.988 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984205.988593
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {task_manager} [__process_output_loop] Started transmitting at 1719984205.9892044
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {task_manager} [__process_output_loop] ##### Start response is True for 16 and hence starting to speak {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 16} Current sequence ids {-1}
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 16}
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {telephony} [handle] Sending message 4096 linear16
2024-07-03 01:23:25 2024-07-03 05:23:25.989 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:25 2024-07-03 05:23:25.990 INFO {task_manager} [__process_output_loop] Duration of the byte 0.256
2024-07-03 01:23:25 2024-07-03 05:23:25.990 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.256 to maintain quueue on our side 8000
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984206.2192206
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] Started transmitting at 1719984206.219604
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {task_manager} [__process_output_loop] ##### Start response is True for 17 and hence starting to speak {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 17, 'is_final_chunk_of_entire_response': True} Current sequence ids {-1}
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {telephony} [handle] Sending Message None and MZbcf6c8ebc7391c74914520cf4cfa7639 and {'io': 'twilio', 'message_category': 'agent_welcome_message', 'stream_sid': 'MZbcf6c8ebc7391c74914520cf4cfa7639', 'request_id': 'a3fb47c6-7301-4d27-bcf2-eb8fd5fbcfcc', 'cached': False, 'sequence_id': -1, 'format': 'linear16', 'text': 'This call is being recorded for quality assurance and training. Please speak now.', 'is_md5_hash': False, 'llm_generated': False, 'type': 'audio', 'synthesizer_start_time': 1719984197.0962937, 'is_first_chunk': True, 'synthesizer_latency': 5.429271936416626, 'synthesizer_first_chunk_latency': 5.429289817810059, 'chunk_id': 17, 'is_final_chunk_of_entire_response': True}
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {telephony} [handle] Sending message 828 linear16
2024-07-03 01:23:26 2024-07-03 05:23:26.219 INFO {twilio} [form_media_message] Converting to mulaw
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] Duration of the byte 0.05175
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] ##### End of synthesizer stream and
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] Making first message passed as True
2024-07-03 01:23:26 2024-07-03 05:23:26.221 INFO {task_manager} [__process_output_loop] ##### Sleeping for 0.05175 to maintain quueue on our side 8000
2024-07-03 01:23:26 2024-07-03 05:23:26.244 INFO {task_manager} [__process_output_loop] ##### Updating Last transmitted timestamp to 1719984206.2447424
2024-07-03 01:23:26 2024-07-03 05:23:26.244 INFO {task_manager} [__process_output_loop] First interim result hasn't been gotten yet and hence sleeping
2024-07-03 01:23:26 2024-07-03 05:23:26.345 INFO {task_manager} [__process_output_loop] ##### Got to wait 300 ms before speaking and alreasy waited -1 since the first interim result
2024-07-03 01:23:26 2024-07-03 05:23:26.591 INFO {task_manager} [__check_for_completion] Only 0.34679651260375977 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:28 2024-07-03 05:23:28.586 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-07-03 01:23:28 2024-07-03 05:23:28.594 INFO {task_manager} [__check_for_completion] Only 2.349334239959717 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:30 2024-07-03 05:23:30.596 INFO {task_manager} [__check_for_completion] Only 4.351584434509277 seconds since last spoken time stamp and hence not cutting the phone call
2024-07-03 01:23:31 2024-07-03 05:23:31.587 INFO {task_manager} [__handle_initial_silence] Checking for initial silence 15
2024-07-03 01:23:32 2024-07-03 05:23:32.601 INFO {task_manager} [__check_for_completion] Asking if the user is still there
2024-07-03 01:23:32 2024-07-03 05:23:32.605 INFO {task_manager} [_synthesize] ##### sending text to melotts for generation: Hey, are you still there?
2024-07-03 01:23:32 2024-07-03 05:23:32.605 INFO {melo_synthesizer} [push] Pushed message to internal queue
2024-07-03 01:23:32 2024-07-03 05:23:32.606 INFO {twilio} [handle_interruption] interrupting because user spoke in between
2024-07-03 01:23:32 2024-07-03 05:23:32.607 INFO {utils} [write_request_logs] Message {'direction': 'request', 'data': 'Hey, are you still there?', 'leg_id': 'eadcdfac-26f4-458b-9773-88a260359249', 'time': '2024-07-03 05:23:32', 'component': 'synthesizer', 'sequence_id': -1, 'model': 'melotts', 'cached': False, 'latency': None, 'is_final': False, 'engine': 'default'}
```
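The "Duration of the byte" values in these logs can be reproduced by hand. This works if we assume the chunks are mono 16-bit PCM (`linear16`) at 8000 Hz, which matches the `linear16` format and the `8000` printed in the "Sleeping for" lines. Under that assumption, 4096 bytes / (2 bytes × 8000 Hz) = 0.256 s and 828 bytes / 16000 = 0.05175 s. After "Converting to mulaw" (G.711 µ-law, one byte per sample), each chunk is half its linear16 size. The helper names below are hypothetical and not bolna's code.

```python
# Hypothetical helpers (not part of bolna). Assumes mono 16-bit PCM ("linear16")
# at 8000 Hz, as suggested by the "linear16" and "8000" values in the log lines.
BYTES_PER_LINEAR16_SAMPLE = 2  # 16-bit samples
BYTES_PER_MULAW_SAMPLE = 1     # G.711 mu-law stores 8 bits per sample


def linear16_duration_seconds(num_bytes: int, sample_rate: int = 8000) -> float:
    """Playback duration of a mono linear16 chunk of num_bytes bytes."""
    return num_bytes / (BYTES_PER_LINEAR16_SAMPLE * sample_rate)


def mulaw_size_bytes(num_linear16_bytes: int) -> int:
    """Size in bytes of the same audio once converted to 8-bit mu-law."""
    samples = num_linear16_bytes // BYTES_PER_LINEAR16_SAMPLE
    return samples * BYTES_PER_MULAW_SAMPLE
```

For example, `linear16_duration_seconds(4096)` and `linear16_duration_seconds(828)` give the 0.256 s and 0.05175 s that `__process_output_loop` logs before sleeping.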

Full logs attached

bolna-app.log

Thank you!