Closed moseshu closed 14 hours ago
When I call the server with the following code, the generated audio is spoken very fast, and the result is very different from the offline output. Is this a configuration problem, or something else?
We haven't looked at this closely; it may be related to the sampling rate or similar settings. @kunci115 might be able to help.
There are two possibilities: the sampling rate and audio format used by the server and the client aren't consistent, or the buffer size is too small, causing the audio to play faster than intended.
I have only debugged this briefly. From what I have checked so far, it only happens with Chinese text longer than 50 tokens. As an immediate workaround, keep the text short, since I don't really understand the Chinese preprocessing.
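To make the first hypothesis concrete, here is a minimal sketch of the arithmetic only, not a diagnosis of this server. It assumes mono audio and shows how much faster a clip plays when the client opens its output stream at a different rate from the one the audio was generated at. The function names and the 16 kHz / 24 kHz figures are illustrative assumptions, not values taken from the project.

```python
def playback_speed_factor(source_rate_hz: int, playback_rate_hz: int) -> float:
    """Return how many times faster than intended the audio plays."""
    if source_rate_hz <= 0 or playback_rate_hz <= 0:
        raise ValueError("sample rates must be positive")
    return playback_rate_hz / source_rate_hz


def playback_duration_s(num_samples: int, playback_rate_hz: int) -> float:
    """Return how long num_samples take to play at the given output rate."""
    if playback_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return num_samples / playback_rate_hz


if __name__ == "__main__":
    # Hypothetical case: 3 seconds of audio generated at 16 kHz,
    # but the client opens its output stream at 24 kHz.
    samples = 3 * 16000
    print(playback_speed_factor(16000, 24000))  # 1.5
    print(playback_duration_s(samples, 24000))  # 2.0
```

If the two rates match, the factor is 1.0 and a rate mismatch can be ruled out as the cause.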
import asyncio
import re
import socket

import numpy as np
import pyaudio


def chunk_text(text, max_length=50):
    """
    Splits the input text into smaller chunks based on punctuation and length.
    Adjust max_length to control chunk size.
    """
    # Split after sentence-ending punctuation (full-width and ASCII forms).
    sentences = re.split(r'(?<=[。！？!?])', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        # Start a new chunk once adding this sentence would exceed max_length.
        # The current_chunk check avoids emitting an empty first chunk.
        if current_chunk and len(current_chunk) + len(sentence) > max_length:
            chunks.append(current_chunk)
            current_chunk = sentence
        else:
            current_chunk += sentence
    if current_chunk:
        chunks.append(current_chunk)
    return chunks


async def listen_to_voice(text, server_ip='localhost', server_port=9998):
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect((server_ip, server_port))
    loop = asyncio.get_running_loop()

    async def play_audio_stream():
        buffer = b''
        p = pyaudio.PyAudio()
        # The format and rate here must match what the server sends.
        stream = p.open(format=pyaudio.paFloat32,
                        channels=1,
                        rate=24000,
                        output=True,
                        frames_per_buffer=2048)
        try:
            while True:
                # recv blocks, so run it in the default executor.
                chunk = await loop.run_in_executor(None, client_socket.recv, 1024)
                if not chunk:  # End of stream
                    break
                # Note: this assumes the marker arrives within a single recv call.
                if b"END_OF_AUDIO" in chunk:
                    buffer += chunk.replace(b"END_OF_AUDIO", b"")
                    # Drop a trailing partial sample; float32 needs multiples of 4 bytes.
                    buffer = buffer[:len(buffer) - len(buffer) % 4]
                    if buffer:
                        audio_array = np.frombuffer(buffer, dtype=np.float32).copy()
                        stream.write(audio_array.tobytes())
                    break
                buffer += chunk
                # Play 4096 bytes (1024 float32 samples) at a time.
                if len(buffer) >= 4096:
                    audio_array = np.frombuffer(buffer[:4096], dtype=np.float32).copy()
                    stream.write(audio_array.tobytes())
                    buffer = buffer[4096:]
        finally:
            stream.stop_stream()
            stream.close()
            p.terminate()

    try:
        # Split text into chunks
        text_chunks = chunk_text(text)
        # Send each chunk, waiting for playback to finish before proceeding
        for chunk in text_chunks:
            await loop.run_in_executor(None, client_socket.sendall, chunk.encode('utf-8'))
            await play_audio_stream()  # Play the current chunk fully before sending the next
            print(f"Finished playing chunk: {chunk}")
        print("Audio playback finished.")
    except Exception as e:
        print(f"Error in listen_to_voice: {e}")
    finally:
        client_socket.close()


# Example usage
async def main():
    await listen_to_voice(
        "春天的江潮水势浩荡,与大海连成一片,一轮明月从海上升起,好像与潮水一起涌出来。月光照耀着春江,随着波浪闪耀千万里,所有地方的春江都有明亮的月光!"
        "江水曲曲折折地绕着花草丛生的原野流淌,月光照射着开遍鲜花的树林好像细密的雪珠在闪烁。月色如霜,所以霜飞无从觉察,洲上的白沙和月色融合在一起,看不分明。"
        "江水、天空成一色,没有一点微小灰尘,明亮的天空中只有一轮孤月高悬空中。江边上什么人最初看见月亮?江上的月亮哪一年最初照耀着人?人生一代代地无穷无尽,只有江上的月亮一年年地总是相像。"
        "不知江上的月亮等待着什么人,只见长江不断地一直运输着流水。游子像一片白云缓缓地离去,只剩下思妇站在离别的青枫浦不胜忧愁。哪家的游子今晚坐着小船在漂流?",
        server_ip='localhost', server_port=9998
    )


# Run the main async function
asyncio.run(main())
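One fragile spot in a client like the one above is that TCP does not preserve message boundaries, so the END_OF_AUDIO marker can be split across two recv calls and then go undetected. The following is a minimal sketch of one way to handle that, assuming the same marker and 4-byte float32 samples as the client above. The AudioFramer class is a hypothetical helper, not part of the project, and in-band markers like this can in principle collide with raw audio bytes.

```python
END_MARKER = b"END_OF_AUDIO"
SAMPLE_WIDTH = 4  # bytes per float32 sample (assumed, as in the client above)


class AudioFramer:
    """Accumulates received bytes and returns audio that is safe to play."""

    def __init__(self, marker: bytes = END_MARKER, sample_width: int = SAMPLE_WIDTH):
        self.marker = marker
        self.sample_width = sample_width
        self.pending = b""
        self.finished = False

    def feed(self, data: bytes) -> bytes:
        """Add newly received bytes and return the playable audio bytes."""
        if self.finished:
            return b""
        self.pending += data
        idx = self.pending.find(self.marker)
        if idx != -1:
            # Marker found: everything before it is audio and the stream is done.
            audio = self.pending[:idx]
            self.pending = b""
            self.finished = True
            # Drop a trailing partial sample, if any.
            return audio[:len(audio) - len(audio) % self.sample_width]
        # Hold back the last len(marker) - 1 bytes: they may start a marker.
        safe = max(0, len(self.pending) - (len(self.marker) - 1))
        safe -= safe % self.sample_width  # keep whole samples only
        audio, self.pending = self.pending[:safe], self.pending[safe:]
        return audio


if __name__ == "__main__":
    framer = AudioFramer()
    fake_audio = bytes(range(16))  # stand-in for 4 float32 samples
    played = framer.feed(fake_audio + b"END_OF")  # marker split across packets
    played += framer.feed(b"_AUDIO")
    print(played == fake_audio, framer.finished)
```

In the client, each recv result would be passed to feed and the returned bytes written to the output stream until finished becomes true.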
Since this issue has been inactive for a long time, it will be closed. Feel free to reopen this issue and ask questions at any time.
Question details
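I followed the script python socket_server.py. When I call the server with the following code, the generated audio is spoken very fast, and the result is very different from the offline output. Is this a configuration problem, or something else?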