KoljaB / RealtimeTTS

Converts text to speech in realtime

Puzzling contrast between Fast-Api example and pyqt6_speed_test.py #179

Open FILLITUP opened 2 days ago

FILLITUP commented 2 days ago

Greetings. Although I can speculate, it's not clear why the audio stream in pyqt6_speed_test.py is audible at least 2 seconds earlier than in the FastAPI example. The same passage of text is fed to both, and the logs show very similar latencies (0.5 to 0.75 s). The answer might be a clue as to why I am seeing roughly a 3-4 second audio delay in my attempted chats via LocalEmotionalAIVoiceChat with any LM Studio-loaded LLM.
I'm running a Windows 11 machine with plenty of memory, CPU, and GPU. Thanks in advance.

KoljaB commented 1 day ago

Hm, maybe host name resolution adds latency. Can you change the client from localhost to 127.0.0.1 and see if it gets better?
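To check whether name resolution is actually the culprit, you can time the lookup directly before changing the client. This is a small standalone sketch (not part of the repo); the port number 8000 is just a placeholder:

```python
import socket
import time

def time_resolution(host: str, port: int = 8000) -> float:
    """Time how long the OS takes to resolve a host name to an address."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return time.perf_counter() - start

# "localhost" goes through the resolver (and on Windows may be tried as
# IPv6 first); the literal address skips name resolution entirely.
for host in ("localhost", "127.0.0.1"):
    print(f"{host}: {time_resolution(host) * 1000:.2f} ms")
```

If "localhost" is consistently much slower than "127.0.0.1", host resolution is adding latency on that machine.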

FILLITUP commented 1 day ago

I tried that 1st thing, same outcome. I have started assuming that the QT framework may simply be quicker at handling audio stream.?.

KoljaB commented 1 day ago

In the first lines of server.py please set

DEBUG_LOGGING = True

The server should now let you know how long RealtimeTTS needs to synthesize the first audio chunk:

INFO:root:Audio stream start, latency to first chunk: 0.23s
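A log line like that can be produced by stamping the time when synthesis is requested and logging once when the first chunk arrives. This is an illustrative sketch only; the class and method names are made up and do not match server.py:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)

class FirstChunkTimer:
    """Measures latency from synthesis request to the first audio chunk.

    Hypothetical helper for illustration; server.py's actual
    implementation may differ.
    """

    def __init__(self) -> None:
        self.start = 0.0
        self.first_chunk_seen = False

    def synthesis_requested(self) -> None:
        self.start = time.perf_counter()
        self.first_chunk_seen = False

    def on_audio_chunk(self, chunk: bytes) -> None:
        if not self.first_chunk_seen:
            self.first_chunk_seen = True
            latency = time.perf_counter() - self.start
            logging.info(
                "Audio stream start, latency to first chunk: %.2fs", latency
            )

timer = FirstChunkTimer()
timer.synthesis_requested()
timer.on_audio_chunk(b"\x00" * 1024)  # first chunk arrives -> log line emitted
timer.on_audio_chunk(b"\x00" * 1024)  # later chunks are not logged again
```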

With this updated code the client also logs timings:

(venv) C:\Dev\Audio\RealtimeTTS\RealtimeTTS\example_fast_api>python client.py
Time to first token: 0.24388694763183594

The difference tells you how much time is spent within the FastAPI request processing (so here on my system, with server and client both local, FastAPI and the network only add ~0.01 s of latency).
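The client-side half of this measurement boils down to timing how long it takes for the first chunk of the response stream to arrive. A minimal sketch, assuming a streaming HTTP endpoint (the URL and query parameter below are hypothetical, not the repo's actual route):

```python
import time
from typing import Iterable, Tuple

def measure_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until first chunk, the first chunk) of a byte stream."""
    start = time.perf_counter()
    for chunk in chunks:
        return time.perf_counter() - start, chunk
    raise RuntimeError("stream produced no data")

# Against a live server the stream could come from an HTTP response, e.g.:
#   import urllib.request
#   resp = urllib.request.urlopen("http://127.0.0.1:8000/tts?text=hello")
#   latency, first = measure_first_chunk(iter(lambda: resp.read(4096), b""))

# Demo with a simulated stream:
def fake_stream():
    time.sleep(0.05)   # pretend synthesis takes 50 ms
    yield b"chunk-1"   # first audio chunk
    yield b"chunk-2"

latency, first = measure_first_chunk(fake_stream())
print(f"Time to first chunk: {latency:.3f}s")
```

Comparing this client-side number against the server's "latency to first chunk" log isolates the FastAPI/network share of the total delay.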

Can you verify this on your system? Maybe then we can see better where the time is spent.

FILLITUP commented 1 day ago

Thank you. Your suggestions were immensely helpful. Notes: network latency is similar to yours, ~0.01 s, as measured by your updated client.py script. I suspect the network latency is similar with the browser as client.

Your updated client.py is clearly faster than the original; this is more clearly evident when you use a text string much longer than "hello world".

When using the Edge browser as client, however, with longer text strings, the time until the audio is actually heard grows by 2-3 seconds compared to the same string in client.py. The latencies reflected in the logs of the two are essentially the same, but the perceived latencies clearly differ: client.py audio is heard almost immediately after the audio stream starts, whereas in the browser the audio is noticeably delayed after the stream starts.
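One plausible explanation for this gap is that a Python client can write each PCM chunk to the sound device the moment it arrives, while a browser typically buffers a stretch of the stream before starting playback. That buffer translates directly into audible delay; the sketch below converts a buffer size into seconds of audio. The sample format (24 kHz, 16-bit, mono) is an assumption for illustration, not a confirmed property of the engines involved:

```python
def buffer_delay_seconds(buffered_bytes: int,
                         sample_rate: int = 24000,
                         sample_width: int = 2,
                         channels: int = 1) -> float:
    """Seconds of audio represented by `buffered_bytes` of raw PCM."""
    bytes_per_second = sample_rate * sample_width * channels
    return buffered_bytes / bytes_per_second

# If the browser waits for ~128 KB of 24 kHz / 16-bit / mono PCM before
# starting playback, that alone adds roughly 2.7 s of delay:
print(f"{buffer_delay_seconds(128 * 1024):.1f}s")  # -> 2.7s
```

A 2-3 second extra delay with identical logged latencies is therefore consistent with buffering on the playback side rather than with the server or the network.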

KoljaB commented 22 hours ago

Which code do you use to play out over the browser?

FILLITUP commented 21 hours ago

I used server.py provided in "example_fast_api", which was also the source of client.py; first with the original and then with the updated client.py. server.py provided the URL to open in the Edge browser. It just seems like the audio stream sent back to the browser for playback is impeded in some way.
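One detail worth checking on the server side: raw PCM chunks are not directly playable by a browser, so a streamed response usually has to start with a WAV header. For a live stream of unknown length, the header's size fields are commonly set to 0xFFFFFFFF. A minimal stdlib sketch of such a header (how server.py actually frames its stream may differ, and the sample format here is again an assumption):

```python
import struct

def wav_header(sample_rate: int = 24000,
               sample_width: int = 2,
               channels: int = 1) -> bytes:
    """Build a 44-byte WAV header for an endless PCM stream.

    The RIFF and data length fields are set to 0xFFFFFFFF because the
    total size is unknown while streaming; most players accept this
    convention for live audio.
    """
    byte_rate = sample_rate * sample_width * channels
    block_align = sample_width * channels
    return (
        b"RIFF" + struct.pack("<I", 0xFFFFFFFF) + b"WAVE"
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels,
                                sample_rate, byte_rate, block_align,
                                sample_width * 8)
        + b"data" + struct.pack("<I", 0xFFFFFFFF)
    )

header = wav_header()
print(len(header))  # -> 44
```

Even with a correct header, though, the browser decides on its own how much to buffer before playing, which matches the symptom described above: identical server-side latency, but later audible onset in the browser.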