the ones you obtain as the result of cloning a voice
Hey, you have an exhaustive example of how to call the endpoints here: https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
Hi @abeiro,
I tried with the payload below and got a response in Postman, but unfortunately the audio doesn't play anything.
It doesn't play in the Swagger UI either.
Please check and get back to me.
Thanks in advance, Santhosh.
I think you're calling the endpoint correctly, hence the status code; it's just that you aren't interpreting the response correctly. Here is some Python code that interprets the audio chunks: https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
If you want the JS equivalent, there is the web app in this repo.
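For reference, here is a minimal sketch of what that script does. The route names /clone_speaker and /tts_stream, the wav_file form field, and the 24 kHz / 16-bit mono PCM output format follow the linked repo, but double-check them against your server version; the localhost port is a placeholder. The streamed body is raw PCM unless add_wav_header is set, which is why Postman/Swagger won't play it as-is.

```python
# Minimal sketch based on test/test_streaming.py from the linked repo.
# Assumes the server runs on localhost:8000 and streams raw 16-bit mono PCM
# at 24 kHz when add_wav_header is False. Adjust names/ports to your setup.
import wave
import requests

SERVER = "http://localhost:8000"

# 1) Clone a voice once to obtain speaker_embedding and gpt_cond_latent.
with open("reference.wav", "rb") as f:
    speaker = requests.post(f"{SERVER}/clone_speaker", files={"wav_file": f}).json()

# 2) Request streamed TTS using those latents.
payload = {
    **speaker,  # speaker_embedding + gpt_cond_latent
    "text": "Hello, this is a streaming test.",
    "language": "en",
    "add_wav_header": False,
    "stream_chunk_size": "20",  # field taken from the linked test script
}
resp = requests.post(f"{SERVER}/tts_stream", json=payload, stream=True)
resp.raise_for_status()

# 3) The response body is raw PCM, not a playable file by itself;
#    wrap the chunks in a WAV container so a normal player can open it.
with wave.open("out.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(24000)   # XTTS output rate (assumption)
    for chunk in resp.iter_content(chunk_size=4096):
        if chunk:
            wav.writeframes(chunk)
```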
It works the first time, but after that I get the error below when running https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
| RuntimeError: CUDA error: device-side assert triggered
| CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
| Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
It's quite a general error, but it's probably because you ran multiple concurrent requests, right?
Yes, I am sending multiple requests. Is there any way to handle multiple concurrent requests for TTS streaming?
I am using 3 × RTX 3090 GPUs with 24 GB VRAM each.
Please suggest an approach to handle this problem.
Sometimes I also get this error when requesting from the UI:
| RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
You need to run multiple servers and distribute requests to them in a way that doesn't send two concurrent requests to the same server.
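One simple way to enforce that, sketched below, is to keep a pool of server URLs and hand each incoming request an idle one, returning it to the pool only when the stream has finished. The ports are placeholders for however you start your instances.

```python
# Sketch of distributing requests so that no two concurrent requests
# ever hit the same xtts-streaming-server instance. Ports are placeholders.
import queue
import requests
from contextlib import contextmanager

SERVERS = [f"http://localhost:{8000 + i}" for i in range(9)]  # one entry per server process

_idle = queue.Queue()
for url in SERVERS:
    _idle.put(url)

@contextmanager
def acquire_server():
    """Block until some server is idle, yield its URL, then mark it idle again."""
    url = _idle.get()          # waits here if every server is busy
    try:
        yield url
    finally:
        _idle.put(url)

def stream_tts(payload):
    """Send one /tts_stream request to an idle server and yield the audio chunks."""
    with acquire_server() as server_url:
        resp = requests.post(f"{server_url}/tts_stream", json=payload, stream=True)
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                yield chunk
```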
Could you share the maximum number of concurrent requests a single 24 GB GPU can handle? If not, what are the limitations for a scalable application?
Each server process can only handle one concurrent request. A 24GB VRAM GPU should be able to handle 3 or even 4 servers, but you'll have to test with representative inputs for your use case to be sure. So with 3 of those GPUs in a machine you should be able to handle 9 to 12 concurrent requests in it.
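To make that concrete, here is a rough sketch of launching three server processes per GPU across the three GPUs. It assumes the non-Docker setup where the server is a FastAPI app started with uvicorn from the repo's server directory; the command, ports, and the 3-per-GPU count are assumptions to adapt to your deployment.

```python
# Rough sketch: start 3 server processes per GPU on 3 GPUs (9 total),
# each pinned to one GPU via CUDA_VISIBLE_DEVICES and given its own port.
# The uvicorn command/module path is an assumption; adapt it to your setup.
import os
import subprocess

SERVERS_PER_GPU = 3
GPUS = [0, 1, 2]

procs = []
port = 8000
for gpu in GPUS:
    for _ in range(SERVERS_PER_GPU):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(
            ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", str(port)],
            env=env,
        ))
        port += 1

for p in procs:
    p.wait()
```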
Thanks for the update. @reuben
Hello,
What exact parameter values do I need to pass for 1. speaker_embedding and 2. gpt_cond_latent?
Thanks, Santhosh