the ones you obtain as the result of cloning a voice
Hey, you have an exhaustive example of how to call the endpoints here: https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
Hi @abeiro,
I tried with the payload below and got a response in Postman, but unfortunately the audio doesn't play anything.
It doesn't play in the Swagger UI either.
Please check and get back to me.
Thanks in advance, Santhosh.
I think you're calling the endpoint correctly, hence the status code; it's just that you aren't interpreting the response correctly. Here is some Python code that interprets the audio chunks: https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
If you want the JS equivalent, there is the web app in this repo.
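For reference, here is a minimal sketch of what that script does. The route names /clone_speaker and /tts_stream, the wav_file form field, and the 24 kHz / 16-bit mono PCM output format follow the linked repo, but double-check them against your server version; the localhost port is a placeholder. The streamed body is raw PCM unless add_wav_header is set, which is why Postman/Swagger won't play it as-is.

```python
# Minimal sketch based on test/test_streaming.py from the linked repo.
# Assumes the server runs on localhost:8000 and streams raw 16-bit mono PCM
# at 24 kHz when add_wav_header is False. Adjust names/ports to your setup.
import wave
import requests

SERVER = "http://localhost:8000"

# 1) Clone a voice once to obtain speaker_embedding and gpt_cond_latent.
with open("reference.wav", "rb") as f:
    speaker = requests.post(f"{SERVER}/clone_speaker", files={"wav_file": f}).json()

# 2) Request streamed TTS using those latents.
payload = {
    **speaker,  # speaker_embedding + gpt_cond_latent
    "text": "Hello, this is a streaming test.",
    "language": "en",
    "add_wav_header": False,
    "stream_chunk_size": "20",  # field taken from the linked test script
}
resp = requests.post(f"{SERVER}/tts_stream", json=payload, stream=True)
resp.raise_for_status()

# 3) The response body is raw PCM, not a playable file by itself;
#    wrap the chunks in a WAV container so a normal player can open it.
with wave.open("out.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(24000)   # XTTS output rate (assumption)
    for chunk in resp.iter_content(chunk_size=4096):
        if chunk:
            wav.writeframes(chunk)
```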
It works the first time, but after that I get the error below when running https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/test_streaming.py
| RuntimeError: CUDA error: device-side assert triggered
| CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
| For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
| Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
It's quite a general error, but it's probably because you ran multiple concurrent requests, right?
Yes, I am sending multiple requests. Is there any way to handle multiple concurrent requests for TTS streaming?
I am using 3 × RTX 3090 GPUs with 24 GB VRAM each.
Please suggest an approach to handle this problem.
Sometimes I also get this error when requesting from the UI:
| RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
You need to run multiple servers and distribute requests to them in a way that doesn't send two concurrent requests to the same server.
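One simple way to enforce that, sketched below, is to keep a pool of server URLs and hand each incoming request an idle one, returning it to the pool only when the stream has finished. The ports are placeholders for however you start your instances.

```python
# Sketch of distributing requests so that no two concurrent requests
# ever hit the same xtts-streaming-server instance. Ports are placeholders.
import queue
import requests
from contextlib import contextmanager

SERVERS = [f"http://localhost:{8000 + i}" for i in range(9)]  # one entry per server process

_idle = queue.Queue()
for url in SERVERS:
    _idle.put(url)

@contextmanager
def acquire_server():
    """Block until some server is idle, yield its URL, then mark it idle again."""
    url = _idle.get()          # waits here if every server is busy
    try:
        yield url
    finally:
        _idle.put(url)

def stream_tts(payload):
    """Send one /tts_stream request to an idle server and yield the audio chunks."""
    with acquire_server() as server_url:
        resp = requests.post(f"{server_url}/tts_stream", json=payload, stream=True)
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                yield chunk
```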
Could you share the maximum number of concurrent requests a single 24 GB GPU can handle? If not, what are the limitations for a scalable application?
Each server process can only handle one concurrent request. A 24GB VRAM GPU should be able to handle 3 or even 4 servers, but you'll have to test with representative inputs for your use case to be sure. So with 3 of those GPUs in a machine you should be able to handle 9 to 12 concurrent requests in it.
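To make that concrete, here is a rough sketch of launching three server processes per GPU across the three GPUs. It assumes the non-Docker setup where the server is a FastAPI app started with uvicorn from the repo's server directory; the command, ports, and the 3-per-GPU count are assumptions to adapt to your deployment.

```python
# Rough sketch: start 3 server processes per GPU on 3 GPUs (9 total),
# each pinned to one GPU via CUDA_VISIBLE_DEVICES and given its own port.
# The uvicorn command/module path is an assumption; adapt it to your setup.
import os
import subprocess

SERVERS_PER_GPU = 3
GPUS = [0, 1, 2]

procs = []
port = 8000
for gpu in GPUS:
    for _ in range(SERVERS_PER_GPU):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(
            ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", str(port)],
            env=env,
        ))
        port += 1

for p in procs:
    p.wait()
```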
Thanks for the update. @reuben
Hello,
What exact parameter values do I need to pass for 1. speaker_embedding and 2. gpt_cond_latent?
Thanks, Santhosh