coqui-ai / xtts-streaming-server

Mozilla Public License 2.0

Not working properly. RuntimeError: shape '[-1, 1024]' is invalid for input of size 1 #9

Closed jqueguiner closed 10 months ago

jqueguiner commented 10 months ago
curl -X 'POST' \
  'http://localhost1:8000/tts_stream' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "speaker_embedding": [
    0
  ],
  "gpt_cond_latent": [
    [
      0
    ]
  ],
  "text": "this is a test.",
  "language": "en",
  "add_wav_header": true,
  "stream_chunk_size": "20"
}'
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
    |     await func()
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/responses.py", line 262, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/concurrency.py", line 63, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, iterator)
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 49, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2103, in run_sync_in_worker_thread
    |     return await future
    |   File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 823, in run
    |     result = context.run(func, *args)
    |   File "/opt/conda/lib/python3.10/site-packages/starlette/concurrency.py", line 53, in _next
    |     return next(iterator)
    |   File "/app/main.py", line 134, in predict_streaming_generator
    |     torch.tensor(parsed_input.gpt_cond_latent).reshape((-1, 1024)).unsqueeze(0)
    | RuntimeError: shape '[-1, 1024]' is invalid for input of size 1
    +------------------------------------
santhosh-sp commented 10 months ago

Hi @jqueguiner,

You need to use the embedding values from https://github.com/coqui-ai/xtts-streaming-server/blob/main/test/default_speaker.json for "speaker_embedding" and "gpt_cond_latent". It is a clone of a female voice, and the output audio will use that voice.

You can try with this file. [Uploading tts_payload.json…]()
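
For anyone landing here later, a minimal sketch of the same request built from that speaker file, using only the Python standard library. It assumes the JSON file holds the two keys named above ("speaker_embedding" and "gpt_cond_latent") and that the server listens on localhost:8000; `build_payload` and `stream_tts` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request


def build_payload(speaker: dict, text: str, language: str = "en") -> dict:
    """Combine speaker embeddings with the request fields from the curl call above."""
    return {
        # Assumption: the speaker file stores the embeddings under these two keys.
        "speaker_embedding": speaker["speaker_embedding"],
        "gpt_cond_latent": speaker["gpt_cond_latent"],
        "text": text,
        "language": language,
        "add_wav_header": True,
        "stream_chunk_size": "20",
    }


def stream_tts(payload: dict,
               url: str = "http://localhost:8000/tts_stream",
               out_path: str = "output.wav") -> None:
    """POST the payload and write the streamed audio bytes to out_path."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        # Read the response in small chunks as the server produces them.
        while chunk := response.read(4096):
            out.write(chunk)
```

To try it, load the file with `json.load`, pass the resulting dict to `build_payload(speaker, "this is a test.")`, and hand the payload to `stream_tts`.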

WeberJulian commented 10 months ago

As @jqueguiner said

  "speaker_embedding": [
    0
  ],
  "gpt_cond_latent": [
    [
      0
    ]
  ],

Those are not valid embeddings
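
To spell out why the server rejects them: the traceback shows main.py calling `reshape((-1, 1024))` on `gpt_cond_latent`, so the total number of values must be a multiple of 1024, and `[[0]]` has only one. A rough client-side check, as a sketch in plain Python (`flatten` and `latent_rows` are hypothetical helpers; the width 1024 is taken from the traceback, nothing else about the expected shapes is assumed):

```python
LATENT_WIDTH = 1024  # from reshape((-1, 1024)) in the traceback


def flatten(values):
    """Yield every scalar in an arbitrarily nested list."""
    for value in values:
        if isinstance(value, list):
            yield from flatten(value)
        else:
            yield value


def latent_rows(gpt_cond_latent) -> int:
    """Return how many rows of width 1024 the latent holds, or raise ValueError."""
    count = sum(1 for _ in flatten(gpt_cond_latent))
    # An empty latent or a count that does not divide evenly cannot be reshaped.
    if count == 0 or count % LATENT_WIDTH != 0:
        raise ValueError(
            f"gpt_cond_latent has {count} values; expected a non-zero multiple of {LATENT_WIDTH}"
        )
    return count // LATENT_WIDTH
```

Calling `latent_rows([[0]])` raises a `ValueError` before any request is sent, which is easier to read than the server-side `RuntimeError`.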

jqueguiner commented 10 months ago

@WeberJulian OK, got it. Is there a library of existing "open source" voice embeddings, or any good resource for getting voices that allow commercial use?