coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] CUDA crash when running XTTS inference in a FastAPI streaming endpoint #3299

Closed hengjiUSTC closed 11 months ago

hengjiUSTC commented 11 months ago

Describe the bug

I am using the code at https://github.com/hengjiUSTC/xtts-streaming-server/blob/main/server/main.py to build a FastAPI server for a streaming TTS service. I got the following error:

Traceback (most recent call last):
  File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 277, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
    await func()
  File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 250, in listen_for_disconnect
    message = await receive()
  File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
    await self.message_event.wait()
  File "/opt/conda/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f1252414c40

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
  |     raise exc
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
  |     raise exc
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
  |     await self.app(scope, receive, sender)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
  |     raise e
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
  |     await self.app(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
  |     await route.handle(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
  |     await self.app(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
  |     await response(scope, receive, send)
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 270, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 658, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 273, in wrap
    |     await func()
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/responses.py", line 262, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/concurrency.py", line 63, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, iterator)
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 49, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2103, in run_sync_in_worker_thread
    |     return await future
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 823, in run
    |     result = context.run(func, *args)
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/starlette/concurrency.py", line 53, in _next
    |     return next(iterator)
    |   File "/home/ubuntu/xtts-streaming-server/server/main.py", line 147, in predict_streaming_generator
    |     for i, chunk in enumerate(chunks):
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    |     response = gen.send(None)
    |   File "/home/ubuntu/xtts-streaming-server/server/venv/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 633, in inference_stream
    |     text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
    | RuntimeError: CUDA error: an illegal memory access was encountered
    | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
    |
    +------------------------------------

To Reproduce

Running https://github.com/hengjiUSTC/xtts-streaming-server/blob/main/server/main.py on an AWS g4dn.xlarge instance with a 16 GB GPU and 8 GB of CPU memory, using the latest 0.20.6 release.

Expected behavior

No response

Logs

No response

Environment

TTS version 0.20.6
PyTorch version: 2.1.1 (installed with pip)
CUDA version:
>>> print(torch.version.cuda)
12.1

CUDNN version:
>>> print(torch.backends.cudnn.version())
8905

Python: 3.10.9
OS: Ubuntu
GPU: NVIDIA T4 (16 GB)

Additional context

I think the error comes from the XTTS module when the server has been running for a long time. Does anyone have an idea why this is happening?

WeberJulian commented 11 months ago

Yeah, this often comes from receiving more than one concurrent request.

hengjiUSTC commented 11 months ago

What is the preferred place to fix this? It seems that once this crash happens, the server stays in an error state and cannot recover automatically. Some fixes I can think of:

  1. Use a single thread in FastAPI and only process one request at a time (a rough sketch of one approach is shown after this list).
  2. Would changing this (https://github.com/hengjiUSTC/xtts-streaming-server/blob/main/server/main.py#L23) to 1 prevent it from happening?
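
For option 1, a rough sketch of what I have in mind (untested; the startup-hook placement is illustrative): the traceback shows Starlette running the sync streaming generator through its AnyIO threadpool via iterate_in_threadpool, so capping that pool at a single worker would queue requests instead of running them concurrently.

```python
# Rough sketch: cap Starlette's AnyIO threadpool at one worker so only one
# blocking streaming generator (and thus one CUDA inference) runs at a time.
from anyio import to_thread
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def serialize_threadpool_work() -> None:
    # Starlette's iterate_in_threadpool uses the default AnyIO capacity limiter;
    # setting it to 1 means concurrent requests wait instead of overlapping.
    to_thread.current_default_thread_limiter().total_tokens = 1
```

Note that this also serializes every other sync endpoint or dependency that FastAPI offloads to the same threadpool, which is probably acceptable for a single-GPU server.
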
WeberJulian commented 11 months ago

I'm open to PRs for n°1; it's more of an example project than a real production-ready server.

BobReal0822 commented 10 months ago

What is the preferred place to fix this? It seems that once this crash happens, the server stays in an error state and cannot recover automatically. Some fixes I can think of:

  1. Use a single thread in FastAPI and only process one request at a time.
  2. Would changing this (https://github.com/hengjiUSTC/xtts-streaming-server/blob/main/server/main.py#L23) to 1 prevent it from happening?

I ran into the same problem. Did you find a solution? @hengjiUSTC @WeberJulian

hengjiUSTC commented 10 months ago

I added a lock to process one request at a time, and that seems to have solved the problem.
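
Roughly what I did, as a sketch (not the exact code; names are illustrative): a module-level lock that is held for the whole lifetime of the chunk generator, so a second request simply waits instead of touching CUDA concurrently.

```python
# Sketch of the lock-based workaround: hold a module-level lock for the whole
# lifetime of the streaming generator so only one XTTS inference runs at a time.
import threading
from typing import Callable, Iterable, Iterator

_inference_lock = threading.Lock()

def serialize_stream(make_chunks: Callable[[], Iterable[bytes]]) -> Iterator[bytes]:
    # The lock is acquired before the first chunk is produced and released when
    # the generator is exhausted or closed (e.g. on client disconnect).
    with _inference_lock:
        yield from make_chunks()
```

In the endpoint this wraps the existing chunk generator, e.g. StreamingResponse(serialize_stream(lambda: predict_streaming_generator(...)), media_type="audio/wav"), where predict_streaming_generator is the generator from the traceback and its arguments are whatever the endpoint already passes.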

BobReal0822 commented 10 months ago

I added a lock to process one request at a time, and that seems to have solved the problem.

I will try it, thank you!

mantrakp04 commented 10 months ago

Facing the same issue; I cannot serve concurrent requests in FastAPI. Is there a way to fix this?

sushant-samespace commented 2 months ago

@mantrakp04 Did you find anything regarding this?