PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

AsyncEngineDeadError with koboldai api server #208

Open ycros opened 10 months ago

ycros commented 10 months ago

Everything seems to work fine via the embedded klite interface, but when I pointed horde at it, it started throwing the errors below. It seems to kinda sorta maybe still serve horde requests, though?

INFO 01-16 12:30:08 async_aphrodite.py:133] Aborted request kai-ca722b2c86f04e9b88eed91ac6f5a65e.
INFO:     127.0.0.1:60750 - "POST /api/latest/generate HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 358, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 337, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 188, in step_async
    output = (await self._run_workers_async(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 225, in _run_workers_async
    assert output == other_output
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/endpoints/kobold/api_server.py", line 142, in generate
    async for res in result_generator:
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 442, in generate
    raise e
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 436, in generate
    async for request_output in stream:
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 69, in __anext__
    raise result
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
    raise exc
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

AlpinDale commented 10 months ago

Does it display another error when you kill the server with Ctrl + C?

ycros commented 10 months ago

No idea, but it still seemed somewhat functional. I sort of just killed the entire runpod after that.

AlpinDale commented 10 months ago

The most likely cause of that error is a CUDA OOM (out-of-memory) error, so you may need to lower your number of threads.

ycros commented 9 months ago

I dunno, I tried again, this time with an AWQ 32g quant of Mixtral (about 26 GB on disk) instead of fp16, on 2 A6000s (48 GB VRAM each). In a separate run on a separate server, I did push it until it OOM'd, and I clearly saw CUDA OOM errors there. I don't see any such messages in this case.

This time I kept it to only 1 thread in the horde client, and I tried both gmu 0.98 and 0.8, though I frankly have no idea how I should be tuning these values.

My cmd line: python -m aphrodite.endpoints.kobold.api_server --host 0.0.0.0 --served-model-name BagelMIsteryTour-v2-8x7B --model ~/ycros/BagelMIsteryTour-v2-8x7B-AWQ --max-length 1024 -tp 2 -gmu 0.8 --quantization awq --kv-cache-dtype fp8
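
For a rough sense of what those two gmu values leave to work with, here is a back-of-the-envelope sketch. It is not how the engine actually profiles memory. It assumes that -gmu is the fraction of each GPU's VRAM the engine may claim, that the weights take about as much VRAM as they do on disk, and that they split evenly across the -tp 2 ranks. It ignores activations, CUDA graphs, and NCCL buffers. The function name is made up for illustration.

```python
def approx_kv_budget_gb(vram_gb: float, gmu: float, weights_gb: float, tp: int) -> float:
    """Rough per-GPU memory left for KV cache and everything else.

    Assumptions (not taken from the engine's code):
      - gmu is the fraction of each GPU's VRAM the engine may claim
      - weights occupy roughly their on-disk size, split evenly over tp GPUs
    """
    usable_gb = vram_gb * gmu              # what the engine may claim on one GPU
    weights_per_gpu_gb = weights_gb / tp   # this GPU's share of the model weights
    return usable_gb - weights_per_gpu_gb


# Numbers from the thread: 2 x A6000 (48 GB each), ~26 GB AWQ quant, -tp 2.
for gmu in (0.8, 0.98):
    left = approx_kv_budget_gb(vram_gb=48, gmu=gmu, weights_gb=26, tp=2)
    print(f"gmu={gmu}: roughly {left:.1f} GB per GPU beyond the weights")
```

Under these assumptions either setting leaves well over 20 GB per GPU beyond the weights, which would be consistent with not seeing CUDA OOM messages in this run.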

I'm on a39eeb7188d8bc91a43712435b27ad9e4c2b98d1 running from source.

The failed requests as reported by horde are all these:

Something went wrong when processing request. Please check your trace.log file for the full stack trace. Payload: {'prompt': 'PROMPT REDACTED', 'n': 1, 'max_context_length': 2048, 'max_length': 64, 'rep_pen': 1.1, 'rep_pen_range': 1024,
'rep_pen_slope': 0.7, 'temperature': 0.9, 'tfs': 1.0, 'top_a': 0.0, 'top_k': 0, 'top_p': 0.9, 'typical': 1.0, 'sampler_order': [6, 0, 1, 2, 3, 4, 5], 'use_default_badwordsids': True, 'stop_sequence': [], 'min_p': 0.0, 'dynatemp_range': 0.0,
'dynatemp_exponent': 1.0, 'quiet': True, 'request_type': 'text2text', 'model': 'aphrodite/BagelMIsteryTour-v2-8x7B'}
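
One way to narrow this down might be to replay that payload directly against the server, without horde in between. The sketch below uses only the Python standard library. The /api/latest/generate path comes from the server log above. The host and port in BASE_URL are placeholders, since the thread does not say which port the server listens on, and the prompt is a stand-in for the redacted one. The helper names build_request and send are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumption: placeholder host/port; replace with wherever the server listens.
BASE_URL = "http://127.0.0.1:8000"

# Payload copied from the horde error above; the prompt is a stand-in.
PAYLOAD = {
    "prompt": "PROMPT REDACTED", "n": 1, "max_context_length": 2048,
    "max_length": 64, "rep_pen": 1.1, "rep_pen_range": 1024,
    "rep_pen_slope": 0.7, "temperature": 0.9, "tfs": 1.0, "top_a": 0.0,
    "top_k": 0, "top_p": 0.9, "typical": 1.0,
    "sampler_order": [6, 0, 1, 2, 3, 4, 5], "use_default_badwordsids": True,
    "stop_sequence": [], "min_p": 0.0, "dynatemp_range": 0.0,
    "dynatemp_exponent": 1.0, "quiet": True, "request_type": "text2text",
    "model": "aphrodite/BagelMIsteryTour-v2-8x7B",
}


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST to the generate endpoint seen in the server log."""
    return urllib.request.Request(
        url=f"{base_url}/api/latest/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> tuple[int, str]:
    """Send the request and return (status, body), including for HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 500 like the one in the log ends up here instead of raising.
        return err.code, err.read().decode("utf-8", errors="replace")


# Usage (needs a running server):
# status, body = send(build_request(PAYLOAD))
# print(status, body[:500])
```

If a single replayed request already returns 500, the failure probably does not depend on horde's concurrency. If it only fails when several are sent at once, that would point back toward load.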

When I stop it:

^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [7657]
(RayWorker pid=9414) INFO 01-21 10:22:17 model_runner.py:459] Graph capturing finished in 35 secs.
(RayWorker pid=9414) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
root@d38248ce23ec:~#

Here's the log from the terminal as far as my tmux buffer went: aphro-log.txt

Does it log anywhere else that I should look at before I shut this pod down? Is there anything else you'd like me to try to debug this? (I will probably shut the pod down in, say, 12 hours.)