PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0

AsyncEngineDeadError with koboldai api server #208

Open ycros opened 4 months ago

ycros commented 4 months ago

Everything seems to work fine via the embedded klite interface, but when I pointed horde at it, it started throwing the errors below. It seems to kinda sorta maybe still serve horde requests, though.

INFO 01-16 12:30:08 async_aphrodite.py:133] Aborted request kai-ca722b2c86f04e9b88eed91ac6f5a65e.
INFO:     127.0.0.1:60750 - "POST /api/latest/generate HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
    task.result()
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 358, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 337, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 188, in step_async
    output = (await self._run_workers_async(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 225, in _run_workers_async
    assert output == other_output
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/aphrodite-engine/aphrodite/endpoints/kobold/api_server.py", line 142, in generate
    async for res in result_generator:
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 442, in generate
    raise e
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 436, in generate
    async for request_output in stream:
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 69, in __anext__
    raise result
  File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
    raise exc
  File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

AlpinDale commented 4 months ago

Does it display another error when you kill the server with Ctrl + C?

ycros commented 4 months ago

No idea, but it still seemed somewhat functional. I sort of just killed the entire runpod after that.

AlpinDale commented 4 months ago

The most likely cause for that error is a CUDA OOM error, so you may need to lower your number of threads.
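
For context, the first traceback bottoms out in `assert output == other_output` inside `_run_workers_async`, which suggests the engine compares what its workers return and fails if they differ. The toy sketch below shows that pattern only. `fake_worker`, `run_workers` and `workers_agree` are hypothetical names, and nothing here is taken from the aphrodite source apart from the assert line itself.

```python
import asyncio


async def fake_worker(rank: int, result: str) -> str:
    """Hypothetical stand-in for one tensor-parallel worker."""
    await asyncio.sleep(0)  # yield control, as a remote call would
    return result


async def run_workers(results: list[str]) -> str:
    """Run every worker and require that all of them return the same output."""
    outputs = await asyncio.gather(
        *(fake_worker(rank, r) for rank, r in enumerate(results))
    )
    output = outputs[0]
    for other_output in outputs[1:]:
        # Same check as the line shown in the traceback.
        assert output == other_output
    return output


def workers_agree(results: list[str]) -> bool:
    """Return True if all simulated workers agree, False if the assert fires."""
    try:
        asyncio.run(run_workers(results))
        return True
    except AssertionError:
        return False
```

If the real engine works this way, a bare `AssertionError` with no CUDA OOM message would mean the two workers returned different outputs for the same step, whatever the underlying reason.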

ycros commented 4 months ago

I dunno, I tried again, this time with an AWQ 32g quant of Mixtral (about 26 GB on disk) instead of fp16, on 2 A6000s (48 GB VRAM each). In a separate run on a separate server, I did push it until it OOM'd, and I clearly saw CUDA OOM errors there. I don't see any such messages in this case.

This time I kept the horde client to just 1 thread, and I tried both gmu 0.98 and 0.8, though I frankly have no idea how I should be tuning these values.

My cmd line: python -m aphrodite.endpoints.kobold.api_server --host 0.0.0.0 --served-model-name BagelMIsteryTour-v2-8x7B --model ~/ycros/BagelMIsteryTour-v2-8x7B-AWQ --max-length 1024 -tp 2 -gmu 0.8 --quantization awq --kv-cache-dtype fp8
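
A rough way to reason about the `-gmu` values, using only the numbers in this thread. This is back-of-the-envelope arithmetic with several assumptions: that `-gmu` is the fraction of each GPU's memory the engine may use, that the ~26 GB on-disk size approximates the in-memory weight size, and that `-tp 2` splits the weights evenly across both GPUs. It ignores activations, CUDA graphs and NCCL buffers. `per_gpu_budget_gb` is a hypothetical helper, not part of aphrodite.

```python
def per_gpu_budget_gb(vram_gb: float, gmu: float, weights_gb: float, tp: int):
    """Estimate the per-GPU memory budget and what is left after the weights.

    Assumes gmu is a fraction of total VRAM per GPU and that the weights
    are split evenly across tp GPUs (both are assumptions, see above).
    """
    budget = vram_gb * gmu            # memory the engine may claim per GPU
    weights_per_gpu = weights_gb / tp  # share of the model on each GPU
    return budget, weights_per_gpu, budget - weights_per_gpu


# The two settings tried in this thread: 2 x A6000 (48 GB), ~26 GB model.
for gmu in (0.8, 0.98):
    budget, weights, left = per_gpu_budget_gb(48, gmu, 26, 2)
    print(f"gmu={gmu}: budget {budget:.1f} GB, "
          f"weights ~{weights:.1f} GB, ~{left:.1f} GB left for KV cache etc.")
```

Under those assumptions, both settings leave well over 20 GB per GPU after the weights, which fits with not seeing any CUDA OOM messages.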

I'm on a39eeb7188d8bc91a43712435b27ad9e4c2b98d1 running from source.

The failed requests as reported by horde are all these:

Something went wrong when processing request. Please check your trace.log file for the full stack trace. Payload: {'prompt': 'PROMPT REDACTED', 'n': 1, 'max_context_length': 2048, 'max_length': 64, 'rep_pen': 1.1, 'rep_pen_range': 1024,
'rep_pen_slope': 0.7, 'temperature': 0.9, 'tfs': 1.0, 'top_a': 0.0, 'top_k': 0, 'top_p': 0.9, 'typical': 1.0, 'sampler_order': [6, 0, 1, 2, 3, 4, 5], 'use_default_badwordsids': True, 'stop_sequence': [], 'min_p': 0.0, 'dynatemp_range': 0.0,
'dynatemp_exponent': 1.0, 'quiet': True, 'request_type': 'text2text', 'model': 'aphrodite/BagelMIsteryTour-v2-8x7B'}
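
To reproduce this without horde in the loop, the reported payload could be replayed directly against the endpoint seen in the server log (`POST /api/latest/generate`). A minimal sketch using only the standard library follows. The base URL and port are placeholders to fill in, `build_request` and `replay` are hypothetical helpers, and whether the server accepts or ignores the horde-specific fields (`quiet`, `request_type`, `model`) is not known from this thread.

```python
import json
import urllib.error
import urllib.request

# Payload copied from the horde error above; the prompt is still redacted.
PAYLOAD = {
    "prompt": "PROMPT REDACTED", "n": 1, "max_context_length": 2048,
    "max_length": 64, "rep_pen": 1.1, "rep_pen_range": 1024,
    "rep_pen_slope": 0.7, "temperature": 0.9, "tfs": 1.0, "top_a": 0.0,
    "top_k": 0, "top_p": 0.9, "typical": 1.0,
    "sampler_order": [6, 0, 1, 2, 3, 4, 5],
    "use_default_badwordsids": True, "stop_sequence": [], "min_p": 0.0,
    "dynatemp_range": 0.0, "dynatemp_exponent": 1.0, "quiet": True,
    "request_type": "text2text",
    "model": "aphrodite/BagelMIsteryTour-v2-8x7B",
}


def build_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for the generate endpoint without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/latest/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replay(base_url: str, payload: dict, timeout: float = 120.0):
    """Send the payload and return (status, body), including for HTTP errors."""
    req = build_request(base_url, payload)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 500 like the one in the log ends up here instead of raising.
        return err.code, err.read().decode("utf-8")
```

Calling `replay("http://127.0.0.1:<port>", PAYLOAD)` with a real prompt substituted in, using whatever port the server is listening on, should show whether a single request of this shape is enough to trigger the 500.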

When I stop it:

^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [7657]
(RayWorker pid=9414) INFO 01-21 10:22:17 model_runner.py:459] Graph capturing finished in 35 secs.
(RayWorker pid=9414) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
root@d38248ce23ec:~#

Here's the log from the terminal as far as my tmux buffer went: aphro-log.txt

Does it log anywhere else I should be looking at before I shut this pod down? Is there anything else you'd like me to try to debug this? (I will probably shut the pod down in say, 12 hours)