PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Bug]: multi GPU crashes backend #359

Closed. mrseeker closed this issue 1 month ago.

mrseeker commented 7 months ago

Your current environment

Docker container, can't check

🐛 Describe the bug

When I set NUM_GPUS to 8 (the server has 8 GPUs), I get the following error (sorry, but the system is bad at logging errors properly):

ERROR:    Exception in ASGI application
ERROR:    Traceback (most recent call last):
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 442, in engine_step
ERROR:        request_outputs = await self.engine.step_async()
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 208, in step_async
ERROR:        all_outputs = await self._run_workers_async(
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 293, in _run_workers_async
ERROR:        all_outputs = await asyncio.gather(*coros)
ERROR:    asyncio.exceptions.CancelledError
ERROR:
ERROR:    During handling of the above exception, another exception occurred:
ERROR:
ERROR:    Traceback (most recent call last):
ERROR:      File "/usr/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
ERROR:        return fut.result()
ERROR:    asyncio.exceptions.CancelledError
ERROR:
ERROR:    The above exception was the direct cause of the following exception:
ERROR:
ERROR:    Traceback (most recent call last):
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py",
line 407, in run_asgi
ERROR:        result = await app(  # type: ignore[func-returns-value]
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py",
line 69, in __call__
ERROR:        return await self.app(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in
__call__
ERROR:        await super().__call__(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123,
in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line
186, in __call__
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line
164, in __call__
ERROR:        await self.app(scope, receive, _send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83,
in __call__
ERROR:        await self.app(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py",
line 62, in __call__
ERROR:        await wrap_app_handling_exceptions(self.app, conn)(scope, receive,
send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
64, in wrapped_app
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 758, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 778, in app
ERROR:        await route.handle(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 299, in handle
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 79, in app
ERROR:        await wrap_app_handling_exceptions(app, request)(scope, receive,
send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
64, in wrapped_app
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 74, in app
ERROR:        response = await func(request)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py",
line 278, in app
ERROR:        raw_response = await run_endpoint_function(
ERROR:      File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py",
line 191, in run_endpoint_function
ERROR:        return await dependant.call(**values)
ERROR:      File "/app/aphrodite-engine/aphrodite/endpoints/aws/api_server.py",
line 174, in create_completion
ERROR:        generator = await openai_serving_completion.create_completion(
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
177, in create_completion
ERROR:        async for i, res in result_generator:
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
81, in consumer
ERROR:        raise item
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
66, in producer
ERROR:        async for item in iterator:
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 625, in generate
ERROR:        raise e
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 619, in generate
ERROR:        async for request_output in stream:
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 72, in __anext__
ERROR:        raise result
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 33, in _raise_exception_on_finish
ERROR:        task.result()
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 468, in run_engine_loop
ERROR:        has_requests_in_progress = await asyncio.wait_for(
ERROR:      File "/usr/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
ERROR:        raise exceptions.TimeoutError() from exc
ERROR:    asyncio.exceptions.TimeoutError
ERROR:
ERROR:    The above exception was the direct cause of the following exception:
ERROR:
ERROR:    Traceback (most recent call last):
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py",
line 407, in run_asgi
ERROR:        result = await app(  # type: ignore[func-returns-value]
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py",
line 69, in __call__
ERROR:        return await self.app(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in
__call__
ERROR:        await super().__call__(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123,
in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line
186, in __call__
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line
164, in __call__
ERROR:        await self.app(scope, receive, _send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83,
in __call__
ERROR:        await self.app(scope, receive, send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py",
line 62, in __call__
ERROR:        await wrap_app_handling_exceptions(self.app, conn)(scope, receive,
send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
64, in wrapped_app
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 758, in __call__
ERROR:        await self.middleware_stack(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 778, in app
ERROR:        await route.handle(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 299, in handle
ERROR:        await self.app(scope, receive, send)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 79, in app
ERROR:        await wrap_app_handling_exceptions(app, request)(scope, receive,
send)
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
64, in wrapped_app
ERROR:        raise exc
ERROR:      File
"/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line
53, in wrapped_app
ERROR:        await app(scope, receive, sender)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py",
line 74, in app
ERROR:        response = await func(request)
ERROR:      File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py",
line 278, in app
ERROR:        raw_response = await run_endpoint_function(
ERROR:      File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py",
line 191, in run_endpoint_function
ERROR:        return await dependant.call(**values)
ERROR:      File "/app/aphrodite-engine/aphrodite/endpoints/aws/api_server.py",
line 174, in create_completion
ERROR:        generator = await openai_serving_completion.create_completion(
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
177, in create_completion
ERROR:        async for i, res in result_generator:
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
81, in consumer
ERROR:        raise item
ERROR:      File
"/app/aphrodite-engine/aphrodite/endpoints/openai/serving_completions.py", line
66, in producer
ERROR:        async for item in iterator:
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 625, in generate
ERROR:        raise e
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 612, in generate
ERROR:        stream = await self.add_request(request_id,
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 503, in add_request
ERROR:        self.start_background_loop()
ERROR:      File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py",
line 379, in start_background_loop
ERROR:        raise AsyncEngineDeadError(
ERROR:    aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Background loop
has errored already.
INFO:     169.254.178.2:35944 - "GET /health HTTP/1.1" 200

The annoying part is that the server does not stop, and /health still returns 200 (it should not, since the backend has crashed).
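For reference, this is roughly how the mismatch shows up (the port and the redacted model name are taken from the server invocation later in this thread; the completions path is the standard OpenAI-compatible route and is assumed here):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/health   # still prints 200
curl -s http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<REDACTED>", "prompt": "hello", "max_tokens": 16}'   # hangs or fails with the AsyncEngineDeadError above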

mrseeker commented 7 months ago

If #265 fixes this issue, let me know and I will be happy to test it out.

AlpinDale commented 7 months ago

The error log isn't very helpful. It may give you more info if you kill the server (async moment). It could be due to an internal timeout, but it's hard to tell from this error.

mrseeker commented 7 months ago
[rank0]:[E ProcessGroupNCCL.cpp:523] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1442, OpType=GATHER, NumelIn=16000, NumelOut=16000, Timeout(ms)=600000) ran for 600251 milliseconds before timing out.
[rank0]:[E ProcessGroupNCCL.cpp:537] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank0]:[E ProcessGroupNCCL.cpp:543] To avoid data inconsistency, we are taking the entire process down.
[rank0]:[E ProcessGroupNCCL.cpp:1182] [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1442, OpType=GATHER, NumelIn=16000, NumelOut=16000, Timeout(ms)=600000) ran for 600251 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f860ad81d87 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f85c07df6e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f85c07e2c3d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f85c07e3839 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7f860a4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7f860c341ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f860c3d2a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-03-25 13:52:52,844 E 1 6316] logging.cc:97: Unhandled exception: N3c1016DistBackendErrorE. what(): [Rank 0] NCCL watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1442, OpType=GATHER, NumelIn=16000, NumelOut=16000, Timeout(ms)=600000) ran for 600251 milliseconds before timing out.
Exception raised from checkTimeout at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:525 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f860ad81d87 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1e6 (0x7f85c07df6e6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x19d (0x7f85c07e2c3d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x119 (0x7f85c07e3839 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7f860a4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7f860c341ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #6: clone + 0x44 (0x7f860c3d2a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1186 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f860ad81d87 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xdf6b11 (0x7f85c0539b11 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xdc253 (0x7f860a4b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x94ac3 (0x7f860c341ac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #4: clone + 0x44 (0x7f860c3d2a04 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-03-25 13:52:52,860 E 1 6316] logging.cc:104: Stack trace:  /usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe543a) [0x7f84b00f043a] ray::operator<<()
/usr/local/lib/python3.10/dist-packages/ray/_raylet.so(+0xfe7b78) [0x7f84b00f2b78] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f860a48220c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f860a482277]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7f860a4821fe]
/usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0xdf6bcc) [0x7f85c0539bcc] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f860a4b0253]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f860c341ac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7f860c3d2a04] __clone
*** SIGABRT received at time=1711374772 on cpu 76 ***
PC: @     0x7f860c3439fc  (unknown)  pthread_kill    @     0x7f860c2ef520  (unknown)  (unknown)
[2024-03-25 13:52:52,861 E 1 6316] logging.cc:361: *** SIGABRT received at time=1711374772 on cpu 76 ***
[2024-03-25 13:52:52,861 E 1 6316] logging.cc:361: PC: @     0x7f860c3439fc  (unknown)  pthread_kill
[2024-03-25 13:52:52,861 E 1 6316] logging.cc:361:     @     0x7f860c2ef520  (unknown)  (unknown)
Fatal Python error: Aborted
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, ray._raylet, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._routines_binary, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, regex._regex, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct (total: 157)
[failure_signal_handler.cc : 332] RAW: Signal 11 raised at PC=0x7f860c2d5898 while already in AbslFailureSignalHandler()
*** SIGSEGV received at time=1711374772 on cpu 76 ***
PC: @     0x7f860c2d5898  (unknown)  abort    @     0x7f860c2ef520  (unknown)  (unknown)    @     0x7f844056d640  (unknown)  (unknown)
[2024-03-25 13:52:52,867 E 1 6316] logging.cc:361: *** SIGSEGV received at time=1711374772 on cpu 76 ***
[2024-03-25 13:52:52,867 E 1 6316] logging.cc:361: PC: @     0x7f860c2d5898  (unknown)  abort
[2024-03-25 13:52:52,870 E 1 6316] logging.cc:361:     @     0x7f860c2ef520  (unknown)  (unknown)
[2024-03-25 13:52:52,873 E 1 6316] logging.cc:361:     @     0x7f844056d640  (unknown)  (unknown)
Fatal Python error: Segmentation fault
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, ray._raylet, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._routines_binary, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, regex._regex, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, markupsafe._speedups, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct (total: 157)
Starting Aphrodite Engine API server...
+ exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 8080 --download-dir /tmp/hub --model <REDACTED> --tensor-parallel-size 8 --served-model-name <REDACTED>
2024-03-25 13:53:03,923 INFO worker.py:1752 -- Started a local Ray instance.
INFO:     Initializing the Aphrodite Engine (v0.5.2) with the following config:
INFO:     Model = '<REDACTED>'
INFO:     DataType = torch.float16
INFO:     Model Load Format = auto
INFO:     Number of GPUs = 8
INFO:     Disable Custom All-Reduce = False
INFO:     Quantization Format = None
INFO:     Context Length = 32768
INFO:     Enforce Eager Mode = False
INFO:     KV Cache Data Type = auto
INFO:     KV Cache Params Path = None
INFO:     Device = cuda
(RayWorkerAphrodite pid=6025) INFO:     Downloading model weights ['*.safetensors']
INFO:     Downloading model weights ['*.safetensors']
INFO:     Model weights loaded. Memory usage: 1.70 GiB x 8 = 13.57 GiB
(RayWorkerAphrodite pid=5797) INFO:     Model weights loaded. Memory usage: 1.70 GiB x 8 = 13.57 GiB
INFO:     # GPU blocks: 115810, # CPU blocks: 16384
INFO:     Minimum concurrency: 56.55x
INFO:     Maximum sequence length allowed in the cache: 1852960
INFO:     Capturing the model for CUDA graphs. This may lead to unexpected
consequences if the model is not static. To run the model in eager mode, set
'enforce_eager=True' or use '--enforce-eager' in the CLI.
WARNING:  CUDA graphs can take additional 1~3 GiB of memory per GPU. If you are
running out of memory, consider decreasing `gpu_memory_utilization` or enforcing
eager mode.
AlpinDale commented 7 months ago

Seems like a timeout error. Did you have a sequence that took longer than 60 seconds to process? As a hotfix, you can increase the timeout threshold:

export APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120

That would set the limit to 120 seconds. You can, of course, pass it as an environment variable to the Docker container.
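For example (a minimal sketch; the image name is a placeholder for whatever image you already run, and the port matches your startup log):

docker run --gpus all \
  -e APHRODITE_ENGINE_ITERATION_TIMEOUT_S=120 \
  -e NUM_GPUS=8 \
  -p 8080:8080 \
  <your-aphrodite-image>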

mrseeker commented 7 months ago

Currently running on AWS. My setup:

And it crashes on a 60-second timeout, which means it is hogging a single GPU instead of distributing the load.
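(As a sanity check, per-GPU utilization while a request is in flight can be watched with plain nvidia-smi; nothing aphrodite-specific:)

watch -n 1 nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv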

mrseeker commented 7 months ago

Normal response time for 1 GPU with 1 client is around 6 seconds.

AlpinDale commented 1 month ago

Most of these issues are fixed in v0.6.0.