Closed: heungson closed this issue 1 month ago.
Also getting this error for turboderp/command-r-plus-103B-exl2 on 2x4090s on Runpod (EDIT: and also Dracones/c4ai-command-r-v01_exl2_3.0bpw on 1x4090) with the latest official Aphrodite Docker image as of writing:
alpindale/aphrodite-engine@sha256:b1e72201654a172e044a13d9346264a8b4e562dba8f3572bd92f013cf5420eb1
CMD_ADDITIONAL_ARGUMENTS="--model turboderp/command-r-plus-103B-exl2 --revision 3.0bpw --tokenizer-revision 3.0bpw --quantization exl2 --max-model-len 4096 --kv-cache-dtype fp8 --dtype float16 --enforce-eager true"
PORT=7860
HF_HUB_ENABLE_HF_TRANSFER=1
NUM_GPUS=2
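For what it's worth, here is a rough non-Runpod equivalent of the above, purely as a sketch: it assumes the official image honors the same CMD_ADDITIONAL_ARGUMENTS / PORT / NUM_GPUS environment variables when launched directly with docker run, and the port mapping and --shm-size value are my own guesses rather than part of the template.

```sh
# Sketch only: mirrors the Runpod template above on a plain Docker host,
# assuming the image reads the same env vars (CMD_ADDITIONAL_ARGUMENTS,
# PORT, NUM_GPUS). The -p mapping and --shm-size are assumptions.
docker run --gpus all --shm-size 16g -p 7860:7860 \
  -e CMD_ADDITIONAL_ARGUMENTS="--model turboderp/command-r-plus-103B-exl2 --revision 3.0bpw --tokenizer-revision 3.0bpw --quantization exl2 --max-model-len 4096 --kv-cache-dtype fp8 --dtype float16 --enforce-eager true" \
  -e PORT=7860 \
  -e HF_HUB_ENABLE_HF_TRANSFER=1 \
  -e NUM_GPUS=2 \
  alpindale/aphrodite-engine@sha256:b1e72201654a172e044a13d9346264a8b4e562dba8f3572bd92f013cf5420eb1
```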
I wonder if these are related?
But the latest official Docker image should have that change:
So maybe not related. I tried setting the UID environment variable to 0 and 1000, and I tried --user=root as an additional Docker run arg, but I get the same error:
@AlpinDale Please ignore if this issue is a wontfix (and please forgive this ping in that case :pray:) -- just in case this slipped through the cracks: I can reproduce OP's issue. See my above comment for reproduction details + logs. The TL;DR is that command-r-plus doesn't seem to work with a basic Aphrodite setup (e.g. exl2 weights, Runpod w/ official docker image, as above).
Edit: I can also reproduce with Dracones/c4ai-command-r-v01_exl2_3.0bpw (i.e. the issue seems to occur with both command-r and command-r-plus).
I'll get to investigating this soon; I've been busy with other projects so I haven't had much time to work on aphrodite lately. I have an inkling that this is related to torch.compile().
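If torch.compile() is indeed the suspect, one cheap check (just a sketch, not a confirmed workaround) is to re-run the server with TorchDynamo disabled via PyTorch's standard TORCHDYNAMO_DISABLE switch; if the determine_num_available_blocks failure disappears, that would point at the compile path. The module path below is taken from the traceback in the logs, the flags mirror the repro settings above, and multi-GPU flags are omitted.

```sh
# Sketch: test the torch.compile() hypothesis by disabling TorchDynamo.
# TORCHDYNAMO_DISABLE=1 is a standard PyTorch switch that turns torch.compile
# into a no-op. Run inside the container (or wherever aphrodite is installed);
# add your usual multi-GPU flags as needed.
TORCHDYNAMO_DISABLE=1 python -m aphrodite.endpoints.openai.api_server \
  --model turboderp/command-r-plus-103B-exl2 \
  --revision 3.0bpw --tokenizer-revision 3.0bpw \
  --quantization exl2 --max-model-len 4096 \
  --kv-cache-dtype fp8 --dtype float16 \
  --enforce-eager true
```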
Your current environment
aphrodite docker container
Setting 1: GPUs: RTX8000 * 2, model: alpindale/c4ai-command-r-plus-GPTQ, quantization: gptq
Setting 2: GPUs: A6000 Ada * 4, model: CohereForAI/c4ai-command-r-plus, quantization: load-in-smooth
🐛 Describe the bug
Starting Aphrodite Engine API server...
`resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
WARNING: gptq quantization is not fully optimized yet. The speed can be slower than non-quantized models.
2024-05-17 02:21:49,653 INFO worker.py:1749 -- Started a local Ray instance.
INFO: Initializing the Aphrodite Engine (v0.5.3) with the following config:
INFO: Model = 'alpindale/c4ai-command-r-plus-GPTQ'
INFO: Speculative Config = None
INFO: DataType = torch.float16
INFO: Model Load Format = auto
INFO: Number of GPUs = 2
INFO: Disable Custom All-Reduce = False
INFO: Quantization Format = gptq
INFO: Context Length = 29000
INFO: Enforce Eager Mode = True
INFO: KV Cache Data Type = auto
INFO: KV Cache Params Path = None
INFO: Device = cuda
INFO: Guided Decoding Backend = DecodingConfig(guided_decoding_backend='outlines')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING: The tokenizer's vocabulary size 255029 does not match the model's vocabulary size 256000.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO: Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO: Using XFormers backend.
(RayWorkerAphrodite pid=1127) INFO: Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerAphrodite pid=1127) INFO: Using XFormers backend.
INFO: Aphrodite is using nccl==2.20.5
(RayWorkerAphrodite pid=1127) INFO: Aphrodite is using nccl==2.20.5
INFO: generating GPU P2P access cache for in /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
INFO: reading GPU P2P access cache from /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
(RayWorkerAphrodite pid=1127) INFO: reading GPU P2P access cache from
(RayWorkerAphrodite pid=1127) /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
(RayWorkerAphrodite pid=1127) INFO: Using model weights format ['.safetensors']
INFO: Using model weights format ['.safetensors']
INFO: Model weights loaded. Memory usage: 27.78 GiB x 2 = 55.55 GiB
rank0: Traceback (most recent call last):
rank0:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
rank0:     return _run_code(code, main_globals, None,
rank0:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
rank0:     exec(code, run_globals)
rank0:   File "/app/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 562, in
[...]
rank0: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
rank0: You can suppress this exception and fall back to eager by setting:
rank0:   import torch._dynamo
rank0:   torch._dynamo.config.suppress_errors = True
(RayWorkerAphrodite pid=1127) INFO: Model weights loaded. Memory usage: 27.78 GiB x 2 = 55.55 GiB
(RayWorkerAphrodite pid=1127) ERROR: Error executing method determine_num_available_blocks. This might
(RayWorkerAphrodite pid=1127) cause deadlock in distributed execution.
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
This is the log generated with the GPTQ version. The same errors are raised when running the non-quantized version of the model. The GPTQ version works fine on vLLM.
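Since the interesting part of the traceback is cut off, it may help to re-run with the verbose Dynamo logging that the error message itself suggests and attach the full output. Below is a sketch for the GPTQ setting: the two environment variables are the ones named in the log, the model and flags mirror Setting 1, multi-GPU flags are omitted, and the log file path is arbitrary.

```sh
# Sketch: capture the full Dynamo error hidden by the truncated traceback.
# TORCH_LOGS and TORCHDYNAMO_VERBOSE are the variables the error message
# itself recommends; model and flags mirror Setting 1 from the report.
TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 \
  python -m aphrodite.endpoints.openai.api_server \
  --model alpindale/c4ai-command-r-plus-GPTQ \
  --quantization gptq --max-model-len 29000 \
  --dtype float16 --enforce-eager true \
  2>&1 | tee dynamo-verbose.log
```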