PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Bug]: torch._dynamo.exc.BackendCompilerFailed with command-r-plus #472

Closed: heungson closed this issue 1 month ago

heungson commented 5 months ago

Your current environment

Aphrodite Docker container

Setting 1
GPUs: RTX8000 * 2
Model: alpindale/c4ai-command-r-plus-GPTQ
Quantization: gptq

Setting 2
GPUs: A6000 Ada * 4
Model: CohereForAI/c4ai-command-r-plus
Quantization: load-in-smooth

🐛 Describe the bug

Starting Aphrodite Engine API server...

rank0: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

rank0: You can suppress this exception and fall back to eager by setting:
rank0: import torch._dynamo
rank0: torch._dynamo.config.suppress_errors = True

(RayWorkerAphrodite pid=1127) INFO: Model weights loaded. Memory usage: 27.78 GiB x 2 = 55.55 GiB
(RayWorkerAphrodite pid=1127) ERROR: Error executing method determine_num_available_blocks. This might
(RayWorkerAphrodite pid=1127) cause deadlock in distributed execution.
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]


This is the log generated with the GPTQ version. The same errors are raised when running the non-quantized version of the model. The GPTQ version works fine on vLLM.
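
For reference, the fallback suggested in the log above is a per-process Dynamo flag. A minimal sketch of where it would go if you drive the engine from your own Python entry point (an assumption worth noting: with Ray tensor parallelism the flag has to be set inside each worker process as well, so it is not easily reachable from the stock Docker CMD):

```python
# Hedged sketch of the eager fallback the traceback itself suggests.
# torch._dynamo.config is process-local, so with Ray-based tensor parallelism
# this would need to run in every worker, not only the driver process.
import torch._dynamo

torch._dynamo.config.suppress_errors = True  # fall back to eager when a backend compile fails
```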

josephrocca commented 5 months ago

Also getting this error for turboderp/command-r-plus-103B-exl2 on 2x 4090s on Runpod (EDIT: and also with Dracones/c4ai-command-r-v01_exl2_3.0bpw on 1x 4090) with the latest official Aphrodite Docker image as of writing:

alpindale/aphrodite-engine@sha256:b1e72201654a172e044a13d9346264a8b4e562dba8f3572bd92f013cf5420eb1
CMD_ADDITIONAL_ARGUMENTS="--model turboderp/command-r-plus-103B-exl2 --revision 3.0bpw --tokenizer-revision 3.0bpw --quantization exl2 --max-model-len 4096 --kv-cache-dtype fp8 --dtype float16 --enforce-eager true"
PORT=7860
HF_HUB_ENABLE_HF_TRANSFER=1
NUM_GPUS=2

I wonder if these are related?

But the latest official Docker image should already include that change:

So maybe not related. I tried setting the UID environment variable to 0 and to 1000, and I tried passing --user=root as an additional Docker run argument, but I get the same error:

Click for full error logs

```
2024-05-29T11:22:30.471964965Z (RayWorkerAphrodite pid=2015) INFO: Model weights loaded. Memory usage: 21.13 GiB x 2 = 42.27 GiB
2024-05-29T11:22:30.472028452Z (RayWorkerAphrodite pid=2015) ERROR: Error executing method determine_num_available_blocks. This might
2024-05-29T11:22:30.472039068Z (RayWorkerAphrodite pid=2015) cause deadlock in distributed execution.
2024-05-29T11:22:35.202441059Z Starting Aphrodite Engine API server...
2024-05-29T11:22:35.202724339Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 7860 --download-dir /tmp/hub --tensor-parallel-size 2 --model turboderp/command-r-plus-103B-exl2 --revision 3.0bpw --tokenizer-revision 3.0bpw --quantization exl2 --max-model-len 4096 --kv-cache-dtype fp8 --dtype float16 --enforce-eager true
2024-05-29T11:22:38.379034547Z /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
2024-05-29T11:22:38.379082110Z warnings.warn(
2024-05-29T11:22:38.869674852Z WARNING: exl2 quantization is not fully optimized yet. The speed can be slower
2024-05-29T11:22:38.869720110Z than non-quantized models.
2024-05-29T11:22:38.875878939Z INFO: Using fp8 data type to store kv cache. It reduces the GPU memory
2024-05-29T11:22:38.875956953Z footprint and boosts the performance. But it may cause slight accuracy drop
2024-05-29T11:22:38.875963727Z without scaling factors. FP8_E5M2 (without scaling) is only supported on cuda
2024-05-29T11:22:38.875971340Z version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for
2024-05-29T11:22:38.875978534Z common inference criteria.
2024-05-29T11:22:40.637997316Z 2024-05-29 11:22:40,637 WARNING utils.py:580 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set `RAY_USE_MULTIPROCESSING_CPU_COUNT=1` as an env var before starting Ray. Set the env var: `RAY_DISABLE_DOCKER_CPU_WARNING=1` to mute this warning.
2024-05-29T11:22:40.638057450Z 2024-05-29 11:22:40,637 WARNING utils.py:592 -- Ray currently does not support initializing Ray with fractional cpus. Your num_cpus will be truncated from 27.2 to 27.
2024-05-29T11:22:40.838132574Z 2024-05-29 11:22:40,837 INFO worker.py:1749 -- Started a local Ray instance.
2024-05-29T11:22:41.538540361Z INFO: Initializing the Aphrodite Engine (v0.5.3) with the following config: 2024-05-29T11:22:41.538574654Z INFO: Model = 'turboderp/command-r-plus-103B-exl2' 2024-05-29T11:22:41.538581289Z INFO: Speculative Config = None 2024-05-29T11:22:41.538587854Z INFO: DataType = torch.float16 2024-05-29T11:22:41.538593651Z INFO: Model Load Format = auto 2024-05-29T11:22:41.538598889Z INFO: Number of GPUs = 2 2024-05-29T11:22:41.538605873Z INFO: Disable Custom All-Reduce = False 2024-05-29T11:22:41.538611530Z INFO: Quantization Format = exl2 2024-05-29T11:22:41.538618165Z INFO: Context Length = 4096 2024-05-29T11:22:41.538624451Z INFO: Enforce Eager Mode = True 2024-05-29T11:22:41.538629689Z INFO: KV Cache Data Type = fp8 2024-05-29T11:22:41.538635486Z INFO: KV Cache Params Path = None 2024-05-29T11:22:41.538640655Z INFO: Device = cuda 2024-05-29T11:22:41.538646312Z INFO: Guided Decoding Backend = 2024-05-29T11:22:41.538651550Z DecodingConfig(guided_decoding_backend='outlines') 2024-05-29T11:22:43.606442894Z Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 2024-05-29T11:22:43.651450354Z WARNING: The tokenizer's vocabulary size 255029 does not match the model's 2024-05-29T11:22:43.651473263Z vocabulary size 256000. 2024-05-29T11:22:43.651841192Z /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. 2024-05-29T11:22:43.651858233Z warnings.warn( 2024-05-29T11:22:48.376555440Z INFO: Using FlashAttention backend. 2024-05-29T11:22:49.036469322Z (RayWorkerAphrodite pid=2017) INFO: Using FlashAttention backend. 2024-05-29T11:22:49.036528059Z INFO: Aphrodite is using nccl==2.20.5 2024-05-29T11:22:49.301692524Z (RayWorkerAphrodite pid=2017) INFO: Aphrodite is using nccl==2.20.5 2024-05-29T11:22:49.301739178Z INFO: NVLink detection failed with message "Not Supported". This is normal 2024-05-29T11:22:49.301746442Z if your machine has no NVLink equipped 2024-05-29T11:22:49.303372509Z INFO: reading GPU P2P access cache from 2024-05-29T11:22:49.303426357Z /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json 2024-05-29T11:22:49.305018272Z WARNING: Custom allreduce is disabled because your platform lacks GPU P2P 2024-05-29T11:22:49.305036081Z capability or P2P test failed. To silence this warning, specify 2024-05-29T11:22:49.305054939Z disable_custom_all_reduce=True explicitly. 2024-05-29T11:22:49.788418031Z (RayWorkerAphrodite pid=2017) INFO: NVLink detection failed with message "Not Supported". This is normal 2024-05-29T11:22:49.788469016Z (RayWorkerAphrodite pid=2017) if your machine has no NVLink equipped 2024-05-29T11:22:49.788473835Z (RayWorkerAphrodite pid=2017) INFO: reading GPU P2P access cache from 2024-05-29T11:22:49.788497582Z (RayWorkerAphrodite pid=2017) /app/aphrodite-engine/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json 2024-05-29T11:22:49.788503378Z (RayWorkerAphrodite pid=2017) WARNING: Custom allreduce is disabled because your platform lacks GPU P2P 2024-05-29T11:22:49.788509106Z (RayWorkerAphrodite pid=2017) capability or P2P test failed. To silence this warning, specify 2024-05-29T11:22:49.788512947Z (RayWorkerAphrodite pid=2017) disable_custom_all_reduce=True explicitly. 
2024-05-29T11:22:52.014061650Z (RayWorkerAphrodite pid=2017) INFO: Using model weights format ['*.safetensors'] 2024-05-29T11:22:52.014114171Z INFO: Using model weights format ['*.safetensors'] 2024-05-29T11:23:03.570937289Z INFO: Model weights loaded. Memory usage: 21.14 GiB x 2 = 42.27 GiB 2024-05-29T11:23:16.027270275Z [rank0]: Traceback (most recent call last): 2024-05-29T11:23:16.027320771Z [rank0]: File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main 2024-05-29T11:23:16.027329850Z [rank0]: return _run_code(code, main_globals, None, 2024-05-29T11:23:16.027337114Z [rank0]: File "/usr/lib/python3.10/runpy.py", line 86, in _run_code 2024-05-29T11:23:16.027344238Z [rank0]: exec(code, run_globals) 2024-05-29T11:23:16.027351292Z [rank0]: File "/app/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 562, in 2024-05-29T11:23:16.027358905Z [rank0]: run_server(args) 2024-05-29T11:23:16.027365679Z [rank0]: File "/app/aphrodite-engine/aphrodite/endpoints/openai/api_server.py", line 519, in run_server 2024-05-29T11:23:16.027372314Z [rank0]: engine = AsyncAphrodite.from_engine_args(engine_args) 2024-05-29T11:23:16.027379508Z [rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 358, in from_engine_args 2024-05-29T11:23:16.027386562Z [rank0]: engine = cls(engine_config.parallel_config.worker_use_ray, 2024-05-29T11:23:16.027393826Z [rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 323, in __init__ 2024-05-29T11:23:16.027400950Z [rank0]: self.engine = self._init_engine(*args, **kwargs) 2024-05-29T11:23:16.027408074Z [rank0]: File "/app/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 429, in _init_engine 2024-05-29T11:23:16.027417083Z [rank0]: return engine_class(*args, **kwargs) 2024-05-29T11:23:16.027424277Z [rank0]: File "/app/aphrodite-engine/aphrodite/engine/aphrodite_engine.py", line 142, in __init__ 2024-05-29T11:23:16.027431541Z [rank0]: self._initialize_kv_caches() 2024-05-29T11:23:16.027438595Z [rank0]: File "/app/aphrodite-engine/aphrodite/engine/aphrodite_engine.py", line 182, in _initialize_kv_caches 2024-05-29T11:23:16.027445230Z [rank0]: self.model_executor.determine_num_available_blocks()) 2024-05-29T11:23:16.027452423Z [rank0]: File "/app/aphrodite-engine/aphrodite/executor/ray_gpu_executor.py", line 208, in determine_num_available_blocks 2024-05-29T11:23:16.027459687Z [rank0]: num_blocks = self._run_workers("determine_num_available_blocks", ) 2024-05-29T11:23:16.027464925Z [rank0]: File "/app/aphrodite-engine/aphrodite/executor/ray_gpu_executor.py", line 309, in _run_workers 2024-05-29T11:23:16.027471909Z [rank0]: driver_worker_output = getattr(self.driver_worker, 2024-05-29T11:23:16.027479033Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context 2024-05-29T11:23:16.027486297Z [rank0]: return func(*args, **kwargs) 2024-05-29T11:23:16.027494957Z [rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/worker.py", line 144, in determine_num_available_blocks 2024-05-29T11:23:16.027502011Z [rank0]: self.model_runner.profile_run() 2024-05-29T11:23:16.027509066Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context 2024-05-29T11:23:16.027516120Z [rank0]: return func(*args, **kwargs) 2024-05-29T11:23:16.027522894Z [rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/model_runner.py", line 948, in profile_run 2024-05-29T11:23:16.027530018Z [rank0]: 
self.execute_model(seqs, kv_caches) 2024-05-29T11:23:16.027546431Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context 2024-05-29T11:23:16.027567035Z [rank0]: return func(*args, **kwargs) 2024-05-29T11:23:16.027573600Z [rank0]: File "/app/aphrodite-engine/aphrodite/task_handler/model_runner.py", line 868, in execute_model 2024-05-29T11:23:16.027580305Z [rank0]: hidden_states = model_executable(**execute_model_kwargs) 2024-05-29T11:23:16.027587568Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl 2024-05-29T11:23:16.027594692Z [rank0]: return self._call_impl(*args, **kwargs) 2024-05-29T11:23:16.027601816Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl 2024-05-29T11:23:16.027608870Z [rank0]: return forward_call(*args, **kwargs) 2024-05-29T11:23:16.027616134Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context 2024-05-29T11:23:16.027622839Z [rank0]: return func(*args, **kwargs) 2024-05-29T11:23:16.027632337Z [rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 390, in forward 2024-05-29T11:23:16.027639391Z [rank0]: hidden_states = self.model(input_ids, positions, kv_caches, 2024-05-29T11:23:16.027646096Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl 2024-05-29T11:23:16.027653360Z [rank0]: return self._call_impl(*args, **kwargs) 2024-05-29T11:23:16.027660414Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl 2024-05-29T11:23:16.027668515Z [rank0]: return forward_call(*args, **kwargs) 2024-05-29T11:23:16.027675639Z [rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 349, in forward 2024-05-29T11:23:16.027682484Z [rank0]: hidden_states, residual = layer( 2024-05-29T11:23:16.027689608Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl 2024-05-29T11:23:16.027696662Z [rank0]: return self._call_impl(*args, **kwargs) 2024-05-29T11:23:16.027703367Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl 2024-05-29T11:23:16.027710491Z [rank0]: return forward_call(*args, **kwargs) 2024-05-29T11:23:16.027717265Z [rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 305, in forward 2024-05-29T11:23:16.027724389Z [rank0]: hidden_states, residual = self.input_layernorm(hidden_states, residual) 2024-05-29T11:23:16.027731443Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl 2024-05-29T11:23:16.027738637Z [rank0]: return self._call_impl(*args, **kwargs) 2024-05-29T11:23:16.027745412Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl 2024-05-29T11:23:16.027753094Z [rank0]: return forward_call(*args, **kwargs) 2024-05-29T11:23:16.027759590Z [rank0]: File "/app/aphrodite-engine/aphrodite/modeling/models/cohere.py", line 82, in forward 2024-05-29T11:23:16.027766644Z [rank0]: hidden_states = layer_norm_func(hidden_states, self.weight, 2024-05-29T11:23:16.027773768Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn 2024-05-29T11:23:16.027780473Z [rank0]: return 
fn(*args, **kwargs) 2024-05-29T11:23:16.027787596Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors 2024-05-29T11:23:16.027794581Z [rank0]: return callback(frame, cache_entry, hooks, frame_state, skip=1) 2024-05-29T11:23:16.027801705Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 786, in _convert_frame 2024-05-29T11:23:16.027808409Z [rank0]: result = inner_convert( 2024-05-29T11:23:16.027815603Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert 2024-05-29T11:23:16.027825102Z [rank0]: return _compile( 2024-05-29T11:23:16.027831248Z [rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner 2024-05-29T11:23:16.027838372Z [rank0]: return func(*args, **kwds) 2024-05-29T11:23:16.027844727Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 676, in _compile 2024-05-29T11:23:16.027851851Z [rank0]: guarded_code = compile_inner(code, one_graph, hooks, transform) 2024-05-29T11:23:16.027858486Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.027865540Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.027872315Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner 2024-05-29T11:23:16.027879509Z [rank0]: out_code = transform_code_object(code, transform) 2024-05-29T11:23:16.027886633Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object 2024-05-29T11:23:16.027893687Z [rank0]: transformations(instructions, code_options) 2024-05-29T11:23:16.027900881Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 165, in _fn 2024-05-29T11:23:16.027907655Z [rank0]: return fn(*args, **kwargs) 2024-05-29T11:23:16.027914779Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 500, in transform 2024-05-29T11:23:16.027921903Z [rank0]: tracer.run() 2024-05-29T11:23:16.027928957Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run 2024-05-29T11:23:16.027935732Z [rank0]: super().run() 2024-05-29T11:23:16.027942995Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 810, in run 2024-05-29T11:23:16.027949980Z [rank0]: and self.step() 2024-05-29T11:23:16.027957173Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 773, in step 2024-05-29T11:23:16.027963808Z [rank0]: getattr(self, inst.opname)(inst) 2024-05-29T11:23:16.027968697Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE 2024-05-29T11:23:16.027975402Z [rank0]: self.output.compile_subgraph( 2024-05-29T11:23:16.027982456Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 971, in compile_subgraph 2024-05-29T11:23:16.027988602Z [rank0]: self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root) 2024-05-29T11:23:16.027994818Z [rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner 2024-05-29T11:23:16.028001593Z [rank0]: return func(*args, **kwds) 2024-05-29T11:23:16.028008298Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", 
line 1168, in compile_and_call_fx_graph 2024-05-29T11:23:16.028015771Z [rank0]: compiled_fn = self.call_user_compiler(gm) 2024-05-29T11:23:16.028022895Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028029949Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028037143Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1241, in call_user_compiler 2024-05-29T11:23:16.028046083Z [rank0]: raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( 2024-05-29T11:23:16.028052648Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 1222, in call_user_compiler 2024-05-29T11:23:16.028061657Z [rank0]: compiled_fn = compiler_fn(gm, self.example_inputs()) 2024-05-29T11:23:16.028068921Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper 2024-05-29T11:23:16.028075626Z [rank0]: compiled_gm = compiler_fn(gm, example_inputs) 2024-05-29T11:23:16.028082750Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1729, in __call__ 2024-05-29T11:23:16.028089315Z [rank0]: return compile_fx(model_, inputs_, config_patches=self.config) 2024-05-29T11:23:16.028096020Z [rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner 2024-05-29T11:23:16.028101887Z [rank0]: return func(*args, **kwds) 2024-05-29T11:23:16.028108941Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx 2024-05-29T11:23:16.028115645Z [rank0]: return aot_autograd( 2024-05-29T11:23:16.028122211Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn 2024-05-29T11:23:16.028128985Z [rank0]: cg = aot_module_simplified(gm, example_inputs, **kwargs) 2024-05-29T11:23:16.028136179Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified 2024-05-29T11:23:16.028143303Z [rank0]: compiled_fn = create_aot_dispatcher_function( 2024-05-29T11:23:16.028150357Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028157062Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028163837Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function 2024-05-29T11:23:16.028170961Z [rank0]: compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata) 2024-05-29T11:23:16.028178084Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe 2024-05-29T11:23:16.028185208Z [rank0]: return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata) 2024-05-29T11:23:16.028191913Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base 2024-05-29T11:23:16.028200085Z [rank0]: return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata) 2024-05-29T11:23:16.028206650Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base 2024-05-29T11:23:16.028213355Z [rank0]: compiled_fw = compiler(fw_module, updated_flat_args) 2024-05-29T11:23:16.028220479Z [rank0]: File 
"/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028227812Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028234377Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base 2024-05-29T11:23:16.028241920Z [rank0]: return inner_compile( 2024-05-29T11:23:16.028248625Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper 2024-05-29T11:23:16.028255400Z [rank0]: inner_compiled_fn = compiler_fn(gm, example_inputs) 2024-05-29T11:23:16.028262594Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/debug.py", line 304, in inner 2024-05-29T11:23:16.028269648Z [rank0]: return fn(*args, **kwargs) 2024-05-29T11:23:16.028276772Z [rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner 2024-05-29T11:23:16.028283965Z [rank0]: return func(*args, **kwds) 2024-05-29T11:23:16.028291229Z [rank0]: File "/usr/lib/python3.10/contextlib.py", line 79, in inner 2024-05-29T11:23:16.028297375Z [rank0]: return func(*args, **kwds) 2024-05-29T11:23:16.028304359Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028311483Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028318607Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner 2024-05-29T11:23:16.028325312Z [rank0]: compiled_graph = fx_codegen_and_compile( 2024-05-29T11:23:16.028332575Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile 2024-05-29T11:23:16.028339630Z [rank0]: compiled_fn = graph.compile_to_fn() 2024-05-29T11:23:16.028348639Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn 2024-05-29T11:23:16.028355833Z [rank0]: return self.compile_to_module().call 2024-05-29T11:23:16.028363027Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028370151Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028376786Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1250, in compile_to_module 2024-05-29T11:23:16.028383979Z [rank0]: self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen() 2024-05-29T11:23:16.028391173Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py", line 1208, in codegen 2024-05-29T11:23:16.028398297Z [rank0]: self.scheduler.codegen() 2024-05-29T11:23:16.028404932Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 262, in time_wrapper 2024-05-29T11:23:16.028412056Z [rank0]: r = func(*args, **kwargs) 2024-05-29T11:23:16.028419320Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/scheduler.py", line 2339, in codegen 2024-05-29T11:23:16.028425955Z [rank0]: self.get_backend(device).codegen_nodes(node.get_nodes()) # type: ignore[possibly-undefined] 2024-05-29T11:23:16.028433078Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 63, in codegen_nodes 2024-05-29T11:23:16.028440202Z [rank0]: return self._triton_scheduling.codegen_nodes(nodes) 2024-05-29T11:23:16.028446907Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3255, in 
codegen_nodes
2024-05-29T11:23:16.028454171Z [rank0]: return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
2024-05-29T11:23:16.028460736Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3427, in codegen_node_schedule
2024-05-29T11:23:16.028467930Z [rank0]: kernel_name = self.define_kernel(src_code, node_schedule)
2024-05-29T11:23:16.028474984Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codegen/triton.py", line 3537, in define_kernel
2024-05-29T11:23:16.028481758Z [rank0]: basename, _, kernel_path = get_path(code_hash(src_code.strip()), "py")
2024-05-29T11:23:16.028488952Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/codecache.py", line 349, in get_path
2024-05-29T11:23:16.028495029Z [rank0]: subdir = os.path.join(cache_dir(), basename[1:3])
2024-05-29T11:23:16.028501733Z [rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/utils.py", line 739, in cache_dir
2024-05-29T11:23:16.028508438Z [rank0]: sanitized_username = re.sub(r'[\\/:*?"<>|]', "_", getpass.getuser())
2024-05-29T11:23:16.028518076Z [rank0]: File "/usr/lib/python3.10/getpass.py", line 169, in getuser
2024-05-29T11:23:16.028524711Z [rank0]: return pwd.getpwuid(os.getuid())[0]
2024-05-29T11:23:16.028530299Z [rank0]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-29T11:23:16.028537004Z [rank0]: KeyError: 'getpwuid(): uid not found: 1000'
2024-05-29T11:23:16.028544267Z
2024-05-29T11:23:16.028550483Z [rank0]: Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-05-29T11:23:16.028557537Z
2024-05-29T11:23:16.028564103Z
2024-05-29T11:23:16.028571226Z [rank0]: You can suppress this exception and fall back to eager by setting:
2024-05-29T11:23:16.028577931Z [rank0]: import torch._dynamo
2024-05-29T11:23:16.028585055Z [rank0]: torch._dynamo.config.suppress_errors = True
```
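
The final frames show the actual failure: Inductor's `cache_dir()` calls `getpass.getuser()`, which falls back to `pwd.getpwuid(os.getuid())` and raises because UID 1000 has no `/etc/passwd` entry inside the container. A minimal sketch of two environment-level workarounds; the `TORCHINDUCTOR_CACHE_DIR` variable is an assumption about recent PyTorch builds, so verify it against the torch version shipped in the image:

```python
# Sketch only: keep Inductor's cache_dir()/getpass.getuser() from failing when
# the container UID has no passwd entry. This must run before anything is
# compiled, and must also be visible in every Ray worker process.
import os

# 1) getpass.getuser() checks LOGNAME/USER/LNAME/USERNAME before falling back
#    to pwd.getpwuid(), so any non-empty value avoids the failing lookup.
os.environ.setdefault("USER", "aphrodite")  # hypothetical name; anything works

# 2) Assumption: recent PyTorch lets you pin the Inductor cache location
#    explicitly, bypassing the username-derived default path entirely.
os.environ.setdefault("TORCHINDUCTOR_CACHE_DIR", "/tmp/torchinductor_cache")

import getpass
print(getpass.getuser())  # now resolves from the env var, no pwd lookup
```

In the Docker setup above, the same effect should be achievable by passing these as container environment variables (for example `-e USER=root`) or by running the container with a UID that actually exists in the image's `/etc/passwd`.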
josephrocca commented 5 months ago

@AlpinDale Please ignore if this issue is a wontfix (and please forgive the ping in that case :pray:) -- just in case this slipped through the cracks: I can reproduce the OP's issue. See my comment above for reproduction details and logs. The TL;DR is that command-r-plus doesn't seem to work with a basic Aphrodite setup (e.g. exl2 weights, Runpod with the official Docker image, as above).

Edit: I can also reproduce this with Dracones/c4ai-command-r-v01_exl2_3.0bpw (i.e. the issue seems to occur with both command-r and command-r-plus).

AlpinDale commented 5 months ago

I'll get to investigating this soon; I've been busy with other projects so I haven't had much time to work on aphrodite lately. I have an inkling that this is related to torch.compile().
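
For anyone picking this up: the traceback above goes through torch/_dynamo/eval_frame.py around cohere.py's layer_norm_func, which is consistent with that hunch. Below is a standalone sketch, not Aphrodite's actual code; it just assumes a torch.compile-wrapped layer norm and exercises the same Inductor cache path, so it should reproduce the getpwuid failure in a container whose UID has no passwd entry:

```python
import torch

# Repro sketch (assumption: Aphrodite's Cohere layer norm is wrapped in
# torch.compile, as the eval_frame.py frame in the traceback suggests).
# The first call triggers Inductor codegen, which resolves its on-disk cache
# directory via getpass.getuser() -- the call that raises KeyError when the
# current UID is missing from /etc/passwd.
@torch.compile
def layer_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, unbiased=False, keepdim=True)
    return (x - mean) * torch.rsqrt(var + eps) * weight

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2, 1024, device=device)
w = torch.ones(1024, device=device)
out = layer_norm(x, w)  # compilation (and the cache_dir lookup) happens here
print(out.shape)
```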