OpenBMB / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: Building wheel for vllm (pyproject.toml) did not run successfully #13

Open DiaQusNet opened 1 month ago

DiaQusNet commented 1 month ago

Your current environment

PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.23.0
Libc version: glibc-2.31

Python version: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.4.0
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Byte Order:                           Little Endian
Address sizes:                        39 bits physical, 48 bits virtual
CPU(s):                               24
On-line CPU(s) list:                  0-23
Thread(s) per core:                   1
Core(s) per socket:                   16
Socket(s):                            1
NUMA node(s):                         1
Vendor ID:                            GenuineIntel
CPU family:                           6
Model:                                151
Model name:                           12th Gen Intel(R) Core(TM) i9-12900KF
Stepping:                             2
CPU MHz:                              3200.000
CPU max MHz:                          5200.0000
CPU min MHz:                          800.0000
BogoMIPS:                             6374.40
Virtualization:                       VT-x
L1d cache:                            384 KiB
L1i cache:                            256 KiB
L2 cache:                             10 MiB
NUMA node0 CPU(s):                    0-23
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Mitigation; Clear Register File
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-ml-py              12.560.30                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.6.68                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] pyzmq                     26.2.0                   pypi_0    pypi
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchaudio                2.4.0                    pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] transformers              4.44.2                   pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-23    0               N/A
GPU1    NV4      X      0-23    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
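
A note on the report above: it lists two different CUDA versions. PyTorch was built against CUDA 12.1, while the reported CUDA runtime version is 11.7.64 (and the cuDNN libraries are found under /usr/local/cuda-11.1). Whether this mismatch is what breaks the wheel build is not established by the output posted in this issue. The sketch below only shows how the two values could be compared. The helper names `major_minor` and `cuda_versions_match` are hypothetical and are not part of vLLM or PyTorch.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '11.7.64'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(torch_cuda: str, system_cuda: str) -> bool:
    """Compare only major.minor; patch-level differences are ignored."""
    return major_minor(torch_cuda) == major_minor(system_cuda)


# Values copied from the environment report in this issue.
torch_cuda = "12.1"      # "CUDA used to build PyTorch"
system_cuda = "11.7.64"  # "CUDA runtime version"

if not cuda_versions_match(torch_cuda, system_cuda):
    print(f"CUDA mismatch: PyTorch built with {torch_cuda}, system reports {system_cuda}")
```

Comparing only major.minor is an assumption made for illustration; a stricter or looser rule may be appropriate depending on the build.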

How you are installing vllm

pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
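
The output of a source install like this one is long and consists mostly of `copying`, `creating`, and `running` lines, so the line that actually matters is easy to miss. A minimal sketch for filtering a saved build log down to the remaining lines follows. The function name `interesting_lines`, the prefix list, and the sample text (abridged lines adapted from the output posted further down) are assumptions for illustration, not part of pip or vLLM.

```python
# Prefixes of routine setuptools progress lines (an assumed, non-exhaustive list).
NOISE_PREFIXES = ("copying ", "creating ", "running ")


def interesting_lines(log_text: str) -> list[str]:
    """Return the non-empty log lines that are not routine progress output."""
    kept = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(NOISE_PREFIXES):
            continue
        kept.append(stripped)
    return kept


# Abridged sample; a real log would be read from a file instead.
sample = """\
running build_py
creating build/lib.linux-x86_64-cpython-311/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-311/vllm
UserWarning: Failed to initialize NumPy: No module named 'numpy'
"""

for line in interesting_lines(sample):
    print(line)
```

On a full log, the surviving lines would be warnings, compiler invocations, and error messages, which are the parts worth posting when the complete output is too long.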


DiaQusNet commented 1 month ago

The terminal shows the following:

Building wheels for collected packages: vllm
  Building wheel for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [587 lines of output]
      /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/config.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/block.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/scalar_type.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/version.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/envs.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/pooling_params.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/commit_id.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/logger.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/connections.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/utils.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-311/vllm
      copying vllm/_core_ext.py -> build/lib.linux-x86_64-cpython-311/vllm
      creating build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      copying vllm/transformers_utils/detokenizer.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      copying vllm/transformers_utils/image_processor.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils
      creating build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/fully_sharded_layers.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      copying vllm/lora/punica.py -> build/lib.linux-x86_64-cpython-311/vllm/lora
      creating build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      copying vllm/prompt_adapter/layers.py -> build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      copying vllm/prompt_adapter/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      copying vllm/prompt_adapter/worker_manager.py -> build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      copying vllm/prompt_adapter/request.py -> build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      copying vllm/prompt_adapter/models.py -> build/lib.linux-x86_64-cpython-311/vllm/prompt_adapter
      creating build/lib.linux-x86_64-cpython-311/vllm/multimodal
      copying vllm/multimodal/base.py -> build/lib.linux-x86_64-cpython-311/vllm/multimodal
      copying vllm/multimodal/image.py -> build/lib.linux-x86_64-cpython-311/vllm/multimodal
      copying vllm/multimodal/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/multimodal
      copying vllm/multimodal/registry.py -> build/lib.linux-x86_64-cpython-311/vllm/multimodal
      copying vllm/multimodal/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/multimodal
      creating build/lib.linux-x86_64-cpython-311/vllm/attention
      copying vllm/attention/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/attention
      copying vllm/attention/layer.py -> build/lib.linux-x86_64-cpython-311/vllm/attention
      copying vllm/attention/selector.py -> build/lib.linux-x86_64-cpython-311/vllm/attention
      creating build/lib.linux-x86_64-cpython-311/vllm/distributed
      copying vllm/distributed/communication_op.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed
      copying vllm/distributed/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed
      copying vllm/distributed/parallel_state.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed
      copying vllm/distributed/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed
      creating build/lib.linux-x86_64-cpython-311/vllm/usage
      copying vllm/usage/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/usage
      copying vllm/usage/usage_lib.py -> build/lib.linux-x86_64-cpython-311/vllm/usage
      creating build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/gpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/openvino_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/multiproc_gpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/multiproc_worker_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/executor_base.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/tpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/neuron_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/xpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/cpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/ray_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/ray_gpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/distributed_gpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/ray_tpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      copying vllm/executor/ray_xpu_executor.py -> build/lib.linux-x86_64-cpython-311/vllm/executor
      creating build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/target_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/spec_decode_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/util.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/smaller_tp_proposer_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/mlp_speculator_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/metrics.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/multi_step_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/interfaces.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/top1_proposer.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/draft_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/medusa_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/batch_expansion.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/proposer_worker_base.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      copying vllm/spec_decode/ngram_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/spec_decode
      creating build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/layers.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/worker_manager.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/request.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/models.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      copying vllm/adapter_commons/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/adapter_commons
      creating build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/launcher.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/logger.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      copying vllm/entrypoints/chat_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor
      copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor
      copying vllm/model_executor/custom_op.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor
      copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor
      copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor
      copying vllm/model_executor/pooling_metadata.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor
      creating build/lib.linux-x86_64-cpython-311/vllm/assets
      copying vllm/assets/base.py -> build/lib.linux-x86_64-cpython-311/vllm/assets
      copying vllm/assets/image.py -> build/lib.linux-x86_64-cpython-311/vllm/assets
      copying vllm/assets/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/assets
      creating build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/block_manager_v1.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/embedding_model_block_manager.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/interfaces.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/evictor_v2.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/evictor_v1.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      copying vllm/core/block_manager_v2.py -> build/lib.linux-x86_64-cpython-311/vllm/core
      creating build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/protocol.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/async_timeout.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-311/vllm/engine
      creating build/lib.linux-x86_64-cpython-311/vllm/inputs
      copying vllm/inputs/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/inputs
      copying vllm/inputs/registry.py -> build/lib.linux-x86_64-cpython-311/vllm/inputs
      copying vllm/inputs/data.py -> build/lib.linux-x86_64-cpython-311/vllm/inputs
      creating build/lib.linux-x86_64-cpython-311/vllm/platforms
      copying vllm/platforms/cuda.py -> build/lib.linux-x86_64-cpython-311/vllm/platforms
      copying vllm/platforms/interface.py -> build/lib.linux-x86_64-cpython-311/vllm/platforms
      copying vllm/platforms/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/platforms
      copying vllm/platforms/tpu.py -> build/lib.linux-x86_64-cpython-311/vllm/platforms
      copying vllm/platforms/rocm.py -> build/lib.linux-x86_64-cpython-311/vllm/platforms
      creating build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      copying vllm/triton_utils/importing.py -> build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      copying vllm/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      copying vllm/triton_utils/libentry.py -> build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      copying vllm/triton_utils/sample.py -> build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      copying vllm/triton_utils/custom_cache_manager.py -> build/lib.linux-x86_64-cpython-311/vllm/triton_utils
      creating build/lib.linux-x86_64-cpython-311/vllm/logging
      copying vllm/logging/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/logging
      copying vllm/logging/formatter.py -> build/lib.linux-x86_64-cpython-311/vllm/logging
      creating build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/tpu_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/openvino_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/cpu_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/neuron_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/model_runner_base.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/xpu_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/embedding_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/xpu_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/neuron_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/worker_base.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/cpu_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/tpu_model_runner.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      copying vllm/worker/openvino_worker.py -> build/lib.linux-x86_64-cpython-311/vllm/worker
      creating build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/dbrx.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/internvl.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/configs
      creating build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizers
      copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizers
      copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizers
      creating build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizer_group
      copying vllm/transformers_utils/tokenizer_group/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizer_group
      copying vllm/transformers_utils/tokenizer_group/ray_tokenizer_group.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizer_group
      copying vllm/transformers_utils/tokenizer_group/base_tokenizer_group.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizer_group
      copying vllm/transformers_utils/tokenizer_group/tokenizer_group.py -> build/lib.linux-x86_64-cpython-311/vllm/transformers_utils/tokenizer_group
      creating build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/bgmv_expand_slice.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/bgmv_expand.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/bgmv_shrink.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/sgmv_expand_slice.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/sgmv_shrink.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/sgmv_expand.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      copying vllm/lora/ops/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/lora/ops
      creating build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      copying vllm/attention/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      copying vllm/attention/ops/ipex_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      copying vllm/attention/ops/paged_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      copying vllm/attention/ops/prefix_prefill.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      copying vllm/attention/ops/triton_flash_attention.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops
      creating build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/torch_sdpa.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/flashinfer.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/abstract.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/blocksparse_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/ipex_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/rocm_flash_attn.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/pallas.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      copying vllm/attention/backends/openvino.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/backends
      creating build/lib.linux-x86_64-cpython-311/vllm/attention/ops/blocksparse_attention
      copying vllm/attention/ops/blocksparse_attention/interface.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops/blocksparse_attention
      copying vllm/attention/ops/blocksparse_attention/blocksparse_attention_kernel.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops/blocksparse_attention
      copying vllm/attention/ops/blocksparse_attention/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops/blocksparse_attention
      copying vllm/attention/ops/blocksparse_attention/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/attention/ops/blocksparse_attention
      creating build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/tpu_communicator.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/shm_broadcast.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/custom_all_reduce_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/pynccl_wrapper.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/cuda_wrapper.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      copying vllm/distributed/device_communicators/pynccl.py -> build/lib.linux-x86_64-cpython-311/vllm/distributed/device_communicators
      creating build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/cli_args.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/serving_tokenization.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/serving_embedding.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/run_batch.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      copying vllm/entrypoints/openai/logits_processors.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai
      creating build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai/rpc
      copying vllm/entrypoints/openai/rpc/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai/rpc
      copying vllm/entrypoints/openai/rpc/client.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai/rpc
      copying vllm/entrypoints/openai/rpc/server.py -> build/lib.linux-x86_64-cpython-311/vllm/entrypoints/openai/rpc
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/dbrx.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/intern_vit.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/siglip.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/deepseek_v2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/commandr.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/chameleon.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/decilm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/internvl.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gemma2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/olmo.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/minicpmv.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/persimmon.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/internlm2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/mixtral_quant.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/idefics2_vision_model.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/phi3v.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/llama.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/orion.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/jais.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/blip2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/minicpm3.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gemma.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/interfaces.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/mlp_speculator.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/llava.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/medusa.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/nemotron.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/minicpm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/jamba.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/llava_next.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/clip.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/fuyu.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/llama_embedding.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/phi3_small.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/blip.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/paligemma.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/xverse.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/arctic.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      copying vllm/model_executor/models/na_vit.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/models
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/loader.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/tensorizer.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/neuron.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/weight_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      copying vllm/model_executor/model_loader/openvino.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/model_loader
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      copying vllm/model_executor/guided_decoding/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      copying vllm/model_executor/guided_decoding/outlines_decoding.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      copying vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      copying vllm/model_executor/guided_decoding/guided_fields.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      copying vllm/model_executor/guided_decoding/outlines_logits_processors.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/guided_decoding
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/typical_acceptance_sampler.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/logits_processor.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/spec_decode_base_sampler.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/pooler.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/rotary_embedding.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/rejection_sampler.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe
      copying vllm/model_executor/layers/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe
      copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe
      copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe
      copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/marlin.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/deepspeedfp.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/qqq.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/aqlm.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/ops
      copying vllm/model_executor/layers/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/ops
      copying vllm/model_executor/layers/ops/sample.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/ops
      copying vllm/model_executor/layers/ops/rand.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/ops
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors
      copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors
      copying vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors
      copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_qqq.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/utils
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_unquantized.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/quantization/compressed_tensors/schemes
      creating build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/naive_block.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/block_table.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/cpu_gpu_block_allocator.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/prefix_caching_block.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/interfaces.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/utils.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      copying vllm/core/block/common.py -> build/lib.linux-x86_64-cpython-311/vllm/core/block
      creating build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/util.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/multi_step.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/__init__.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/single_step.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/interfaces.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/engine/output_processor/stop_checker.py -> build/lib.linux-x86_64-cpython-311/vllm/engine/output_processor
      copying vllm/py.typed -> build/lib.linux-x86_64-cpython-311/vllm
      creating build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-311/vllm/model_executor/layers/fused_moe/configs
      running build_ext
      -- The CXX compiler identification is GNU 9.4.0
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Build type: RelWithDebInfo
      -- Target device: cuda
      -- Found Python: /home/user/anaconda3/envs/cpm/bin/python3.11 (found version "3.11.0") found components: Interpreter Development.Module Development.SABIModule
      -- Found python matching: /home/user/anaconda3/envs/cpm/bin/python3.11.
      -- Found CUDA: /usr/local/cuda-11.7 (found version "11.7")
      CMake Error at /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/cmake/data/share/cmake-3.30/Modules/CMakeDetermineCompilerId.cmake:838 (message):
        Compiling the CUDA compiler identification source file
        "CMakeCUDACompilerId.cu" failed.

        Compiler: /usr/bin/nvcc

        Build flags:

        Id flags: --keep;--keep-dir;tmp -v

        The output was:

        255

        #$ _SPACE_=

        #$ _CUDART_=cudart

        #$ _HERE_=/usr/lib/nvidia-cuda-toolkit/bin

        #$ _THERE_=/usr/lib/nvidia-cuda-toolkit/bin

        #$ _TARGET_SIZE_=

        #$ _TARGET_DIR_=

        #$ _TARGET_SIZE_=64

        #$ NVVMIR_LIBRARY_DIR=/usr/lib/nvidia-cuda-toolkit/libdevice

        #$
        PATH=/usr/lib/nvidia-cuda-toolkit/bin:/tmp/pip-build-env-_q6a864r/overlay/bin:/tmp/pip-build-env-_q6a864r/normal/bin:/home/user/.vscode-server/cli/servers/Stable-4849ca9bdf9666755eb463db297b69e5385090e3/server/bin/remote-cli:/usr/local/cuda-11.7/bin:/home/user/anaconda3/envs/cpm/bin:/home/user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/anaconda3/envs/pytorch38/bin:/usr/local/cuda-11.7/bin:/home/user/.vscode-server/cli/servers/Stable-4849ca9bdf9666755eb463db297b69e5385090e3/server/bin/remote-cli:/usr/local/cuda-11.7/bin:/home/user/anaconda3/bin:/home/user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/anaconda3/envs/pytorch38/bin:/home/user/.vscode-server/cli/servers/Stable-4849ca9bdf9666755eb463db297b69e5385090e3/server/bin/remote-cli:/usr/local/cuda-11.7/bin:/home/user/anaconda3/bin:/home/user/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/user/anaconda3/envs/pytorch38/bin:/home/user/anaconda3/envs/pytorch38/bin

        #$ LIBRARIES= -L/usr/lib/x86_64-linux-gnu/stubs -L/usr/lib/x86_64-linux-gnu

        #$ rm tmp/a_dlink.reg.c

        #$ gcc -D__CUDA_ARCH__=300 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS
        -D__CUDACC__ -D__NVCC__ -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=1
        -D__CUDACC_VER_BUILD__=243 -include "cuda_runtime.h" -m64
        "CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"

        #$ cicc --c++14 --gnu_version=90400 --allow_managed -arch compute_30 -m64
        -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name
        "CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library
        "/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc"
        --gen_module_id_file --module_id_file_name
        "tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name
        "CMakeCUDACompilerId.cu" --gen_c_file_name
        "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name
        "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name
        "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o
        "tmp/CMakeCUDACompilerId.ptx"

        #$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o
        "tmp/CMakeCUDACompilerId.sm_30.cubin"

        ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'

        # --error 0xff --

      Call Stack (most recent call first):
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/cmake/data/share/cmake-3.30/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/cmake/data/share/cmake-3.30/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/cmake/data/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake:131 (CMAKE_DETERMINE_COMPILER_ID)
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:47 (enable_language)
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
        /tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
        CMakeLists.txt:67 (find_package)

      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/home/user/anaconda3/envs/cpm/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/user/anaconda3/envs/cpm/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/user/anaconda3/envs/cpm/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 421, in build_wheel
          return self._build_with_temp_dir(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 403, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 318, in run_setup
          exec(code, locals())
        File "<string>", line 456, in <module>
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/command/bdist_wheel.py", line 384, in run
          self.run_command("build")
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 950, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 98, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-_q6a864r/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "<string>", line 219, in build_extensions
        File "<string>", line 201, in configure
        File "/home/user/anaconda3/envs/cpm/lib/python3.11/subprocess.py", line 413, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-req-build-jvw7lpre', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-req-build-jvw7lpre/build/lib.linux-x86_64-cpython-311/vllm', '-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY=build/temp.linux-x86_64-cpython-311', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/home/user/anaconda3/envs/cpm/bin/python3.11', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=24']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (vllm)
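
Note on the log above: the first real failure is `ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'`, raised while CMake was identifying the CUDA compiler. One possible reading, which is an assumption drawn from the log and not a confirmed diagnosis, is that the `nvcc` found on PATH defaults to `sm_30` while the `ptxas` it invokes comes from a newer toolkit that no longer accepts that target. A minimal sketch for checking which `nvcc` and `ptxas` are on PATH and which release each reports (the helper names `parse_release` and `tool_release` are made up for this example):

```python
import re
import shutil
import subprocess


def parse_release(version_output):
    """Extract the 'release X.Y' number printed by `nvcc --version` / `ptxas --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def tool_release(tool):
    """Return (path, release) for a tool on PATH, or (None, None) if it is not found."""
    path = shutil.which(tool)
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_release(result.stdout)


if __name__ == "__main__":
    # Print where each tool resolves and which CUDA release it reports.
    for tool in ("nvcc", "ptxas"):
        path, release = tool_release(tool)
        print(f"{tool}: path={path} release={release}")
```

If the two tools resolve to different directories or report different releases, the toolkits are likely mixed, and pointing PATH at a single CUDA installation before rerunning the build would be the first thing to try.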