OpenBMB / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Installation]: Build reports numpy not found, but numpy is already installed #14

Open qq745639151 opened 2 months ago

qq745639151 commented 2 months ago

Your current environment

PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-57-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 550.78
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Vendor ID: GenuineIntel
Model name: Intel Xeon Processor (Icelake)
CPU family: 6
Model: 134
Thread(s) per core: 1
Core(s) per socket: 112
Socket(s): 1
Stepping: 0
BogoMIPS: 5600.02
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke avx512_vnni arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3.5 MiB (112 instances)
L1i cache: 3.5 MiB (112 instances)
L2 cache: 448 MiB (112 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-111
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==2.1.1
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.1.0
[pip3] torch==2.4.1
[pip3] torchvision==0.19.1
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] numpy 2.1.1 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] sentence-transformers 3.1.0 pypi_0 pypi
[conda] torch 2.4.1 pypi_0 pypi
[conda] torchvision 0.19.1 pypi_0 pypi
[conda] transformers 4.44.2 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-111 0 N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

How you are installing vllm

pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
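A possible workaround, not confirmed by the maintainers: the failure below suggests pip's isolated build environment cannot see the numpy installed in the user environment, so disabling build isolation (after installing the build dependencies yourself) may let the wheel build succeed. The exact dependency list here is an assumption:

```shell
# Install the build-time dependencies into the current environment first,
# then tell pip to reuse this environment instead of creating an isolated one.
pip install numpy setuptools wheel packaging
pip install --no-build-isolation git+https://github.com/OpenBMB/vllm.git@minicpm3
```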


qq745639151 commented 2 months ago

Building wheels for collected packages: vllm
Building wheel for vllm (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [609 lines of output]
/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-310/vllm
[... several hundred routine "copying vllm/... -> build/lib.linux-x86_64-cpython-310/..." lines omitted; the pasted log is truncated here, before the actual build failure ...]
copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/siglip.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/models creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe copying vllm/model_executor/layers/fused_moe/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops copying vllm/model_executor/layers/ops/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops copying vllm/model_executor/layers/ops/rand.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops copying vllm/model_executor/layers/ops/sample.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/ops creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/deepspeedfp.py -> 
build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/squeezellm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/aqlm.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> 
build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/marlin.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization copying vllm/model_executor/layers/quantization/qqq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors copying vllm/model_executor/layers/quantization/compressed_tensors/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_qqq.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> 
build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/utils creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/init.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_unquantized.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes copying 
vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/quantization/compressed_tensors/schemes creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/dbrx.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/internvl.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/mpt.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/init.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/configs creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group copying vllm/transformers_utils/tokenizer_group/init.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group copying vllm/transformers_utils/tokenizer_group/base_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group copying 
vllm/transformers_utils/tokenizer_group/ray_tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group copying vllm/transformers_utils/tokenizer_group/tokenizer_group.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizer_group creating build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers copying vllm/transformers_utils/tokenizers/baichuan.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers copying vllm/transformers_utils/tokenizers/init.py -> build/lib.linux-x86_64-cpython-310/vllm/transformers_utils/tokenizers copying vllm/py.typed -> build/lib.linux-x86_64-cpython-310/vllm creating build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying 
vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=float8.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-310/vllm/model_executor/layers/fused_moe/configs running build_ext -- The CXX compiler identification is GNU 11.4.0 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Build type: RelWithDebInfo -- Target device: cuda -- Found Python: /root/miniconda3/envs/minicpm3/bin/python3 (found version "3.10.14") found components: Interpreter Development.Module Development.SABIModule -- Found python matching: /root/miniconda3/envs/minicpm3/bin/python3. 
-- Found CUDA: /usr/local/cuda (found version "12.1") -- The CUDA compiler identification is NVIDIA 12.1.105 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.1.105") -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Caffe2: CUDA detected: 12.1 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc -- Caffe2: CUDA toolkit directory: /usr/local/cuda -- Caffe2: Header version is: 12.1 -- /usr/local/cuda/lib64/libnvrtc.so shorthash is b51b459d -- USE_CUDNN is set to 0. Compiling without cuDNN support -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support -- Autodetected CUDA architecture(s): 8.6 -- Added CUDA NVCC flags for: -gencode;arch=compute_86,code=sm_86 CMake Warning at /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): static library kineto_LIBRARY-NOTFOUND not found. Call Stack (most recent call first): /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found) CMakeLists.txt:67 (find_package)

  -- Found Torch: /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/lib/libtorch.so
  -- Enabling core extension.
  -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
  -- CUDA target arches: 86-real
  -- CMake Version: 3.30.3
  -- CUTLASS 3.5.1
  -- CUDART: /usr/local/cuda/lib64/libcudart.so
  -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so
  -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so
  -- Default Install Location: install
  -- Found Python3: /root/miniconda3/envs/minicpm3/bin/python3.10 (found suitable version "3.10.14", minimum required is "3.5") found components: Interpreter
  -- Make cute::tuple be the new standard-layout tuple type
  -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
  -- Enable caching of reference results in conv unit tests
  -- Enable rigorous conv problem sizes in conv unit tests
  -- Using NVCC flags: --expt-relaxed-constexpr;-DCUTE_USE_PACKED_TUPLE=1;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0;-Xcompiler=-Wconversion;-Xcompiler=-fno-strict-aliasing;-lineinfo
  -- CUTLASS Revision: 7de3e8dc
  -- Configuring cublas ...
  -- cuBLAS Disabled.
  -- Configuring cuBLAS ... done.
  -- Enabling C extension.
  -- Enabling moe extension.
  -- Configuring done (12.6s)
  -- Generating done (0.1s)
  -- Build files have been written to: /tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310
  [1/32] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o
  [2/32] Building CXX object CMakeFiles/_core_C.dir/csrc/core/torch_bindings.cpp.o
  [3/32] Linking CXX shared module /tmp/pip-req-build-v6iyfnsr/build/lib.linux-x86_64-cpython-310/vllm/_core_C.abi3.so
  [4/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu -o CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o
  Killed
  [5/32] Building CUDA object CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o -MF CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/attention/attention_kernels.cu -o CMakeFiles/_C.dir/csrc/attention/attention_kernels.cu.o
  Killed
  [6/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/gptq_marlin/gptq_marlin.cu -o CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o
  Killed
  [7/32] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
  [8/32] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o
  [9/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/fp8/common.cu -o CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o
  Killed
  [10/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/gptq/q_gemm.cu -o CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o
  Killed
  [11/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/fp8/fp8_marlin.cu -o CMakeFiles/_C.dir/csrc/quantization/fp8/fp8_marlin.cu.o
  Killed
  [12/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
  FAILED: CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/tmp/pip-req-build-v6iyfnsr/csrc -I/tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include -isystem /root/miniconda3/envs/minicpm3/include/python3.10 -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include -isystem /tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o -MF CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o.d -x cu -c /tmp/pip-req-build-v6iyfnsr/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu -o CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o
  Killed
  [13/32] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o
  [14/32] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o
  [15/32] Building CUDA object CMakeFiles/_C.dir/csrc/prepare_inputs/advance_step.cu.o
  [16/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/squeezellm/quant_cuda_kernel.cu.o
  [17/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o
  [18/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o
  [19/32] Building CUDA object CMakeFiles/_C.dir/csrc/moe_align_block_size_kernels.cu.o
  [20/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o
  [21/32] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o
  [22/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o
  [23/32] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
  [24/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/aqlm/gemm_kernels.cu.o
  [25/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o
  [26/32] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
  [27/32] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o
  [28/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/qqq/marlin_qqq_gemm_kernel.cu.o
  [29/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/dense/marlin_cuda_kernel.cu.o
  [30/32] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x.cu.o
  /tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include/cutlass/device_kernel.h: In function ‘void cutlass::device_kernel(typename Operator::Params) [with Operator = _GLOBAL__N__fa99bd80_16_scaled_mm_c3x_cu_22d95651::cutlass_3x_gemm<signed char, cutlass::bfloat16_t, _GLOBAL__N__fa99bd80_16_scaled_mm_c3x_cu_22d95651::ScaledEpilogueBias, cute::tuple<cute::C<64>, cute::C<64>, cute::C<256> >, cute::tuple<cute::C<1>, cute::C<8>, cute::C<1> >, cutlass::gemm::KernelTmaWarpSpecialized, cutlass::epilogue::TmaWarpSpecialized>::GemmKernel]’:
  /tmp/pip-req-build-v6iyfnsr/build/temp.linux-x86_64-cpython-310/_deps/cutlass-src/include/cutlass/device_kernel.h:104:1: note: the ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
    104 | void device_kernel(CUTLASS_GRID_CONSTANT typename Operator::Params const params)
        | ^~~~~~~~~~~~~
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/root/miniconda3/envs/minicpm3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/root/miniconda3/envs/minicpm3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/root/miniconda3/envs/minicpm3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 421, in build_wheel
      return self._build_with_temp_dir(
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 403, in _build_with_temp_dir
      self.run_setup()
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 318, in run_setup
      exec(code, locals())
    File "<string>", line 456, in <module>
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 117, in setup
      return distutils.core.setup(**attrs)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
      self.run_command(cmd)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/command/bdist_wheel.py", line 384, in run
      self.run_command("build")
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 950, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 98, in run
      _build_ext.run(self)
    File "/tmp/pip-build-env-k16ao4rz/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
      self.build_extensions()
    File "<string>", line 231, in build_extensions
    File "/root/miniconda3/envs/minicpm3/lib/python3.10/subprocess.py", line 369, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=112', '--target=_core_C', '--target=_moe_C', '--target=_C']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (vllm)
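
Note that despite the issue title, the log above never actually mentions numpy: the visible failure is the bare `Killed` lines during the nvcc compile steps, followed by `cmake --build . -j=112` exiting non-zero. A plain `Killed` with no compiler diagnostic usually means the kernel OOM killer terminated the process — 112 parallel nvcc jobs can easily exhaust RAM, since each one may need several GB. A minimal sketch of a workaround, assuming vLLM's from-source build honors the `MAX_JOBS` environment variable to cap parallelism (the value `4` is illustrative; tune it to available memory):

```shell
# Assumption: the OOM killer killed the nvcc jobs because cmake ran with -j=112.
# Cap the number of parallel compile jobs before retrying the install.
export MAX_JOBS=4          # illustrative value; pick based on free RAM
echo "MAX_JOBS=${MAX_JOBS}"
# then retry the build, e.g.:
#   pip install vllm
# or, from a source checkout:
#   pip install -e .
```

Checking `dmesg | grep -i 'killed process'` after a failed build would confirm whether the OOM killer was responsible.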