ivanbaldo closed this issue 6 months ago
To me, this looks like a candle flash attention compilation error. However, it may be because I compile CUDA kernels, too. Could you try compiling without the flash-attn feature and let me know if that breaks?
Thanks Eric!
Restarting from scratch with the build step changed to RUN cargo build --release --features cuda,cudnn,nccl fails in exactly the same way, so it seems to pull in that dependency anyway.
cargo update shows:
Updating candle-flash-attn v0.3.2 (https://github.com/huggingface/candle.git#94817dac) -> #9e824ec8
Changes can be seen in https://github.com/huggingface/candle/compare/94817dac..9e824ec8 .
I will give it a try tomorrow (slow laptop here...).
Thanks for your help!!!
Hello!
cargo update didn't work; this time it failed with this error:
error[E0277]: expected a `Fn<(&candle_core::Tensor,)>` closure, found `BatchNorm`
--> /root/.cargo/git/checkouts/candle-lora-e71fb47097131b72/8b516d4/candle-lora-transformers/src/resnet.rs:87:61
|
87 | Ok(UnsyncFunc::new(move |xs| xs.apply(&conv)?.apply(&bn)))
| ----- ^^^ expected an `Fn<(&candle_core::Tensor,)>` closure, found `BatchNorm`
| |
| required by a bound introduced by this call
|
= help: the trait `for<'a> Fn<(&'a candle_core::Tensor,)>` is not implemented for `BatchNorm`
= help: the following other types implement trait `Module`:
AttentionBlock
ClipTextTransformer
Conv1d
ConvTranspose1d
ConvTranspose2d
DownEncoderBlock2D
EfficientNet
Func<'a>
and 52 others
= note: required for `BatchNorm` to implement `Module`
note: required by a bound in `candle_core::Tensor::apply`
--> /root/.cargo/git/checkouts/candle-0c2b4fa9e5801351/9e824ec/candle-core/src/tensor.rs:2337:21
|
2337 | pub fn apply<M: crate::Module>(&self, m: &M) -> Result<Self> {
| ^^^^^^^^^^^^^ required by this bound in `Tensor::apply`
So it seems some candle APIs changed incompatibly, and their users (candle-lora-transformers here) need to be updated for them. Thanks.
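The reference to `Fn` in the error follows from the note "required for `BatchNorm` to implement `Module`": it suggests that `Module` has a blanket implementation for closures, so a type without its own `Module` impl ends up being checked against `Fn`. The sketch below reproduces that shape with stand-in types. `Tensor`, `Module`, `BatchNorm`, and `forward_t` are simplified local definitions for illustration, not the real candle items, and the idea that the layer only exposes a train-aware forward pass is an assumption.

```rust
// Stand-in types for illustration only; these are NOT the candle definitions.
#[derive(Debug, Clone, PartialEq)]
struct Tensor(Vec<f32>);

type Result<T> = std::result::Result<T, String>;

trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

// Blanket impl: any closure taking &Tensor counts as a Module.
// With an impl like this, rustc reports a missing `Fn` impl when a
// plain struct without its own `Module` impl is passed to `apply`.
impl<F: Fn(&Tensor) -> Result<Tensor>> Module for F {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        self(xs)
    }
}

impl Tensor {
    fn apply<M: Module>(&self, m: &M) -> Result<Tensor> {
        m.forward(self)
    }
}

// Hypothetical layer that only offers a forward pass with a train flag.
struct BatchNorm {
    scale: f32,
}

impl BatchNorm {
    fn forward_t(&self, xs: &Tensor, _train: bool) -> Result<Tensor> {
        Ok(Tensor(xs.0.iter().map(|v| v * self.scale).collect()))
    }
}

fn run(xs: &Tensor, bn: &BatchNorm) -> Result<Tensor> {
    // `xs.apply(bn)` would be rejected with E0277, as in the log above.
    // Wrapping the call in a closure satisfies the blanket impl instead.
    xs.apply(&|t: &Tensor| bn.forward_t(t, false))
}
```

With these definitions, `run(&Tensor(vec![1.0, 2.0]), &BatchNorm { scale: 2.0 })` returns `Ok(Tensor(vec![2.0, 4.0]))`. Whether a closure wrapper or a direct trait impl is the better fix depends on the upstream change.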
Thank you for notifying me; my CI does not run periodically, so it did not catch this. Please see huggingface/candle#1647, and feel free to add anything.
Now I tried running cargo update -p candle-flash-attn but it fails in the same way as cargo update.
Thanks for reporting the issue to candle, will monitor it!
I guess we need to wait for that fix first, and then see whether everything works correctly with the updates.
Yes, the main problem is that I cannot find the trait bound causing the issue.
candle-lora is used once, below, so I will look into removing it as a dependency. https://github.com/EricLBuehler/candle-vllm/blob/9b0d89f1354cd52495162c65293fba10eff717c9/src/openai/pipelines/llama.rs#L25
Meanwhile, I am trying these changes just in case:
diff --git a/Cargo.toml b/Cargo.toml
index f159df4..95b9e90 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -8,17 +8,17 @@ edition = "2021"
[dependencies]
actix-web = "4.4.0"
anyhow = "1.0.75"
-candle-core = { git = "https://github.com/huggingface/candle.git", version = "0.3.0" }
-candle-examples = { git = "https://github.com/huggingface/candle.git", version = "0.3.0" }
+candle-core = "0.3.3"
+candle-examples = "0.3.2"
candle-lora = { git = "https://github.com/EricLBuehler/candle-lora.git", version = "0.2.0" }
candle-lora-macro = { git = "https://github.com/EricLBuehler/candle-lora.git", version = "0.2.0" }
candle-lora-transformers = { git = "https://github.com/EricLBuehler/candle-lora.git", version = "0.2.0" }
-candle-nn = { git = "https://github.com/huggingface/candle.git", version = "0.3.0" }
+candle-nn = "0.3.3"
dyn-fmt = "0.4.0"
serde = { version = "1.0.190", features = ["serde_derive"] }
tokenizers = "0.15.0"
uuid = { version = "1.5.0", features = ["v4"] }
-candle-transformers = { git = "https://github.com/huggingface/candle.git", version = "0.3.0" }
+candle-transformers = "0.3.3"
hf-hub = "0.3.2"
serde_json = "1.0.108"
derive_more = "0.99.17"
@@ -26,7 +26,7 @@ accelerate-src = { version = "0.3.2", optional = true }
intel-mkl-src = { version = "0.8.1", features = ["mkl-static-lp64-iomp"], optional = true }
cudarc = { version = "0.9.14", features = ["f16"], optional = true }
half = { version = "2.3.1", features = ["num-traits", "use-intrinsics", "rand_distr"] }
-candle-flash-attn = { git = "https://github.com/huggingface/candle.git", version = "0.3.0", optional = true }
+candle-flash-attn = { version = "0.3.3", optional = true }
clap = { version = "4.4.7", features = ["derive"] }
candle-sampling = { git = "https://github.com/EricLBuehler/candle-sampling.git", version = "0.2.0" }
futures = "0.3.29"
I fixed that bug - it was on the candle-lora side. Could you try again with the original Cargo.toml, after running cargo update?
Looks like there is a dependency bug - I just pushed a fix.
Of course, will try again and let you know, this will take about 50 minutes or so. Thanks so much!!!
Ok, sounds good!
I rebuilt the container from scratch (new git clone etc.) but unfortunately it failed again with the same linking error:
error: linking with `cc` failed: exit status: 1
|
= note: LC_ALL="C" PATH="/usr/lib/rustlib/x86_64-unknown-linux-gnu/bin:/root/.local/bin:/root/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" VSLANG="1033" "cc" "-m64" "/tmp/rustcN3D4xH/symbols.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.0.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.1.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.10.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.11.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.12.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.13.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.14.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.15.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.2.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.3.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.4.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.5.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.6.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.7.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.8.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.candle_vllm.cb7142adda2cd887-cgu.9.rcgu.o" 
"/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb.4psql9m0o7iw6sqs.rcgu.o" "-Wl,--as-needed" "-L" "/candle-vllm/target/release/deps" "-L" "/candle-vllm/target/release/build/zstd-sys-51991617680764ab/out" "-L" "/usr/local/cuda/lib64" "-L" "/usr/local/cuda/lib64/stubs" "-L" "/usr/local/cuda/targets/x86_64-linux" "-L" "/usr/local/cuda/targets/x86_64-linux/lib" "-L" "/usr/local/cuda/targets/x86_64-linux/lib/stubs" "-L" "/usr/lib" "-L" "/usr/lib64" "-L" "/candle-vllm/target/release/build/bzip2-sys-f7fb57a3f4e98cc1/out/lib" "-L" "/candle-vllm/target/release/build/ring-a59330cc6e943984/out" "-L" "/candle-vllm/target/release/build/lz4-sys-c90b3b6e3d6da391/out" "-L" "/candle-vllm/target/release/build/esaxx-rs-83f1f68488f360a8/out" "-L" "/candle-vllm/target/release/build/onig_sys-d0c2f3461f43020d/out" "-L" "/candle-vllm/target/release/build/candle-flash-attn-bb54a4d16d25ee03/out" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/candle-vllm/target/release/deps/libenv_logger-0f0fa188a1404846.rlib" "/candle-vllm/target/release/deps/libtermcolor-c53cf66b9b32e10f.rlib" "/candle-vllm/target/release/deps/libis_terminal-cdf9c5266fcbba03.rlib" "/candle-vllm/target/release/deps/librustix-a629012946c99e6d.rlib" "/candle-vllm/target/release/deps/liblinux_raw_sys-15bed2ca91cf42a8.rlib" "/candle-vllm/target/release/deps/libhumantime-1dc284c82c7f0559.rlib" "/candle-vllm/target/release/deps/libcandle_vllm-ce9b07d51787770c.rlib" "/candle-vllm/target/release/deps/libchrono-ae2c4cf3aacef826.rlib" "/candle-vllm/target/release/deps/libiana_time_zone-2bd86fbdc9e46a38.rlib" "/candle-vllm/target/release/deps/libhf_hub-b2415c762b503a90.rlib" "/candle-vllm/target/release/deps/libdirs-45aa89c180ae36f2.rlib" "/candle-vllm/target/release/deps/libdirs_sys-b0294348c2e4986c.rlib" "/candle-vllm/target/release/deps/liboption_ext-3db96de540040126.rlib" "/candle-vllm/target/release/deps/libureq-22a62ebb34562523.rlib" 
"/candle-vllm/target/release/deps/libnative_tls-addec962e00a97ff.rlib" "/candle-vllm/target/release/deps/libopenssl_probe-e135bf478bd9e62b.rlib" "/candle-vllm/target/release/deps/libopenssl-f7e740960c8b0b56.rlib" "/candle-vllm/target/release/deps/libforeign_types-434e4620cdd2963d.rlib" "/candle-vllm/target/release/deps/libforeign_types_shared-3cd91dddd8b3059a.rlib" "/candle-vllm/target/release/deps/libopenssl_sys-2724f2f05b6f6e71.rlib" "/candle-vllm/target/release/deps/libwebpki_roots-fb31dcc12f4e6db5.rlib" "/candle-vllm/target/release/deps/librustls-ca4a80b00d74d11d.rlib" "/candle-vllm/target/release/deps/libsct-d1a0a53864376724.rlib" "/candle-vllm/target/release/deps/libwebpki-8db93ee63982280a.rlib" "/candle-vllm/target/release/deps/libring-c45b21a3fb043429.rlib" "/candle-vllm/target/release/deps/libspin-a5bca8ced7fc453c.rlib" "/candle-vllm/target/release/deps/libuntrusted-766afbb3ef44c1d1.rlib" "/candle-vllm/target/release/deps/libcandle_lora_transformers-c49058ffb6d7068a.rlib" "/candle-vllm/target/release/deps/libtqdm-e47c7a840c2fc706.rlib" "/candle-vllm/target/release/deps/libcrossterm-f705860770d94db8.rlib" "/candle-vllm/target/release/deps/libsignal_hook_mio-ec3a5a299cc915e5.rlib" "/candle-vllm/target/release/deps/libsignal_hook-49df15a2181bf250.rlib" "/candle-vllm/target/release/deps/libanyhow-78648c12fa2eaee5.rlib" "/candle-vllm/target/release/deps/libcandle_lora-0543f2db3a02f6c2.rlib" "/candle-vllm/target/release/deps/libtrc-af4d2dc9e955d45c.rlib" "/candle-vllm/target/release/deps/libuuid-8e9abe15319c7747.rlib" "/candle-vllm/target/release/deps/libcandle_transformers-3a408f703fe757e5.rlib" "/candle-vllm/target/release/deps/libserde_plain-9edacf8e6b8b5e3b.rlib" "/candle-vllm/target/release/deps/libcandle_flash_attn-6ec38f8ed9aac30d.rlib" "/candle-vllm/target/release/deps/libdyn_fmt-ca01837b2f65b0b1.rlib" "/candle-vllm/target/release/deps/libfutures-813f484dc1c71e4c.rlib" "/candle-vllm/target/release/deps/libfutures_executor-cdd38bae408d4ce8.rlib" 
"/candle-vllm/target/release/deps/libcandle_sampling-07b86ed24f500345.rlib" "/candle-vllm/target/release/deps/libcandle_nn-3eaedbdadbe5fbb5.rlib" "/candle-vllm/target/release/deps/libtokenizers-61b7f12c56fed2c5.rlib" "/candle-vllm/target/release/deps/libesaxx_rs-c3b0fa8f52cc413c.rlib" "/candle-vllm/target/release/deps/libunicode_normalization_alignments-025da513407d9879.rlib" "/candle-vllm/target/release/deps/libspm_precompiled-8a5e3784a84b6fa0.rlib" "/candle-vllm/target/release/deps/libbase64-a00060132962802d.rlib" "/candle-vllm/target/release/deps/libunicode_segmentation-0609f6ce0b27032d.rlib" "/candle-vllm/target/release/deps/libnom-828591b7d6e9f08d.rlib" "/candle-vllm/target/release/deps/libunicode_categories-4b2d8309eb580595.rlib" "/candle-vllm/target/release/deps/libmonostate-121edb8fb43689e8.rlib" "/candle-vllm/target/release/deps/libmacro_rules_attribute-fbe2172e90fd6d9d.rlib" "/candle-vllm/target/release/deps/libindicatif-5ac26ff2181c3839.rlib" "/candle-vllm/target/release/deps/libportable_atomic-37fa7d733d3c2283.rlib" "/candle-vllm/target/release/deps/libnumber_prefix-fcbd61cd7f0fb674.rlib" "/candle-vllm/target/release/deps/libconsole-927989bf813852d8.rlib" "/candle-vllm/target/release/deps/libunicode_width-4a01194dbfae8c91.rlib" "/candle-vllm/target/release/deps/librayon_cond-ec5fdcb09b40065c.rlib" "/candle-vllm/target/release/deps/libitertools-87b264833edf6f52.rlib" "/candle-vllm/target/release/deps/libonig-40dabd6ed5124b91.rlib" "/candle-vllm/target/release/deps/libonig_sys-90597c1391bce008.rlib" "/candle-vllm/target/release/deps/libderive_builder-3471ddeab47c0b9a.rlib" "/candle-vllm/target/release/deps/liblazy_static-852800890c81fb22.rlib" "/candle-vllm/target/release/deps/libclap-23394ec333e54596.rlib" "/candle-vllm/target/release/deps/libclap_builder-41cde94296fdb820.rlib" "/candle-vllm/target/release/deps/libstrsim-bfb3799e9677cd4d.rlib" "/candle-vllm/target/release/deps/libanstream-d284661ab137b824.rlib" 
"/candle-vllm/target/release/deps/libanstyle_query-d08e7c102e46eb49.rlib" "/candle-vllm/target/release/deps/libcolorchoice-d9fe16d50a3dd803.rlib" "/candle-vllm/target/release/deps/libanstyle_parse-6ac7d6e179081361.rlib" "/candle-vllm/target/release/deps/libutf8parse-86e737e0d4678582.rlib" "/candle-vllm/target/release/deps/libclap_lex-3a6b7689365ae37a.rlib" "/candle-vllm/target/release/deps/libanstyle-9a261b265642b8a4.rlib" "/candle-vllm/target/release/deps/libcandle_core-d2f01b6e6a29d888.rlib" "/candle-vllm/target/release/deps/libmemmap2-4476da1f91fb3603.rlib" "/candle-vllm/target/release/deps/libzip-9bf92410c307c36c.rlib" "/candle-vllm/target/release/deps/libpbkdf2-bfe2a8675cfe3dd6.rlib" "/candle-vllm/target/release/deps/libsha2-7f594f901cd89567.rlib" "/candle-vllm/target/release/deps/libpassword_hash-2fa33ff8d4990779.rlib" "/candle-vllm/target/release/deps/libbase64ct-760f27bcfd4054ae.rlib" "/candle-vllm/target/release/deps/libzstd-bafef58bb20c82a7.rlib" "/candle-vllm/target/release/deps/libzstd_safe-2c41e8f78c52fdfc.rlib" "/candle-vllm/target/release/deps/libbzip2-b94c5c5e7c15f010.rlib" "/candle-vllm/target/release/deps/libbzip2_sys-a158ea0d0289b351.rlib" "/candle-vllm/target/release/deps/libaes-dc1bc8251226040a.rlib" "/candle-vllm/target/release/deps/libcipher-eeb8ea70098f4f7f.rlib" "/candle-vllm/target/release/deps/libinout-5e79d2c693701e41.rlib" "/candle-vllm/target/release/deps/libhmac-246f344022381f5d.rlib" "/candle-vllm/target/release/deps/libconstant_time_eq-742a8ca43fc4b3c6.rlib" "/candle-vllm/target/release/deps/libyoke-b5cb326284cb506c.rlib" "/candle-vllm/target/release/deps/libzerofrom-72df68927b68a064.rlib" "/candle-vllm/target/release/deps/libstable_deref_trait-76725faa25d9c59b.rlib" "/candle-vllm/target/release/deps/libthiserror-7cc4f2a96da73a94.rlib" "/candle-vllm/target/release/deps/libsafetensors-b94965e86f7ef122.rlib" "/candle-vllm/target/release/deps/libcudarc-bb4cc1d0d1d68ba3.rlib" 
"/candle-vllm/target/release/deps/libcandle_kernels-af06d5fd4a087af6.rlib" "/candle-vllm/target/release/deps/libgemm-9939fb772d1ff792.rlib" "/candle-vllm/target/release/deps/libgemm_c32-cba446e570d4386d.rlib" "/candle-vllm/target/release/deps/libgemm_c64-701b72db790c5491.rlib" "/candle-vllm/target/release/deps/libgemm_f64-132035f8fb79f58d.rlib" "/candle-vllm/target/release/deps/libgemm_f16-a17195123a2b5a97.rlib" "/candle-vllm/target/release/deps/libgemm_f32-43dd1a29089d0d80.rlib" "/candle-vllm/target/release/deps/libgemm_common-888ab4912d03277a.rlib" "/candle-vllm/target/release/deps/libpulp-c51f68967478b6aa.rlib" "/candle-vllm/target/release/deps/libnum_complex-9293d6ad98d7b1c3.rlib" "/candle-vllm/target/release/deps/libdyn_stack-e01f3657ea7d975f.rlib" "/candle-vllm/target/release/deps/libreborrow-77659d577c4b718c.rlib" "/candle-vllm/target/release/deps/libraw_cpuid-b9cfe85e371d3083.rlib" "/candle-vllm/target/release/deps/librayon-7e6c7f8c76536947.rlib" "/candle-vllm/target/release/deps/librayon_core-2fef7474b3331466.rlib" "/candle-vllm/target/release/deps/libcrossbeam_deque-f3876680669c2c7d.rlib" "/candle-vllm/target/release/deps/libcrossbeam_epoch-d5f20c1ae49163b7.rlib" "/candle-vllm/target/release/deps/libmemoffset-b4fab92a5d1a5e30.rlib" "/candle-vllm/target/release/deps/libcrossbeam_utils-1d67d2d362ef675e.rlib" "/candle-vllm/target/release/deps/libeither-c016b57e73ba30c1.rlib" "/candle-vllm/target/release/deps/libbyteorder-8bf78fc69cf5b0a1.rlib" "/candle-vllm/target/release/deps/libhalf-82866db1aa6c7f3e.rlib" "/candle-vllm/target/release/deps/librand_distr-b111214f51586c69.rlib" "/candle-vllm/target/release/deps/libnum_traits-28ee9b33f1e53f29.rlib" "/candle-vllm/target/release/deps/libbytemuck-7eee2fa1f516b4ce.rlib" "/candle-vllm/target/release/deps/libactix_web-0a08fb87679df924.rlib" "/candle-vllm/target/release/deps/liburl-1bbf839f22bd1732.rlib" "/candle-vllm/target/release/deps/libidna-fb425d18121613f1.rlib" 
"/candle-vllm/target/release/deps/libunicode_normalization-7972d0be1c38ac31.rlib" "/candle-vllm/target/release/deps/libtinyvec-61debd23e06e16bf.rlib" "/candle-vllm/target/release/deps/libtinyvec_macros-f326b6a6f0ca8a7b.rlib" "/candle-vllm/target/release/deps/libunicode_bidi-9dc6f963fdeb5a21.rlib" "/candle-vllm/target/release/deps/libserde_urlencoded-9f88ee3d21b5ec1b.rlib" "/candle-vllm/target/release/deps/libform_urlencoded-3e169fc285508f2a.rlib" "/candle-vllm/target/release/deps/libserde_json-2daaa0f082f50c3a.rlib" "/candle-vllm/target/release/deps/libryu-8b05c69dcf279a6f.rlib" "/candle-vllm/target/release/deps/libactix_server-e79c728840296968.rlib" "/candle-vllm/target/release/deps/libactix_router-48a733d95bd3dd5e.rlib" "/candle-vllm/target/release/deps/libregex-c78c6a0d40f8f119.rlib" "/candle-vllm/target/release/deps/libregex_automata-3822bb291a95f096.rlib" "/candle-vllm/target/release/deps/libaho_corasick-6f9c3d032c4f562f.rlib" "/candle-vllm/target/release/deps/libregex_syntax-3dd804a409b2c545.rlib" "/candle-vllm/target/release/deps/libserde-23513cb3b07422f8.rlib" "/candle-vllm/target/release/deps/libcookie-30bd32d9b0d08b83.rlib" "/candle-vllm/target/release/deps/libtime-bc85cd6997494558.rlib" "/candle-vllm/target/release/deps/libtime_core-531fb2a2b6009484.rlib" "/candle-vllm/target/release/deps/libderanged-5409594f6406082d.rlib" "/candle-vllm/target/release/deps/libpowerfmt-c4543fc1903272c6.rlib" "/candle-vllm/target/release/deps/libactix_http-f7b0baf59fd7bb10.rlib" "/candle-vllm/target/release/deps/librand-aa6ddb6627b48b96.rlib" "/candle-vllm/target/release/deps/librand_chacha-fa47a10cc5e59439.rlib" "/candle-vllm/target/release/deps/libppv_lite86-9a645f708eed4e1c.rlib" "/candle-vllm/target/release/deps/librand_core-479671a2b8263665.rlib" "/candle-vllm/target/release/deps/libhttparse-699e93ce2c2e7905.rlib" "/candle-vllm/target/release/deps/libbrotli-df4299509820f939.rlib" "/candle-vllm/target/release/deps/libbrotli_decompressor-0212e4cdb0da1245.rlib" 
"/candle-vllm/target/release/deps/liballoc_stdlib-fc777d5f3c59a235.rlib" "/candle-vllm/target/release/deps/liballoc_no_stdlib-f497a54db348ea9b.rlib" "/candle-vllm/target/release/deps/libhttpdate-5f8e81ac577420b0.rlib" "/candle-vllm/target/release/deps/libsha1-ad6469ba6b8b2240.rlib" "/candle-vllm/target/release/deps/libcpufeatures-dcef25221428931f.rlib" "/candle-vllm/target/release/deps/libdigest-f32a2ccccbd945ab.rlib" "/candle-vllm/target/release/deps/libsubtle-910e19b9d08b2799.rlib" "/candle-vllm/target/release/deps/libblock_buffer-2ad0dde06bca4c37.rlib" "/candle-vllm/target/release/deps/libcrypto_common-30c46997c474a2db.rlib" "/candle-vllm/target/release/deps/libgeneric_array-95ff38f8e6dc2014.rlib" "/candle-vllm/target/release/deps/libtypenum-ddf8574aa94ffabe.rlib" "/candle-vllm/target/release/deps/libbase64-daaf16d87f9b4835.rlib" "/candle-vllm/target/release/deps/liblocal_channel-5501da97fbe12c8a.rlib" "/candle-vllm/target/release/deps/libbytestring-4d1e0f611bab987e.rlib" "/candle-vllm/target/release/deps/libencoding_rs-c048082deb3a71c3.rlib" "/candle-vllm/target/release/deps/liblanguage_tags-e0dfc52f86f9b27a.rlib" "/candle-vllm/target/release/deps/libahash-a28674307e9664ad.rlib" "/candle-vllm/target/release/deps/libgetrandom-b24cab7002c3530b.rlib" "/candle-vllm/target/release/deps/libzerocopy-63825396d720b9a6.rlib" "/candle-vllm/target/release/deps/libmime-04e6f00618993e67.rlib" "/candle-vllm/target/release/deps/libpercent_encoding-d54414372a2980de.rlib" "/candle-vllm/target/release/deps/libh2-27cdaea5e3d2147c.rlib" "/candle-vllm/target/release/deps/libindexmap-fcdde0ade0e1bfe3.rlib" "/candle-vllm/target/release/deps/libequivalent-8a25e166243cfe94.rlib" "/candle-vllm/target/release/deps/libhashbrown-aee95c0614bccf63.rlib" "/candle-vllm/target/release/deps/libfutures_util-98b8b67b3d434750.rlib" "/candle-vllm/target/release/deps/libfutures_io-bbce8973c99e7ece.rlib" "/candle-vllm/target/release/deps/libslab-490ef311b9a84e0e.rlib" 
"/candle-vllm/target/release/deps/libfutures_channel-6d294bf595dec06a.rlib" "/candle-vllm/target/release/deps/libfutures_task-0a7c23a0933dbcaa.rlib" "/candle-vllm/target/release/deps/libpin_utils-185c55cbe9ca2fff.rlib" "/candle-vllm/target/release/deps/libbitflags-1029aec9c38cde73.rlib" "/candle-vllm/target/release/deps/libzstd-242538c7759a4fa6.rlib" "/candle-vllm/target/release/deps/libzstd_safe-d25e92a1d04503ec.rlib" "/candle-vllm/target/release/deps/libzstd_sys-a6ec9cf883e86b56.rlib" "/candle-vllm/target/release/deps/libflate2-b67596bfbb64de8d.rlib" "/candle-vllm/target/release/deps/libminiz_oxide-2b969af90226827f.rlib" "/candle-vllm/target/release/deps/libsimd_adler32-d1dbd8e6b06bf162.rlib" "/candle-vllm/target/release/deps/libcrc32fast-ceb628e76fc0bab0.rlib" "/candle-vllm/target/release/deps/libactix_service-dfc20131f5ba36d4.rlib" "/candle-vllm/target/release/deps/libactix_codec-f3cae536aed1196d.rlib" "/candle-vllm/target/release/deps/libtokio_util-88b2eabf4483c1ed.rlib" "/candle-vllm/target/release/deps/libtracing-9e7a6177765350ac.rlib" "/candle-vllm/target/release/deps/libtracing_core-c5e9157560beafe6.rlib" "/candle-vllm/target/release/deps/libonce_cell-4b31816a5aa6274f.rlib" "/candle-vllm/target/release/deps/libmemchr-38d4fc2a3522aa15.rlib" "/candle-vllm/target/release/deps/libfutures_sink-78114cacf22202c2.rlib" "/candle-vllm/target/release/deps/libbitflags-b9815c55ec510696.rlib" "/candle-vllm/target/release/deps/libactix_utils-ec862be5af373362.rlib" "/candle-vllm/target/release/deps/liblocal_waker-7857496d2dec9a57.rlib" "/candle-vllm/target/release/deps/libactix_rt-0ffc3a15823d1322.rlib" "/candle-vllm/target/release/deps/libtokio-b67279acab90ede3.rlib" "/candle-vllm/target/release/deps/libsignal_hook_registry-a773ced30481d3cb.rlib" "/candle-vllm/target/release/deps/libnum_cpus-fbaf57124b2a0166.rlib" "/candle-vllm/target/release/deps/libsocket2-8e37cfa1c7015c6b.rlib" "/candle-vllm/target/release/deps/libmio-81de974463968f98.rlib" 
"/candle-vllm/target/release/deps/liblog-35f97248cb2ec82c.rlib" "/candle-vllm/target/release/deps/libparking_lot-e183fcd4a13bd183.rlib" "/candle-vllm/target/release/deps/libparking_lot_core-5fbb54b30e35e540.rlib" "/candle-vllm/target/release/deps/liblibc-d38dc52f94735460.rlib" "/candle-vllm/target/release/deps/libcfg_if-88c619515d65e3f1.rlib" "/candle-vllm/target/release/deps/libsmallvec-e35ec471a6514672.rlib" "/candle-vllm/target/release/deps/liblock_api-920512de5989abb2.rlib" "/candle-vllm/target/release/deps/libscopeguard-6208b4062bcdc2b1.rlib" "/candle-vllm/target/release/deps/libpin_project_lite-42a553ee08f02ebb.rlib" "/candle-vllm/target/release/deps/libfutures_core-b87582f06d7f1343.rlib" "/candle-vllm/target/release/deps/libhttp-b738399ec4ab1c60.rlib" "/candle-vllm/target/release/deps/libitoa-dcbca83b54db3306.rlib" "/candle-vllm/target/release/deps/libbytes-8c2bf1b211f72910.rlib" "/candle-vllm/target/release/deps/libfnv-ffe196e20ea2a648.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-9c342d6596ca77d8.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-35e6faa0abf08dd1.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-6242b5524a2684de.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libmemchr-94511439d510df36.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-1923a594ddedab24.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-5b476927cd520d76.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-6b4664d28b4dc07b.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4d7e14ee42b44abc.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-94e04d08d317eb2b.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-7e3a1db27b23a8ee.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-0651af3c34a1e4b9.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-e5da8ecb95d2de36.rlib" 
"/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-052b86aa844a2857.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-bbd2a157557b773d.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-f47279717d0e1831.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-d30e243a979711ec.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-18929aabe36e3f57.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-f9f41fbdedfbfafb.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-b26982894e484f03.rlib" "-Wl,-Bdynamic" "-lssl" "-lcrypto" "-lflashattention" "-lcudart" "-lstdc++" "-lstdc++" "-lcuda" "-lnccl" "-lnvrtc" "-lcurand" "-lcublas" "-lcublasLt" "-lcudnn" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/candle-vllm/target/release/deps/candle_vllm-8b71aad931b633bb" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
= note: /usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-bb54a4d16d25ee03/out/libflashattention.a(flash_api.o): relocation R_X86_64_32 against `.nvFatBinSegment' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-bb54a4d16d25ee03/out/libflashattention.a(flash_fwd_hdim128_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi128ELi128ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi128ELi128ELi64ELi4ES2_EELb0ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
etc...
I will try with cargo update again but need to go so I will let you know tomorrow probably. Thanks!!!
Strangely, with cargo update and the latest git from this repo it failed again in the linking phase.
Some extracts from the new Cargo.lock:
name = "candle-flash-attn"
version = "0.3.3"
source = "git+https://github.com/huggingface/candle.git#9e824ec810fbe490f21b7404058b6cb47d24c6cf"
name = "candle-lora"
version = "0.2.0"
source = "git+https://github.com/EricLBuehler/candle-lora.git#bb518c14dc15e322288f64fb2158e44f49cc3369"
Using FROM nvidia/cuda:12.3.1-devel-rockylinux8 (Rocky 8 instead of 9) failed too, in the same way, at link time.
Ok, this seems like a general linking problem. I'll try to reproduce it tonight, as I plan on working on the CUDA kernels.
Based on your VSCode build container Dockerfile in https://github.com/EricLBuehler/candle-vllm/blob/master/Dockerfile I ran exactly these commands and it worked:
docker run --rm -it pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel bash -i
apt-get update
apt-get install -y \
build-essential \
git \
curl \
openssl \
libssl-dev \
pkg-config \
wget
curl https://sh.rustup.rs -sSf | bash -s -- -y && \
echo 'source $HOME/.cargo/env' >> $HOME/.bashrc && \
source $HOME/.bashrc
git clone https://github.com/EricLBuehler/candle-vllm
cd candle-vllm
export CUDA_COMPUTE_CAP=86
cargo build --release --features cuda,cudnn,flash-attn,nccl
That tag is old, based on Ubuntu 20.04.6 and the older PyTorch 2.1.2. I will try with pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel and see, thanks!
Apologies for the confusion; candle-vllm is not meant to be built with PyTorch, as we use Candle. It should not require a Dockerfile at all. Instead, please follow the README instructions.
I know but I used the devcontainer in your repo as base and it worked, so I guess I can try different (older probably) versions, etc. and make it work in a smaller container without PyTorch.
I tried to run it with HF_TOKEN=xxxx target/release/candle-vllm --hf-token HF_TOKEN --port 8080 llama7b --repeat-last-n 4096 but it failed with:
Error: APIError { data: "request error: https://huggingface.co/meta-llama/Llama-27b-chat-hf/resolve/main/tokenizer.json: status code 404" }
A possible alternative could be:
https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/raw/main/tokenizer.json
But anyway, maybe this is already fixed by a cargo update, which I forgot to run in this case.
If it's already fixed, then maybe a new Cargo.lock should be committed so that running cargo update is not needed?
Thanks for all your help Eric!!!
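The failing URL and the suggested alternative differ in the repository ID: `Llama-27b-chat-hf` in the error versus `Llama-2-7b-chat-hf` in the alternative. The sketch below builds the URL in the same shape as the one in the error message, to make that difference easy to see. `resolve_url` is a hypothetical helper written for illustration; it is not part of hf-hub or candle-vllm.

```rust
/// Hypothetical helper (not part of hf-hub or candle-vllm): builds a
/// download URL shaped like the one in the 404 error above.
fn resolve_url(repo_id: &str, filename: &str) -> String {
    format!(
        "https://huggingface.co/{}/resolve/main/{}",
        repo_id, filename
    )
}
```

Passing `"meta-llama/Llama-27b-chat-hf"` reproduces the URL from the error exactly, while `"meta-llama/Llama-2-7b-chat-hf"` gives the hyphenated repository name used in the alternative link.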
After adding dnf upgrade it fails differently. This is the current version of the Dockerfile:
FROM nvidia/cuda:12.3.1-devel-rockylinux9
ARG USERID=1000
ARG CUDA_COMPUTE_CAP
RUN dnf upgrade -y && dnf clean all && rm -rf /var/cache/dnf/*
RUN dnf install -y cargo libcudnn8-devel openssl-devel git && dnf clean all && \
rm -rf /var/cache/dnf/*
RUN git clone https://github.com/EricLBuehler/candle-vllm
WORKDIR /candle-vllm
RUN cargo update
#RUN cargo build --release --features cuda,cudnn,flash-attn,nccl
RUN cargo install --path . --features cuda,cudnn,flash-attn,nccl
RUN adduser -u $USERID user
USER user
ENTRYPOINT ["/candle-vllm/target/release/candle-vllm"]
CMD ["--hf-token", "HF_TOKEN", "--port", "8080", "llama7b", "--repeat-last-n", "4096"]
And can be built with:
docker build --build-arg USERID=$(id -u) --build-arg \
CUDA_COMPUTE_CAP=$(nvidia-smi --query-gpu=compute_cap --format=csv | tail -n1 | tr -d .) \
-t local/candle-vllm .
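The build command above derives CUDA_COMPUTE_CAP by taking the last line of the nvidia-smi CSV output and removing the dot, so a reported capability of 8.6 becomes 86. A minimal Rust sketch of the same transformation follows; `parse_compute_cap` is a hypothetical helper for illustration, not something candle-vllm provides, and it assumes the output is a header line followed by one value per GPU.

```rust
/// Hypothetical helper mirroring `tail -n1 | tr -d .`: take the last
/// non-empty line of the CSV output and drop the dot, e.g. "8.6" -> 86.
/// Returns None if that line is not a number once the dot is removed.
fn parse_compute_cap(csv_output: &str) -> Option<u32> {
    let last = csv_output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .last()?;
    last.replace('.', "").parse().ok()
}
```

Like the shell pipeline, this picks the last GPU listed when several are present, which may not be the intended one on a mixed-GPU machine.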
So the current failure is this:
error[E0412]: cannot find type `Tensor` in this scope
--> src/backend/mod.rs:61:38
|
61 | fn dispatch_get_cuda_pointer(tensor: Tensor) -> u64 {
| ^^^^^^ not found in this scope
|
help: consider importing this struct
|
80 + use candle_core::Tensor;
|
error[E0412]: cannot find type `bf16` in this scope
--> src/backend/mod.rs:63:43
|
63 | DType::BF16 => get_cuda_pointer::<bf16>(tensor),
| ^^^^ not found in this scope
|
help: consider importing this struct
|
80 + use half::bf16;
|
error[E0412]: cannot find type `f16` in this scope
--> src/backend/mod.rs:64:42
|
64 | DType::F16 => get_cuda_pointer::<f16>(tensor),
| ^^^
|
help: a builtin type with a similar name exists
|
64 | DType::F16 => get_cuda_pointer::<i16>(tensor),
| ~~~
help: consider importing this struct
|
80 + use half::f16;
|
error[E0405]: cannot find trait `CudaDType` in this scope
--> src/backend/mod.rs:73:24
|
73 | fn get_cuda_pointer<T: CudaDType>(tensor: Tensor) -> u64 {
| ^^^^^^^^^ not found in this scope
|
help: consider importing this trait
|
80 + use candle_core::cuda_backend::CudaDType;
|
error[E0412]: cannot find type `Tensor` in this scope
--> src/backend/mod.rs:73:43
|
73 | fn get_cuda_pointer<T: CudaDType>(tensor: Tensor) -> u64 {
| ^^^^^^ not found in this scope
|
help: consider importing this struct
|
80 + use candle_core::Tensor;
|
error[E0433]: failed to resolve: use of undeclared type `Storage`
--> src/backend/mod.rs:75:9
|
75 | Storage::Cuda(cuda_storage) => *cuda_storage.as_cuda_slice::<T>().unwrap().device_ptr(),
| ^^^^^^^ use of undeclared type `Storage`
|
help: consider importing this enum
|
80 + use candle_core::Storage;
|
warning: unused imports: `bf16`, `f16`
--> src/backend/cache.rs:10:12
|
10 | use half::{bf16, f16};
| ^^^^ ^^^
|
= note: `#[warn(unused_imports)]` on by default
Some errors have detailed explanations: E0405, E0412, E0433.
For more information about an error, try `rustc --explain E0405`.
warning: `candle-vllm` (lib) generated 1 warning
error: could not compile `candle-vllm` (lib) due to 6 previous errors; 1 warning emitted
warning: build failed, waiting for other jobs to finish...
error: failed to compile `candle-vllm v0.1.0 (/candle-vllm)`, intermediate artifacts can be found at `/candle-vllm/target`
Instead of using rustc 1.71.1 from Rocky 9, I will try installing the latest version with rustup:
curl https://sh.rustup.rs -sSf | bash -s -- -y && \
echo 'source $HOME/.cargo/env' >> $HOME/.bashrc && \
source $HOME/.bashrc
Have a good weekend!
I just pushed a commit which should fix this, could you try to build again?
Hello Eric. Thanks for the updates and sorry for the delay. Unfortunately it still doesn't compile because of a different error this time:
error: failed to run custom build command for `candle-vllm v0.1.0 (/candle-vllm)`
Caused by:
process didn't exit successfully: `/candle-vllm/target/release/build/candle-vllm-2ea58adaa28146b1/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=86
--- stderr
kernels/rotary_embedding_kernel.cu(81): error: identifier "scalar_t" is undefined
scalar_t* __restrict__ query,
^
kernels/rotary_embedding_kernel.cu(82): error: identifier "scalar_t" is undefined
scalar_t* __restrict__ key,
^
kernels/rotary_embedding_kernel.cu(83): error: identifier "scalar_t" is undefined
const scalar_t* __restrict__ cos_sin_cache,
^
kernels/rotary_embedding_kernel.cu(90): error: identifier "scalar_t" is undefined
rotary_embedding_kernel<scalar_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(95): error: identifier "scalar_t" is undefined
scalar_t* __restrict__ query,
^
kernels/rotary_embedding_kernel.cu(96): error: identifier "scalar_t" is undefined
scalar_t* __restrict__ key,
^
kernels/rotary_embedding_kernel.cu(97): error: identifier "scalar_t" is undefined
const scalar_t* __restrict__ cos_sin_cache,
^
kernels/rotary_embedding_kernel.cu(104): error: identifier "scalar_t" is undefined
rotary_embedding_kernel<scalar_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
8 errors detected in the compilation of "kernels/rotary_embedding_kernel.cu".
thread 'main' panicked at '"nvcc" "--gpu-architecture=sm_86" "--ptx" "--use_fast_math" "-std=c++17" "-O" "2" "--default-stream" "per-thread" "--output-directory" "kernels/" "kernels/rotary_embedding_kernel.cu" failed with exit code exit status: 1', build.rs:65:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Let me know if further tests or variations could be useful, thanks!
This error occurs because we need to manually monomorphize the kernel function. I have since pushed the changes; could you try it again?
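As a rough illustration of what manual monomorphization means: a generic routine has no single symbol that can be looked up by name, so one concrete, named entry point is emitted per element type. The sketch below is a Rust analogy, not the actual CUDA change; `rotate_pairs`, the `instantiate!` macro, and the wrapper names are all hypothetical.

```rust
use std::ops::{Add, Mul, Sub};

// Generic routine (hypothetical stand-in for a templated kernel body):
// rotates each consecutive (x, y) pair by a fixed (cos, sin).
fn rotate_pairs<T>(data: &mut [T], cos: T, sin: T)
where
    T: Copy + Mul<Output = T> + Add<Output = T> + Sub<Output = T>,
{
    for pair in data.chunks_exact_mut(2) {
        let (x, y) = (pair[0], pair[1]);
        pair[0] = x * cos - y * sin;
        pair[1] = x * sin + y * cos;
    }
}

// Emit one concrete, non-generic wrapper per element type, each with
// its own name, so a caller can select it by dtype.
macro_rules! instantiate {
    ($name:ident, $ty:ty) => {
        pub fn $name(data: &mut [$ty], cos: $ty, sin: $ty) {
            rotate_pairs::<$ty>(data, cos, sin)
        }
    };
}

instantiate!(rotate_pairs_f32, f32);
instantiate!(rotate_pairs_f64, f64);
```

Calling `rotate_pairs_f32(&mut v, 0.0, 1.0)` on `v = [1.0, 0.0]` rotates the pair by 90 degrees. The list of instantiated types is maintained by hand, which is the cost of this approach.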
Thanks!
I just did git pull and ran cargo install ... again, and it failed with these errors now:
error: failed to run custom build command for `candle-vllm v0.1.0 (/candle-vllm)`
Caused by:
process didn't exit successfully: `/candle-vllm/target/release/build/candle-vllm-2ea58adaa28146b1/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=86
--- stderr
kernels/rotary_embedding_kernel.cu(91): error: a __global__ function call must be configured
rotary_embedding_kernel<uint8_t, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(105): error: a __global__ function call must be configured
rotary_embedding_kernel<uint32_t, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(119): error: a __global__ function call must be configured
rotary_embedding_kernel<int64_t, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(133): error: a __global__ function call must be configured
rotary_embedding_kernel<float, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(147): error: a __global__ function call must be configured
rotary_embedding_kernel<double, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(161): error: a __global__ function call must be configured
rotary_embedding_kernel<int16_t, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(175): error: a __global__ function call must be configured
rotary_embedding_kernel<int16_t, false>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(190): error: a __global__ function call must be configured
rotary_embedding_kernel<uint8_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(204): error: a __global__ function call must be configured
rotary_embedding_kernel<uint32_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(218): error: a __global__ function call must be configured
rotary_embedding_kernel<int64_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(232): error: a __global__ function call must be configured
rotary_embedding_kernel<float, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(246): error: a __global__ function call must be configured
rotary_embedding_kernel<double, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(260): error: a __global__ function call must be configured
rotary_embedding_kernel<int16_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
kernels/rotary_embedding_kernel.cu(274): error: a __global__ function call must be configured
rotary_embedding_kernel<int16_t, true>(positions, query, key, cos_sin_cache, rot_dim, query_stride, key_stride, num_heads, num_kv_heads, head_size);
^
14 errors detected in the compilation of "kernels/rotary_embedding_kernel.cu".
thread 'main' panicked at '"nvcc" "--gpu-architecture=sm_86" "--ptx" "--use_fast_math" "-std=c++17" "-O" "2" "--default-stream" "per-thread" "--output-directory" "kernels/" "kernels/rotary_embedding_kernel.cu" failed with exit code exit status: 1', build.rs:65:13
Ok, I just pushed a change to hopefully fix that. Could you try it again?
Thanks, we made progress! But later it failed with this:
Compiling candle-vllm v0.1.0 (/candle-vllm)
error[E0433]: failed to resolve: use of undeclared type `Device`
--> src/backend/layers.rs:20:9
|
20 | let Device::Cuda(dev) = positions_dev else {
| ^^^^^^ use of undeclared type `Device`
|
help: consider importing this enum
|
1 + use candle_core::Device;
|
error[E0433]: failed to resolve: use of undeclared type `DType`
--> src/backend/layers.rs:24:29
|
24 | if positions.dtype() != DType::I64 {
| ^^^^^ use of undeclared type `DType`
|
help: consider importing one of these items
|
1 + use candle_core::DType;
|
1 + use crate::backend::DType;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/backend/layers.rs:25:20
|
25 | return Err(APIError::new(format!(
| ^^^^^^^^ use of undeclared type `APIError`
|
help: consider importing this struct
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/backend/layers.rs:32:20
|
32 | return Err(APIError::new(format!(
| ^^^^^^^^ use of undeclared type `APIError`
|
help: consider importing this struct
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/backend/layers.rs:40:20
|
40 | return Err(APIError::new(format!(
| ^^^^^^^^ use of undeclared type `APIError`
|
help: consider importing this struct
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/backend/layers.rs:48:20
|
48 | return Err(APIError::new(format!(
| ^^^^^^^^ use of undeclared type `APIError`
|
help: consider importing this struct
|
1 + use crate::openai::responses::APIError;
|
error[E0422]: cannot find struct, variant or union type `LaunchConfig` in this scope
--> src/backend/layers.rs:62:23
|
62 | let launch_conf = LaunchConfig {
| ^^^^^^^^^^^^ not found in this scope
|
help: consider importing this struct
|
1 + use cudarc::driver::LaunchConfig;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/openai/responses.rs:38:28
|
38 | return Err(APIError::from(e));
| ^^^^^^^^ use of undeclared type `APIError`
|
::: src/backend/layers.rs:77:18
|
77 | let stream = try_api!(dev.fork_default_stream());
| ----------------------------------- in this macro invocation
|
= note: this error originates in the macro `try_api` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
--> src/backend/layers.rs:1:1
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/openai/responses.rs:38:28
|
38 | return Err(APIError::from(e));
| ^^^^^^^^ use of undeclared type `APIError`
|
::: src/backend/layers.rs:80:9
|
80 | / try_api!(get_or_load_func(
81 | | ROTARY_EMBDEDDING_PTX,
82 | | ROTARY_EMBDEDDING_KERNEL,
83 | | query.dtype(),
84 | | Some("_neox"),
85 | | dev
86 | | ))
| |__________- in this macro invocation
|
= note: this error originates in the macro `try_api` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
--> src/backend/layers.rs:1:1
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/openai/responses.rs:38:28
|
38 | return Err(APIError::from(e));
| ^^^^^^^^ use of undeclared type `APIError`
|
::: src/backend/layers.rs:88:9
|
88 | / try_api!(get_or_load_func(
89 | | ROTARY_EMBDEDDING_PTX,
90 | | ROTARY_EMBDEDDING_KERNEL,
91 | | query.dtype(),
92 | | None,
93 | | dev
94 | | ))
| |__________- in this macro invocation
|
= note: this error originates in the macro `try_api` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
--> src/backend/layers.rs:1:1
|
1 + use crate::openai::responses::APIError;
|
error[E0433]: failed to resolve: use of undeclared type `APIError`
--> src/openai/responses.rs:38:28
|
38 | return Err(APIError::from(e));
| ^^^^^^^^ use of undeclared type `APIError`
|
::: src/backend/layers.rs:97:5
|
97 | / try_api!(unsafe {
98 | | kernel.launch_on_stream(
99 | | &stream,
100 | | launch_conf,
... |
113 | | )
114 | | });
| |______- in this macro invocation
|
= note: this error originates in the macro `try_api` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing this struct
--> src/backend/layers.rs:1:1
|
1 + use crate::openai::responses::APIError;
|
warning: unused import: `either::Either`
--> src/backend/cache.rs:10:5
|
10 | use either::Either;
| ^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused imports: `bf16`, `f16`
--> src/backend/cache.rs:11:12
|
11 | use half::{bf16, f16};
| ^^^^ ^^^
warning: unused import: `either::Either`
--> src/backend/layers.rs:2:5
|
2 | use either::Either;
| ^^^^^^^^^^^^^^
error[E0308]: arguments to this function are incorrect
--> src/backend/cache.rs:118:27
|
118 | let kernel = try_api!(get_or_load_func(
| ^^^^^^^^^^^^^^^^
...
121 | None,
| ---- expected `DType`, found `std::option::Option<_>`
122 | key.dtype(),
| ----------- expected `std::option::Option<&str>`, found `DType`
|
note: function defined here
--> src/backend/mod.rs:17:8
|
17 | pub fn get_or_load_func(
| ^^^^^^^^^^^^^^^^
18 | ptx_file: &'static str,
| ----------------------
19 | kernel_base: &str,
| -----------------
20 | dtype: DType,
| ------------
21 | suffix: Option<&str>,
| --------------------
22 | device: &CudaDevice,
| -------------------
help: swap these arguments
|
118 | let kernel = try_api!(get_or_load_func(RESHAPE_AND_CACHE_PTX, RESHAPE_AND_CACHE_KERNEL, key.dtype(), None, dev));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error[E0308]: arguments to this function are incorrect
--> src/backend/cache.rs:250:27
|
250 | let kernel = try_api!(get_or_load_func(
| ^^^^^^^^^^^^^^^^
...
253 | None,
| ---- expected `DType`, found `std::option::Option<_>`
254 | key_caches.first().unwrap().dtype(),
| ----------------------------------- expected `std::option::Option<&str>`, found `DType`
|
note: function defined here
--> src/backend/mod.rs:17:8
|
17 | pub fn get_or_load_func(
| ^^^^^^^^^^^^^^^^
18 | ptx_file: &'static str,
| ----------------------
19 | kernel_base: &str,
| -----------------
20 | dtype: DType,
| ------------
21 | suffix: Option<&str>,
| --------------------
22 | device: &CudaDevice,
| -------------------
help: swap these arguments
|
250 | let kernel = try_api!(get_or_load_func(COPY_BLOCKS_PTX, COPY_BLOCKS_KERNEL, key_caches.first().unwrap().dtype(), None, dev));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error[E0308]: mismatched types
--> src/backend/layers.rs:73:45
|
73 | let key_ptr = dispatch_get_cuda_pointer(key);
| ------------------------- ^^^ expected `Tensor`, found `&mut Tensor`
| |
| arguments to this function are incorrect
|
note: function defined here
--> src/backend/mod.rs:70:4
|
70 | fn dispatch_get_cuda_pointer(tensor: Tensor) -> u64 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^ --------------
error[E0308]: mismatched types
--> src/backend/layers.rs:74:47
|
74 | let query_ptr = dispatch_get_cuda_pointer(query);
| ------------------------- ^^^^^ expected `Tensor`, found `&mut Tensor`
| |
| arguments to this function are incorrect
|
note: function defined here
--> src/backend/mod.rs:70:4
|
70 | fn dispatch_get_cuda_pointer(tensor: Tensor) -> u64 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^ --------------
error[E0599]: no method named `launch_on_stream` found for struct `CudaFunction` in the current scope
--> src/backend/layers.rs:98:16
|
98 | kernel.launch_on_stream(
| -------^^^^^^^^^^^^^^^^ method not found in `CudaFunction`
|
::: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.10.0/src/driver/safe/launch.rs:202:15
|
202 | unsafe fn launch_on_stream(
| ---------------- the method is available for `CudaFunction` here
|
= help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 + use candle_core::cuda_backend::cudarc::driver::LaunchAsync;
|
error[E0308]: mismatched types
--> src/backend/paged_attention.rs:15:6
|
4 | pub fn paged_attention_v1(
| ------------------ implicitly returns `()` as its body has no tail or `return` expression
...
15 | ) -> Tensor {
| ^^^^^^ expected `Tensor`, found `()`
error[E0308]: mismatched types
--> src/backend/mod.rs:25:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
25 | Either::Left(DType::U8) => "_u8",
| ^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:26:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
25 | Either::Left(DType::U8) => "_u8",
26 | Either::Left(DType::U32) => "_u32",
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:27:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
27 | Either::Left(DType::I64) => "_i64",
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:28:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
28 | Either::Left(DType::BF16) => "_bf16",
| ^^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:29:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
29 | Either::Left(DType::F16) => "_f16",
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:30:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
30 | Either::Left(DType::F32) => "_f32",
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:31:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
31 | Either::Left(DType::F64) => "_f64",
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0308]: mismatched types
--> src/backend/mod.rs:32:9
|
24 | let mut spec = match dtype {
| ----- this expression has type `DType`
...
32 | Either::Right(data) => data,
| ^^^^^^^^^^^^^^^^^^^ expected `DType`, found `Either<_, _>`
|
= note: expected enum `DType`
found enum `either::Either<_, _>`
error[E0369]: cannot add `&str` to `&str`
--> src/backend/mod.rs:35:21
|
35 | spec = spec + suffix;
| ---- ^ ------ &str
| | |
| | `+` cannot be used to concatenate two `&str` strings
| &str
|
= note: string concatenation requires an owned `String` on the left
help: create an owned `String` from a string reference
|
35 | spec = spec.to_owned() + suffix;
| +++++++++++
error[E0599]: no method named `device_ptr` found for reference `&CudaSlice<T>` in the current scope
--> src/backend/mod.rs:84:84
|
84 | Storage::Cuda(cuda_storage) => *cuda_storage.as_cuda_slice::<T>().unwrap().device_ptr(),
| ^^^^^^^^^^
|
= help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
1 + use candle_core::cuda_backend::cudarc::driver::DevicePtr;
|
help: there is a method with a similar name
|
84 | Storage::Cuda(cuda_storage) => *cuda_storage.as_cuda_slice::<T>().unwrap().device(),
| ~~~~~~
error[E0308]: mismatched types
--> src/paged_attention/mod.rs:92:17
|
88 | paged_attention_v1(
| ------------------ arguments to this function are incorrect
...
92 | self.num_key_value_heads,
| ^^^^^^^^^^^^^^^^^^^^^^^^ expected `i32`, found `usize`
|
note: function defined here
--> src/backend/paged_attention.rs:4:8
|
4 | pub fn paged_attention_v1(
| ^^^^^^^^^^^^^^^^^^
...
8 | num_key_value_heads: i32, // [num_heads]
| ------------------------
help: you can convert a `usize` to an `i32` and panic if the converted value doesn't fit
|
92 | self.num_key_value_heads.try_into().unwrap(),
| ++++++++++++++++++++
error[E0609]: no field `head_mapping` on type `&mut PagedAttention`
--> src/paged_attention/mod.rs:117:22
|
117 | self.head_mapping.clone(),
| ^^^^^^^^^^^^ unknown field
|
= note: available fields are: `num_attention_heads`, `head_dim`, `num_key_value_heads`, `scale`, `sliding_window` ... and 2 others
warning: unused import: `CudaDType`
--> src/backend/cache.rs:6:9
|
6 | CudaDType,
| ^^^^^^^^^
Some errors have detailed explanations: E0308, E0369, E0422, E0433, E0599, E0609.
For more information about an error, try `rustc --explain E0308`.
warning: `candle-vllm` (lib) generated 4 warnings
error: could not compile `candle-vllm` (lib) due to 29 previous errors; 4 warnings emitted
error: failed to compile `candle-vllm v0.1.0 (/candle-vllm)`, intermediate artifacts can be found at `/candle-vllm/target`
Ok, (again) I just pushed what are hopefully some fixes. Could you try it again?
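Two groups of errors in the log above (E0308 on the `Either::Left(...)` patterns and E0369 on `spec + suffix`) concern how the kernel name is assembled. Here is a minimal sketch of one way to write it, assuming a local stand-in `DType` enum rather than `candle_core::DType`. The helper name `kernel_name` and the naming scheme are hypothetical, and this is not necessarily what the pushed fix does.

```rust
// Local stand-in for the dtype enum; an assumption for illustration only.
#[derive(Clone, Copy, Debug)]
pub enum DType {
    U8,
    U32,
    I64,
    BF16,
    F16,
    F32,
    F64,
}

// Build "<base><dtype suffix><optional suffix>" as an owned String.
pub fn kernel_name(base: &str, dtype: DType, suffix: Option<&str>) -> String {
    // Match on the enum directly; no `Either` wrapper is involved.
    let spec = match dtype {
        DType::U8 => "_u8",
        DType::U32 => "_u32",
        DType::I64 => "_i64",
        DType::BF16 => "_bf16",
        DType::F16 => "_f16",
        DType::F32 => "_f32",
        DType::F64 => "_f64",
    };
    // `&str + &str` does not compile (E0369), so append to an owned String.
    let mut name = String::from(base);
    name.push_str(spec);
    if let Some(extra) = suffix {
        name.push_str(extra);
    }
    name
}
```

Returning a `String` avoids rebinding a `&str` variable to the result of a concatenation, which is where the original `spec = spec + suffix` ran into trouble.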
Progress again! These are the errors now:
warning: unused import: `either::Either`
--> src/backend/cache.rs:10:5
|
10 | use either::Either;
| ^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `either::Either`
--> src/backend/mod.rs:98:5
|
98 | use either::Either;
| ^^^^^^^^^^^^^^
error[E0308]: mismatched types
--> src/backend/layers.rs:28:16
|
21 | ) {
| - help: a return type might be missing here: `-> _`
...
28 | return Err(APIError::new(format!(
| ________________^
29 | | "`positions` has {:?} type, expected I64 type.",
30 | | positions.dtype()
31 | | )));
| |___________^ expected `()`, found `Result<_, APIError>`
|
= note: expected unit type `()`
found enum `Result<_, APIError>`
error[E0277]: the trait bound `&usize: DeviceRepr` is not satisfied
--> src/backend/layers.rs:104:13
|
101 | kernel.launch_on_stream(
| ---------------- required by a bound introduced by this call
...
104 | / (
105 | | positions_ptr,
106 | | query_ptr,
107 | | key_ptr,
... |
114 | | head_size,
115 | | ),
| |_____________^ the trait `DeviceRepr` is not implemented for `&usize`
|
= help: the trait `DeviceRepr` is implemented for `usize`
= note: required for `CudaFunction` to implement `LaunchAsync<(u64, u64, u64, u64, &usize, &usize, &usize, usize, usize, usize)>`
warning: unused import: `CudaDType`
--> src/backend/cache.rs:6:9
|
6 | CudaDType,
| ^^^^^^^^^
Some errors have detailed explanations: E0277, E0308.
For more information about an error, try `rustc --explain E0277`.
warning: `candle-vllm` (lib) generated 3 warnings
error: could not compile `candle-vllm` (lib) due to 2 previous errors; 3 warnings emitted
error: failed to compile `candle-vllm v0.1.0 (/candle-vllm)`, intermediate artifacts can be found at `/candle-vllm/target`
Ok, the commit I just pushed should fix that. Could you try it again?
Down to one error now!!! Good job!!!
error[E0614]: type `usize` cannot be dereferenced
--> src/backend/layers.rs:114:17
|
114 | *head_size,
| ^^^^^^^^^^
For more information about this error, try `rustc --explain E0614`.
Thanks, just pushed one more that should iron that out.
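The last two rounds (E0277 on `&usize`, then E0614 on `*head_size`) come down to which values are references: `slice.get(i)` yields an `Option<&usize>`, whereas a plain `usize` variable cannot be dereferenced. A small sketch with hypothetical names (`launch_dims`, `cache_dims`):

```rust
// Returns (rot_dim, head_size) as plain `usize` values, suitable for
// passing by value to something that does not accept `&usize`.
pub fn launch_dims(cache_dims: &[usize], head_size: usize) -> Option<(usize, usize)> {
    // `get(1)` yields Option<&usize>; `copied()` turns it into Option<usize>.
    let rot_dim: usize = cache_dims.get(1).copied()?;
    // `head_size` is already a value, so it is used as-is (no `*`).
    Some((rot_dim, head_size))
}
```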
Thanks, new set of errors:
Compiling candle-vllm v0.1.0 (/candle-vllm)
warning: unused variable: `src_dev`
--> src/backend/cache.rs:313:23
|
313 | (Device::Cuda(src_dev), Device::Cpu) => {
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_src_dev`
|
= note: `#[warn(unused_variables)]` on by default
error[E0505]: cannot move out of `positions` because it is borrowed
--> src/backend/layers.rs:75:51
|
15 | positions: Tensor,
| --------- binding `positions` declared here
...
22 | let positions_dev = positions.device();
| ------------------ borrow of `positions` occurs here
...
75 | let positions_ptr = dispatch_get_cuda_pointer(positions);
| ^^^^^^^^^ move out of `positions` occurs here
...
80 | let stream = try_api!(dev.fork_default_stream());
| ------------------------- borrow later used here
error[E0505]: cannot move out of `cos_sin_cache` because it is borrowed
--> src/backend/layers.rs:78:55
|
19 | cos_sin_cache: Tensor,
| ------------- binding `cos_sin_cache` declared here
...
59 | let rot_dim = cos_sin_cache.shape().dims().get(1).unwrap();
| --------------------- borrow of `cos_sin_cache` occurs here
...
78 | let cos_sin_cache_ptr = dispatch_get_cuda_pointer(cos_sin_cache);
| ^^^^^^^^^^^^^ move out of `cos_sin_cache` occurs here
...
109 | *rot_dim,
| -------- borrow later used here
warning: variable does not need to be mutable
--> src/backend/mod.rs:24:9
|
24 | let mut spec = match dtype {
| ----^^^^
| |
| help: remove this `mut`
|
= note: `#[warn(unused_mut)]` on by default
For more information about this error, try `rustc --explain E0505`.
warning: `candle-vllm` (lib) generated 2 warnings
error: could not compile `candle-vllm` (lib) due to 2 previous errors; 2 warnings emitted
Ok, could you try it again?
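The E0505 errors above arise because `dispatch_get_cuda_pointer` takes its `Tensor` by value while an earlier borrow of the same tensor is still in use. One common way out is to have the helper borrow instead. A sketch, with a hypothetical `FakeTensor` standing in for `candle_core::Tensor` and `fake_pointer` for the real helper:

```rust
// Hypothetical stand-in for a tensor: a shape plus a raw device address.
pub struct FakeTensor {
    pub dims: Vec<usize>,
    pub addr: u64,
}

impl FakeTensor {
    pub fn dims(&self) -> &[usize] {
        &self.dims
    }
}

// Taking `&FakeTensor` instead of `FakeTensor` means the call does not
// move the tensor, so other borrows of it may stay alive.
pub fn fake_pointer(t: &FakeTensor) -> u64 {
    t.addr
}

pub fn prepare(cache: &FakeTensor) -> (u64, usize) {
    let dims = cache.dims(); // shared borrow starts here
    let ptr = fake_pointer(cache); // a second shared borrow is allowed
    (ptr, dims[1]) // the first borrow is still used here
}
```

Had `fake_pointer` taken `FakeTensor` by value, the call inside `prepare` would be the same kind of move-while-borrowed that the compiler rejected.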
Progress! Now these:
warning: unused variable: `src_dev`
--> src/backend/cache.rs:313:23
|
313 | (Device::Cuda(src_dev), Device::Cpu) => {
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_src_dev`
|
= note: `#[warn(unused_variables)]` on by default
error[E0716]: temporary value dropped while borrowed
--> src/backend/layers.rs:59:19
|
59 | let rot_dim = cos_sin_cache.shape().clone().dims().get(1).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary value which is freed while still in use
...
68 | 512.min((num_heads * rot_dim / 2).try_into().unwrap()),
| ------- borrow later used here
|
help: consider using a `let` binding to create a longer lived value
|
59 ~ let binding = cos_sin_cache.shape().clone();
60 ~ let rot_dim = binding.dims().get(1).unwrap();
|
For more information about this error, try `rustc --explain E0716`.
warning: `candle-vllm` (lib) generated 1 warning
error: could not compile `candle-vllm` (lib) due to previous error; 1 warning emitted
Ok, could you try it again?
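E0716 in the log above comes from keeping a reference into a temporary (`shape().clone()`) beyond the end of the statement. Copying the `usize` out ends the borrow before the temporary is dropped. A sketch with hypothetical stand-ins (`FakeShape`, `ShapedTensor`, `rot_dim_of`) rather than the real candle types:

```rust
#[derive(Clone)]
pub struct FakeShape(pub Vec<usize>);

impl FakeShape {
    pub fn dims(&self) -> &[usize] {
        &self.0
    }
}

pub struct ShapedTensor {
    pub shape: FakeShape,
}

impl ShapedTensor {
    pub fn shape(&self) -> &FakeShape {
        &self.shape
    }
}

// Returns the second dimension as an owned `usize`, or None if absent.
pub fn rot_dim_of(t: &ShapedTensor) -> Option<usize> {
    // `copied()` converts Option<&usize> into Option<usize>, so nothing
    // borrowed from the shape outlives this statement.
    let rot_dim: Option<usize> = t.shape().dims().get(1).copied();
    rot_dim
}
```

The compiler's suggested `let binding = ...` also works; copying the value out just avoids keeping the extra binding around.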
That part now compiles, with some warnings.
However, the linking issue with candle-flash-attn has reappeared.
To recap: this is based on nvidia/cuda:12.3.1-devel-rockylinux9 with dnf upgrade and cargo update.
These are the warnings:
Compiling candle-vllm v0.1.0 (/candle-vllm)
warning: unused variable: `src_dev`
--> src/backend/cache.rs:313:23
|
313 | (Device::Cuda(src_dev), Device::Cpu) => {
| ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_src_dev`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused `Result` that must be used
--> src/openai/models/llama.rs:196:9
|
196 | / rotary_embedding(
197 | | positions,
198 | | q,
199 | | k,
... |
202 | | false,
203 | | );
| |_________^
|
= note: this `Result` may be an `Err` variant, which should be handled
= note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
|
196 | let _ = rotary_embedding(
| +++++++
warning: `candle-vllm` (lib) generated 2 warnings (run `cargo fix --lib -p candle-vllm` to apply 1 suggestion)
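Regarding the `unused_must_use` warning: silencing it with `let _ =` would also hide a failed call, so propagating the `Result` with `?` is usually the safer option. A sketch with hypothetical stand-ins (`rotary_embedding_stub`, `forward_stub`, and a `String` error type instead of the project's own error type):

```rust
// Hypothetical stand-in for a fallible kernel launch.
pub fn rotary_embedding_stub(head_size: usize) -> Result<(), String> {
    if head_size % 2 != 0 {
        return Err(format!("head_size {} must be even", head_size));
    }
    Ok(())
}

// The caller returns a Result itself, so `?` can forward any error
// instead of silently discarding it.
pub fn forward_stub(head_size: usize) -> Result<usize, String> {
    rotary_embedding_stub(head_size)?;
    Ok(head_size / 2)
}
```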
And the linking errors:
error: linking with `cc` failed: exit status: 1
|
= note: LC_ALL="C" PATH="/usr/lib/rustlib/x86_64-unknown-linux-gnu/bin:/root/.local/bin:/root/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" VSLANG="1033" "cc" "-m64" "/tmp/rustc2owX73/symbols.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.0.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.1.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.10.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.11.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.12.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.13.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.14.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.15.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.2.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.3.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.4.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.5.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.6.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.7.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.8.rcgu.o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.candle_vllm.62ba636aee07a975-cgu.9.rcgu.o" 
"/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d.1kk2vorh9x18px06.rcgu.o" "-Wl,--as-needed" "-L" "/candle-vllm/target/release/deps" "-L" "/candle-vllm/target/release/build/zstd-sys-dbf9b1574083ed21/out" "-L" "/usr/local/cuda/lib64" "-L" "/usr/local/cuda/lib64/stubs" "-L" "/usr/local/cuda/targets/x86_64-linux" "-L" "/usr/local/cuda/targets/x86_64-linux/lib" "-L" "/usr/local/cuda/targets/x86_64-linux/lib/stubs" "-L" "/usr/lib" "-L" "/usr/lib64" "-L" "/candle-vllm/target/release/build/ring-0546e065e7062d53/out" "-L" "/candle-vllm/target/release/build/esaxx-rs-be1982d2e341d29d/out" "-L" "/candle-vllm/target/release/build/onig_sys-4ce51a84b783f95f/out" "-L" "/candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out" "-L" "/usr/local/cuda/lib64" "-L" "/usr/local/cuda/lib64/stubs" "-L" "/usr/local/cuda/targets/x86_64-linux" "-L" "/usr/local/cuda/targets/x86_64-linux/lib" "-L" "/usr/local/cuda/targets/x86_64-linux/lib/stubs" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/candle-vllm/target/release/deps/libenv_logger-2d0b3827bd002e31.rlib" "/candle-vllm/target/release/deps/libtermcolor-7fd0e5721205d087.rlib" "/candle-vllm/target/release/deps/libis_terminal-89ef26f9c10f9212.rlib" "/candle-vllm/target/release/deps/librustix-d43875b6cd8dec3a.rlib" "/candle-vllm/target/release/deps/liblinux_raw_sys-872daf025077fc20.rlib" "/candle-vllm/target/release/deps/libhumantime-1dc284c82c7f0559.rlib" "/candle-vllm/target/release/deps/libcandle_vllm-96f5dc2ed1fb7068.rlib" "/candle-vllm/target/release/deps/libchrono-3cac20379a81063e.rlib" "/candle-vllm/target/release/deps/libiana_time_zone-41f802048b72bf7b.rlib" "/candle-vllm/target/release/deps/libhf_hub-67a984a3bd095729.rlib" "/candle-vllm/target/release/deps/libdirs-9b2222ab35d3810d.rlib" "/candle-vllm/target/release/deps/libdirs_sys-093500c3463c7fa8.rlib" "/candle-vllm/target/release/deps/liboption_ext-3db96de540040126.rlib" 
"/candle-vllm/target/release/deps/libureq-a4b308eb4114e2ad.rlib" "/candle-vllm/target/release/deps/libwebpki_roots-ea5e9bb57328911f.rlib" "/candle-vllm/target/release/deps/librustls-bc255bbda3c5502d.rlib" "/candle-vllm/target/release/deps/libsubtle-910e19b9d08b2799.rlib" "/candle-vllm/target/release/deps/libwebpki-130c21a29ac4d059.rlib" "/candle-vllm/target/release/deps/libring-810800fbfa7b1a3d.rlib" "/candle-vllm/target/release/deps/libspin-a5bca8ced7fc453c.rlib" "/candle-vllm/target/release/deps/libuntrusted-766afbb3ef44c1d1.rlib" "/candle-vllm/target/release/deps/libzeroize-77a7da17bcae1046.rlib" "/candle-vllm/target/release/deps/librustls_pki_types-c6df937ba04d34cb.rlib" "/candle-vllm/target/release/deps/libhootbin-5825222cc042ea1c.rlib" "/candle-vllm/target/release/deps/libfastrand-fd92473d790916bb.rlib" "/candle-vllm/target/release/deps/libhoot-7a0d75c1644cd85e.rlib" "/candle-vllm/target/release/deps/libreqwest-418640b55678fdff.rlib" "/candle-vllm/target/release/deps/librustls_pemfile-eb847ff6658d990b.rlib" "/candle-vllm/target/release/deps/libhyper_tls-761834c86e0dfb0e.rlib" "/candle-vllm/target/release/deps/libipnet-ff268618cdf09b54.rlib" "/candle-vllm/target/release/deps/libtokio_native_tls-6a98809186b7d24f.rlib" "/candle-vllm/target/release/deps/libnative_tls-29f3830c65a1dd32.rlib" "/candle-vllm/target/release/deps/libopenssl_probe-e135bf478bd9e62b.rlib" "/candle-vllm/target/release/deps/libopenssl-b99171d686a87d4b.rlib" "/candle-vllm/target/release/deps/libforeign_types-434e4620cdd2963d.rlib" "/candle-vllm/target/release/deps/libforeign_types_shared-3cd91dddd8b3059a.rlib" "/candle-vllm/target/release/deps/libopenssl_sys-6b731b85090325a6.rlib" "/candle-vllm/target/release/deps/libhyper-47e527bc8bf78df1.rlib" "/candle-vllm/target/release/deps/libwant-199118bdcd481b61.rlib" "/candle-vllm/target/release/deps/libtry_lock-4868dbe5a9f104c8.rlib" "/candle-vllm/target/release/deps/libtower_service-7ef5e09e7ca0f321.rlib" 
"/candle-vllm/target/release/deps/libsync_wrapper-60d7b39caee1bf33.rlib" "/candle-vllm/target/release/deps/libhttp_body-56cdab78f4db29df.rlib" "/candle-vllm/target/release/deps/libcandle_lora_transformers-66ecfa37e9a12967.rlib" "/candle-vllm/target/release/deps/libtqdm-2d95da43c8637887.rlib" "/candle-vllm/target/release/deps/libcrossterm-2fca374d20f41b01.rlib" "/candle-vllm/target/release/deps/libsignal_hook_mio-e98d1a7ffed565b4.rlib" "/candle-vllm/target/release/deps/libsignal_hook-2e1b132684e7f9b1.rlib" "/candle-vllm/target/release/deps/libanyhow-82509780fe55d5b3.rlib" "/candle-vllm/target/release/deps/libcandle_lora-989120b3272dc83a.rlib" "/candle-vllm/target/release/deps/libtrc-af4d2dc9e955d45c.rlib" "/candle-vllm/target/release/deps/libuuid-4eaa2543f30057bf.rlib" "/candle-vllm/target/release/deps/libcandle_transformers-b19346912b1b1838.rlib" "/candle-vllm/target/release/deps/libserde_plain-1da78c8ca7ee3ec7.rlib" "/candle-vllm/target/release/deps/libcandle_flash_attn-2cb3752fec37905c.rlib" "/candle-vllm/target/release/deps/libdyn_fmt-ca01837b2f65b0b1.rlib" "/candle-vllm/target/release/deps/libfutures-1f136c83b52e06cd.rlib" "/candle-vllm/target/release/deps/libfutures_executor-a3d9383a156f9f10.rlib" "/candle-vllm/target/release/deps/libcandle_sampling-15eeca2267c8d9d8.rlib" "/candle-vllm/target/release/deps/libcandle_nn-31afb7102c013c81.rlib" "/candle-vllm/target/release/deps/libtokenizers-e4ef767b52132dd9.rlib" "/candle-vllm/target/release/deps/libesaxx_rs-734a5148ef5579eb.rlib" "/candle-vllm/target/release/deps/libunicode_normalization_alignments-db25bac7de079d3d.rlib" "/candle-vllm/target/release/deps/libspm_precompiled-987f4e1f2e6e5cbb.rlib" "/candle-vllm/target/release/deps/libbase64-a00060132962802d.rlib" "/candle-vllm/target/release/deps/libunicode_segmentation-0609f6ce0b27032d.rlib" "/candle-vllm/target/release/deps/libnom-3d1816c8c91e268a.rlib" "/candle-vllm/target/release/deps/libunicode_categories-4b2d8309eb580595.rlib" 
"/candle-vllm/target/release/deps/libmonostate-15263a542f4a6e1c.rlib" "/candle-vllm/target/release/deps/libmacro_rules_attribute-fbe2172e90fd6d9d.rlib" "/candle-vllm/target/release/deps/libindicatif-e51bcf09d533796f.rlib" "/candle-vllm/target/release/deps/libportable_atomic-37fa7d733d3c2283.rlib" "/candle-vllm/target/release/deps/libnumber_prefix-fcbd61cd7f0fb674.rlib" "/candle-vllm/target/release/deps/libconsole-d644b00c632b508a.rlib" "/candle-vllm/target/release/deps/libunicode_width-4a01194dbfae8c91.rlib" "/candle-vllm/target/release/deps/librayon_cond-84d2d6a9f989ac86.rlib" "/candle-vllm/target/release/deps/libitertools-87b264833edf6f52.rlib" "/candle-vllm/target/release/deps/libonig-9d7270d7c8fd85bc.rlib" "/candle-vllm/target/release/deps/libonig_sys-b2ea9dea10b0294e.rlib" "/candle-vllm/target/release/deps/libderive_builder-7eba651d6dc3f2d2.rlib" "/candle-vllm/target/release/deps/liblazy_static-852800890c81fb22.rlib" "/candle-vllm/target/release/deps/libclap-6cce81f5b0c21c4d.rlib" "/candle-vllm/target/release/deps/libclap_builder-c0b7d4c619b20786.rlib" "/candle-vllm/target/release/deps/libstrsim-bfb3799e9677cd4d.rlib" "/candle-vllm/target/release/deps/libanstream-2345d25369a0c766.rlib" "/candle-vllm/target/release/deps/libanstyle_query-d08e7c102e46eb49.rlib" "/candle-vllm/target/release/deps/libcolorchoice-d9fe16d50a3dd803.rlib" "/candle-vllm/target/release/deps/libanstyle_parse-6ac7d6e179081361.rlib" "/candle-vllm/target/release/deps/libutf8parse-86e737e0d4678582.rlib" "/candle-vllm/target/release/deps/libclap_lex-3a6b7689365ae37a.rlib" "/candle-vllm/target/release/deps/libanstyle-eb2ffb42ebf589fd.rlib" "/candle-vllm/target/release/deps/libcandle_core-bd61d77eb1719017.rlib" "/candle-vllm/target/release/deps/libmemmap2-212b860c29c3b1bd.rlib" "/candle-vllm/target/release/deps/libzip-9f9cf8564fd57087.rlib" "/candle-vllm/target/release/deps/libyoke-0edeeb516196a696.rlib" "/candle-vllm/target/release/deps/libzerofrom-635514bc19b31e05.rlib" 
"/candle-vllm/target/release/deps/libstable_deref_trait-76725faa25d9c59b.rlib" "/candle-vllm/target/release/deps/libthiserror-a7014e8beba5c405.rlib" "/candle-vllm/target/release/deps/libsafetensors-1dc2485f251fe6a8.rlib" "/candle-vllm/target/release/deps/libcudarc-15ad263f438ac593.rlib" "/candle-vllm/target/release/deps/libcandle_kernels-858c2d0e13ad4d14.rlib" "/candle-vllm/target/release/deps/libgemm-0df17278b2df1e9d.rlib" "/candle-vllm/target/release/deps/libgemm_c32-1a5f8b19e06a8c97.rlib" "/candle-vllm/target/release/deps/libgemm_c64-76bce200eaecdee4.rlib" "/candle-vllm/target/release/deps/libgemm_f64-3224a33d6d916809.rlib" "/candle-vllm/target/release/deps/libgemm_f16-d938d2b4b3fed4e5.rlib" "/candle-vllm/target/release/deps/libgemm_f32-5c254676f792b840.rlib" "/candle-vllm/target/release/deps/libgemm_common-d38692569c7f4e1a.rlib" "/candle-vllm/target/release/deps/libpulp-1ce4cd9fe89db9ff.rlib" "/candle-vllm/target/release/deps/libnum_complex-3fed0e2e4f9e5202.rlib" "/candle-vllm/target/release/deps/libdyn_stack-9b24260c69b5272e.rlib" "/candle-vllm/target/release/deps/libreborrow-77659d577c4b718c.rlib" "/candle-vllm/target/release/deps/libraw_cpuid-b9cfe85e371d3083.rlib" "/candle-vllm/target/release/deps/libbitflags-b9815c55ec510696.rlib" "/candle-vllm/target/release/deps/librayon-3c309fded7dea17d.rlib" "/candle-vllm/target/release/deps/librayon_core-1c9c2cb057344777.rlib" "/candle-vllm/target/release/deps/libcrossbeam_deque-61f81dc6e7e011b4.rlib" "/candle-vllm/target/release/deps/libcrossbeam_epoch-5d4631034dbce19f.rlib" "/candle-vllm/target/release/deps/libcrossbeam_utils-3c947cc337c38520.rlib" "/candle-vllm/target/release/deps/libeither-c016b57e73ba30c1.rlib" "/candle-vllm/target/release/deps/libbyteorder-8bf78fc69cf5b0a1.rlib" "/candle-vllm/target/release/deps/libhalf-b518e6f9c7338ab7.rlib" "/candle-vllm/target/release/deps/librand_distr-cccb1699d7cd40ff.rlib" "/candle-vllm/target/release/deps/libnum_traits-28ee9b33f1e53f29.rlib" 
"/candle-vllm/target/release/deps/libbytemuck-61cb00a9722bf6f9.rlib" "/candle-vllm/target/release/deps/libactix_web-d5c2e016c0a76be2.rlib" "/candle-vllm/target/release/deps/liburl-2815a4daf7f9adfd.rlib" "/candle-vllm/target/release/deps/libidna-c440fe7285f1b3e7.rlib" "/candle-vllm/target/release/deps/libunicode_normalization-8459a56260cd69f0.rlib" "/candle-vllm/target/release/deps/libtinyvec-61debd23e06e16bf.rlib" "/candle-vllm/target/release/deps/libtinyvec_macros-f326b6a6f0ca8a7b.rlib" "/candle-vllm/target/release/deps/libunicode_bidi-f32ebb17f2e11b02.rlib" "/candle-vllm/target/release/deps/libserde_urlencoded-487feade6ed66a8a.rlib" "/candle-vllm/target/release/deps/libform_urlencoded-3e169fc285508f2a.rlib" "/candle-vllm/target/release/deps/libserde_json-af7b0a1679d453f4.rlib" "/candle-vllm/target/release/deps/libryu-8b05c69dcf279a6f.rlib" "/candle-vllm/target/release/deps/libactix_server-094bd922ee0ce4c2.rlib" "/candle-vllm/target/release/deps/libactix_router-328861445c4e5b10.rlib" "/candle-vllm/target/release/deps/libregex-e23744b59681cf08.rlib" "/candle-vllm/target/release/deps/libregex_automata-467c1d200a330751.rlib" "/candle-vllm/target/release/deps/libaho_corasick-2ffc2abbd0b517eb.rlib" "/candle-vllm/target/release/deps/libregex_syntax-3dd804a409b2c545.rlib" "/candle-vllm/target/release/deps/libserde-9901539636462a7e.rlib" "/candle-vllm/target/release/deps/libcookie-7a03bc95754d078c.rlib" "/candle-vllm/target/release/deps/libtime-00420198b7d8f3d9.rlib" "/candle-vllm/target/release/deps/libtime_core-531fb2a2b6009484.rlib" "/candle-vllm/target/release/deps/libnum_conv-27cab79cc649b5eb.rlib" "/candle-vllm/target/release/deps/libderanged-93327753f562d3b3.rlib" "/candle-vllm/target/release/deps/libpowerfmt-c4543fc1903272c6.rlib" "/candle-vllm/target/release/deps/libactix_http-49df50c41746d7cb.rlib" "/candle-vllm/target/release/deps/librand-9bab0c1e1cd6b8a5.rlib" "/candle-vllm/target/release/deps/librand_chacha-85c9c49588ab6f36.rlib" 
"/candle-vllm/target/release/deps/libppv_lite86-9a645f708eed4e1c.rlib" "/candle-vllm/target/release/deps/librand_core-9fc4ad8dc509a141.rlib" "/candle-vllm/target/release/deps/libhttparse-699e93ce2c2e7905.rlib" "/candle-vllm/target/release/deps/libbrotli-df4299509820f939.rlib" "/candle-vllm/target/release/deps/libbrotli_decompressor-0212e4cdb0da1245.rlib" "/candle-vllm/target/release/deps/liballoc_stdlib-fc777d5f3c59a235.rlib" "/candle-vllm/target/release/deps/liballoc_no_stdlib-f497a54db348ea9b.rlib" "/candle-vllm/target/release/deps/libhttpdate-5f8e81ac577420b0.rlib" "/candle-vllm/target/release/deps/libsha1-15062a787796e890.rlib" "/candle-vllm/target/release/deps/libcpufeatures-331cc3717db65aac.rlib" "/candle-vllm/target/release/deps/libdigest-5f088c274186e012.rlib" "/candle-vllm/target/release/deps/libblock_buffer-2ad0dde06bca4c37.rlib" "/candle-vllm/target/release/deps/libcrypto_common-30c46997c474a2db.rlib" "/candle-vllm/target/release/deps/libgeneric_array-95ff38f8e6dc2014.rlib" "/candle-vllm/target/release/deps/libtypenum-ddf8574aa94ffabe.rlib" "/candle-vllm/target/release/deps/libbase64-ecd99bd23d0ff318.rlib" "/candle-vllm/target/release/deps/liblocal_channel-d8bac00ef5bd5826.rlib" "/candle-vllm/target/release/deps/libbytestring-4d1e0f611bab987e.rlib" "/candle-vllm/target/release/deps/libencoding_rs-c048082deb3a71c3.rlib" "/candle-vllm/target/release/deps/liblanguage_tags-e0dfc52f86f9b27a.rlib" "/candle-vllm/target/release/deps/libahash-a21dc39883e7ac23.rlib" "/candle-vllm/target/release/deps/libgetrandom-285e23a7efc98d13.rlib" "/candle-vllm/target/release/deps/libzerocopy-81a7c0f066e0c7c2.rlib" "/candle-vllm/target/release/deps/libmime-04e6f00618993e67.rlib" "/candle-vllm/target/release/deps/libpercent_encoding-d54414372a2980de.rlib" "/candle-vllm/target/release/deps/libh2-7f6e7d82bfd1eac1.rlib" "/candle-vllm/target/release/deps/libindexmap-3cfe35f40e644070.rlib" "/candle-vllm/target/release/deps/libequivalent-8a25e166243cfe94.rlib" 
"/candle-vllm/target/release/deps/libhashbrown-aee95c0614bccf63.rlib" "/candle-vllm/target/release/deps/libfutures_util-b7a602635d036bf0.rlib" "/candle-vllm/target/release/deps/libfutures_io-bdf6e194ea9577ee.rlib" "/candle-vllm/target/release/deps/libslab-490ef311b9a84e0e.rlib" "/candle-vllm/target/release/deps/libfutures_channel-5ea32066cac56ffd.rlib" "/candle-vllm/target/release/deps/libfutures_task-a2c77643b6b905dd.rlib" "/candle-vllm/target/release/deps/libpin_utils-185c55cbe9ca2fff.rlib" "/candle-vllm/target/release/deps/libzstd-36f5f9d9d2e3508b.rlib" "/candle-vllm/target/release/deps/libzstd_safe-aaa592be23ed88f7.rlib" "/candle-vllm/target/release/deps/libzstd_sys-4496f40e68c091c4.rlib" "/candle-vllm/target/release/deps/libflate2-5502f6fe44f51b4c.rlib" "/candle-vllm/target/release/deps/libminiz_oxide-de7ef7b63a4f7412.rlib" "/candle-vllm/target/release/deps/libsimd_adler32-d1dbd8e6b06bf162.rlib" "/candle-vllm/target/release/deps/libcrc32fast-ceb628e76fc0bab0.rlib" "/candle-vllm/target/release/deps/libactix_service-70cd0075f0fbfc66.rlib" "/candle-vllm/target/release/deps/libactix_codec-9e65a5e0b8e81eb5.rlib" "/candle-vllm/target/release/deps/libmemchr-842ac33dededf7d9.rlib" "/candle-vllm/target/release/deps/libbitflags-8cf9aca8dca9dec7.rlib" "/candle-vllm/target/release/deps/libtokio_util-b89e8c0cc220b5ed.rlib" "/candle-vllm/target/release/deps/libtracing-0c92736ef86fff4c.rlib" "/candle-vllm/target/release/deps/libtracing_core-8d095aff5d6b2dc5.rlib" "/candle-vllm/target/release/deps/libonce_cell-06dfa01c968b8e7e.rlib" "/candle-vllm/target/release/deps/libfutures_sink-8d94a6b44313bd03.rlib" "/candle-vllm/target/release/deps/libactix_utils-ec862be5af373362.rlib" "/candle-vllm/target/release/deps/liblocal_waker-7857496d2dec9a57.rlib" "/candle-vllm/target/release/deps/libactix_rt-7d0004af1d35aa47.rlib" "/candle-vllm/target/release/deps/libtokio-e1ade8da5b909c98.rlib" "/candle-vllm/target/release/deps/libsignal_hook_registry-dcf70e2b6c44755c.rlib" 
"/candle-vllm/target/release/deps/libnum_cpus-ed5feb6abe3397ba.rlib" "/candle-vllm/target/release/deps/libsocket2-1ff22b428921d589.rlib" "/candle-vllm/target/release/deps/libmio-afa56dff974d55e1.rlib" "/candle-vllm/target/release/deps/liblog-35f97248cb2ec82c.rlib" "/candle-vllm/target/release/deps/libparking_lot-1fe2bfc24acd589d.rlib" "/candle-vllm/target/release/deps/libparking_lot_core-563decbd4c1f2891.rlib" "/candle-vllm/target/release/deps/liblibc-b4a93e966581df64.rlib" "/candle-vllm/target/release/deps/libcfg_if-88c619515d65e3f1.rlib" "/candle-vllm/target/release/deps/libsmallvec-65a0bed430993cf2.rlib" "/candle-vllm/target/release/deps/liblock_api-920512de5989abb2.rlib" "/candle-vllm/target/release/deps/libscopeguard-6208b4062bcdc2b1.rlib" "/candle-vllm/target/release/deps/libpin_project_lite-42a553ee08f02ebb.rlib" "/candle-vllm/target/release/deps/libfutures_core-891f46f0aceca63c.rlib" "/candle-vllm/target/release/deps/libhttp-b738399ec4ab1c60.rlib" "/candle-vllm/target/release/deps/libitoa-dcbca83b54db3306.rlib" "/candle-vllm/target/release/deps/libbytes-8c2bf1b211f72910.rlib" "/candle-vllm/target/release/deps/libfnv-ffe196e20ea2a648.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-9c342d6596ca77d8.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-35e6faa0abf08dd1.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-6242b5524a2684de.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libmemchr-94511439d510df36.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-1923a594ddedab24.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-5b476927cd520d76.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-6b4664d28b4dc07b.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4d7e14ee42b44abc.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-94e04d08d317eb2b.rlib" 
"/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-7e3a1db27b23a8ee.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-0651af3c34a1e4b9.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-e5da8ecb95d2de36.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-052b86aa844a2857.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-bbd2a157557b773d.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-f47279717d0e1831.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-d30e243a979711ec.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-18929aabe36e3f57.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-f9f41fbdedfbfafb.rlib" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-b26982894e484f03.rlib" "-Wl,-Bdynamic" "-lssl" "-lcrypto" "-lflashattention" "-lcudart" "-lstdc++" "-lstdc++" "-lcuda" "-lnvrtc" "-lcurand" "-lcublas" "-lcublasLt" "-lcudnn" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/usr/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/candle-vllm/target/release/deps/candle_vllm-001dc109ba8da34d" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
= note: /usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_api.o): relocation R_X86_64_32 against `.nvFatBinSegment' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim128_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi128ELi128ELi32ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi128ELi128ELi32ELi4ES2_EELb1ELb0ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim160_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi160ELi128ELi32ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi160ELi128ELi32ELi4ES2_EELb1ELb0ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim192_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi192ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi192ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim224_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi224ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi224ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim256_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim32_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi32ELi128ELi128ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi32ELi128ELi128ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim64_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi64ELi128ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi64ELi128ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim96_fp16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi96ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi96ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim128_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi128ELi128ELi32ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi128ELi128ELi32ELi4ES2_EELb1ELb0ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim160_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi160ELi128ELi32ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi160ELi128ELi32ELi4ES2_EELb1ELb0ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim192_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi192ELi64ELi64ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi192ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim224_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi224ELi128ELi64ELi8ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi224ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim256_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim32_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi32ELi128ELi128ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi32ELi128ELi128ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim64_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi64ELi128ELi64ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi64ELi128ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a(flash_fwd_hdim96_bf16_sm80.o): relocation R_X86_64_32 against symbol `_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi96ELi64ELi64ELi4ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi96ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' can not be used when making a PIE object; recompile with -fPIE
collect2: error: ld returned 1 exit status
error: could not compile `candle-vllm` (bin "candle-vllm") due to previous error
error: failed to compile `candle-vllm v0.1.0 (/candle-vllm)`, intermediate artifacts can be found at `/candle-vllm/target`
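The linker messages above say that objects inside libflashattention.a carry R_X86_64_32 relocations, which cannot be used when producing a PIE executable. A minimal sketch for confirming that on your own build, assuming binutils' readelf is installed; the helper name count_abs32_relocs is hypothetical, and the archive path in the usage comment is simply the one from the log:

```shell
# Hypothetical helper: reads `readelf -r` output on stdin and prints how many
# relocation entries are of type R_X86_64_32 or R_X86_64_32S.
count_abs32_relocs() {
    # grep -c prints the count; `|| true` avoids a non-zero exit when the count is 0.
    grep -c -E 'R_X86_64_32S?([[:space:]]|$)' || true
}

# Usage (path copied from the log above; adjust it to your build directory):
# readelf -r /candle-vllm/target/release/build/candle-flash-attn-9fc2dbcb21177bee/out/libflashattention.a | count_abs32_relocs
```

A non-zero count would be consistent with the errors above, that is, with the CUDA objects having been compiled without position-independent code.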
I will restart from scratch but this time I will download the latest version of Rust instead of using the Rocky 9 one, to see if it succeeds that way.
Using the latest Rust failed too at the linking phase :-/.
I was able to build successfully on a fresh instance, by executing:
git clone https://github.com/EricLBuehler/candle-vllm.git
cd candle-vllm
sudo apt install libssl-dev
sudo apt install pkg-config
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
cargo build
Of course, execution will not work, as the kernels for Paged Attention are still being ported. However, cargo build works. Does this help?
Yep, it's good to know that it compiles in some cases. I am currently trying nvidia/cuda:12.2.2-cudnn8-devel-rockylinux9 as the base image instead of installing libcudnn8-devel. From googling around, it seems that for static linking the CUDA version that cuDNN was compiled against must match, and Rust may be linking statically. If this doesn't work, I will switch to an Ubuntu image and retry. Thanks for all your help!
Ouch, that didn't work, I will try with Ubuntu tomorrow.
Success!!! With nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04 as the base image it compiles correctly, so the linking error seems to be specific to Red Hat based distros. It may or may not happen with Ubuntu 24.04 when that gets released; it will be interesting to test it at that time. Maybe we should leave this issue open and see if it gets resolved in the future? For now I guess it's more important to continue adding features and polishing than to invest time in obscure linking problems.
I agree. It may have to do with the CUDA driver, though.
Is this issue resolved? It looks like I am hitting the same issue on Ubuntu 23.10: errors detected in the compilation of "kernels/flash_fwd_hdim256_bf16_sm80.cu".
thread '
From what I understand, this issue is not resolved, as it is likely part of Candle. Could you please open an issue on Candle? candle-vllm does not build the flash attention kernels; that build step is part of Candle's build.rs.
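To see which crate in the dependency graph actually pulls in candle-flash-attn, cargo's inverted dependency tree can help. This is only a sketch: the wrapper name why_flash_attn is hypothetical, and it assumes you run it from the candle-vllm checkout with cargo on your PATH.

```shell
# Hypothetical wrapper: lists the packages that depend on candle-flash-attn.
# Extra arguments (for example --features ...) are passed through to cargo tree.
why_flash_attn() {
    cargo tree --invert candle-flash-attn "$@"
}

# Usage, from the repository root:
# why_flash_attn --features cuda
```

If the crate shows up even without the flash-attn feature, that would match the earlier observation that the dependency is brought in anyway.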
Hi! Sorry for the late reply. Tested it and you are right, reported here: https://github.com/huggingface/candle/issues/1844 Thanks!!!
Thank you! Please see mistral.rs: it is the successor to this project and supports flash attention, GGUF, etc.
I am trying to make the following (unfinished) Dockerfile work:
But it fails with:
Maybe I shouldn't use the flash-attn feature? Thanks for any suggestions or information.