Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0

Building repo, getting "undefined symbol: setupterm" #7

Open dbeckwith opened 2 years ago

dbeckwith commented 2 years ago

Bear with me here as I'm on NixOS so installing the dependencies has been a journey. I've cloned this repo and am just trying to run cargo build. I've gotten as far as installing CUDA and OptiX, to the point where it's actually building the path_tracer crate, but now I'm getting some scary codegen errors from rustc:

error: failed to run custom build command for `path_tracer v0.1.0 ($REPO_ROOT/examples/cuda/cpu/path_tracer)`

Caused by:
  process didn't exit successfully: `$REPO_ROOT/target/debug/build/path_tracer-970d6a9b9c38170f/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=../../gpu/path_tracer_gpu

  --- stderr
  warning: $REPO_ROOT/crates/cust/Cargo.toml: `default-features = [".."]` was found in [features]. Did you mean to use `default = [".."]`?
  error: failed to run `rustc` to learn about target-specific information

  Caused by:
    process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -Zcodegen-backend=$REPO_ROOT/target/debug/deps/librustc_codegen_nvvm.so -Cllvm-args=-arch=compute_61 --target nvptx64-nvidia-cuda --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 1)
    --- stderr
    error: couldn't load codegen backend "$REPO_ROOT/target/debug/deps/librustc_codegen_nvvm.so": "$REPO_ROOT/target/debug/deps/librustc_codegen_nvvm.so: undefined symbol: setupterm"

  thread 'main' panicked at 'Did not find output file in rustc output', crates/cuda_builder/src/lib.rs:444:10
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/std/src/panicking.rs:517:5
     1: core::panicking::panic_fmt
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/core/src/panicking.rs:100:14
     2: core::panicking::panic_display
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/core/src/panicking.rs:64:5
     3: core::option::expect_failed
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/core/src/option.rs:1637:5
     4: core::option::Option<T>::expect
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/core/src/option.rs:708:21
     5: cuda_builder::get_last_artifact
               at $REPO_ROOT/crates/cuda_builder/src/lib.rs:432:16
     6: cuda_builder::invoke_rustc
               at $REPO_ROOT/crates/cuda_builder/src/lib.rs:417:20
     7: cuda_builder::CudaBuilder::build
               at $REPO_ROOT/crates/cuda_builder/src/lib.rs:238:20
     8: build_script_build::main
               at ./build.rs:4:5
     9: core::ops::function::FnOnce::call_once
               at /rustc/4e89811b46323f432544f9c4006e40d5e5d7663f/library/core/src/ops/function.rs:227:5
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
warning: build failed, waiting for other jobs to finish...
error: build failed

Any tips on what to do here?

Versions:

RDambrosio016 commented 2 years ago

I've been made aware of this issue before, but I'm not quite sure what the best way of solving it is. Apparently this sometimes happens with LLVM and we need to pass -ltinfo to get it to link in terminfo. However, I've also seen reports of this causing other link failures... @anderslanglands what do you think? Have you had this issue?

anderslanglands commented 2 years ago

Yes I had this at one stage. Cannot for the life of me remember how I fixed it. Will dig back through the code and see if I can remember. But yeah the long and the short of it is you need to link against libtinfo.

RDambrosio016 commented 2 years ago

Shouldn't this be fixed by linking in system-libs? Why does LLVM not include that?

dbeckwith commented 2 years ago

Is libtinfo something I might need to install?

dbeckwith commented 2 years ago

I guess it's part of ncurses right? Should I try installing that? I'm on NixOS so I don't have many common libs installed system-wide by default.

anderslanglands commented 2 years ago

ncurses depends on it but I think it might be a separate package in a lot of distros (no idea about nix I’m afraid).

anderslanglands commented 2 years ago

Hmm, this is what link_llvm_system_libs() does in rustc_codegen_nvvm/build.rs. It surprises me that you're getting a missing symbol rather than a library-not-found error. Could you check:

  1. What llvm-config --system-libs outputs, and
  2. Whether link_llvm_system_libs() is actually being called in your build (stick a panic in or something)?

dbeckwith commented 2 years ago

When I run llvm-config --system-libs I get no output at all. What's the expected output from that?

anderslanglands commented 2 years ago

It should tell you which system libs LLVM is linked against. For instance, my build of LLVM on Ubuntu 18.04 gives me:

REZ➞  llvm-config  --system-libs
-lz -lrt -ldl -ltinfo -lpthread -lm -lxml2

anderslanglands commented 2 years ago

How did you install llvm?

dbeckwith commented 2 years ago

I installed it from nixpkgs, NixOS's package repository. There are both llvm and libllvm packages, but they give me the same results.

dbeckwith commented 2 years ago

It's possible I've just not installed LLVM properly, I can look into that and try again once I have llvm-config --system-libs showing something.

dbeckwith commented 2 years ago

Can I ask how you installed LLVM such that it has those system-libs show up? What Debian package would you use?

anderslanglands commented 2 years ago

I built it from source.

dbeckwith commented 2 years ago

I tried installing the llvm-7-dev APT package on Ubuntu 18.04 and it also gives no output for llvm-config --system-libs. Is there some other build-time flag needed to get that to work?

Either way, it's starting to look like a standard install of LLVM doesn't report system libs this way. Maybe link_llvm_system_libs() should find a different way to discover these libs?

RDambrosio016 commented 2 years ago

Someone told me that LLVM only reports system libs when linking statically. I reused a lot of rustc's build.rs logic, so we inherited some of its static vs. shared linking handling. I think we should always link statically and use llvm-config --link-static --system-libs.
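
A minimal sketch of that suggestion, assuming a build script that shells out to llvm-config and forwards each reported library to cargo. The helper names `query_system_libs` and `system_lib_directives` are hypothetical and are not the functions in rustc_codegen_nvvm/build.rs; the only flags used are the ones quoted in this thread.

```rust
use std::process::Command;

/// Run `<llvm_config> --link-static --system-libs` and return its stdout.
/// Hypothetical helper for illustration only.
fn query_system_libs(llvm_config: &str) -> std::io::Result<String> {
    let output = Command::new(llvm_config)
        .args(&["--link-static", "--system-libs"])
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Turn llvm-config output such as "-lz -ltinfo" into the
/// `cargo:rustc-link-lib=...` lines a build script would print.
fn system_lib_directives(llvm_config_output: &str) -> Vec<String> {
    llvm_config_output
        .split_whitespace()
        // Only `-l<name>` tokens name libraries; ignore anything else.
        .filter_map(|flag| flag.strip_prefix("-l"))
        .filter(|name| !name.is_empty())
        .map(|name| format!("cargo:rustc-link-lib=dylib={}", name))
        .collect()
}
```

With empty llvm-config output the second function returns no directives at all, so nothing asks the linker for libtinfo. That would be consistent with the undefined setupterm symbol reported above, though it is not confirmed as the cause.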

anderslanglands commented 2 years ago

What happens if you set LLVM_LINK_SHARED=1 when building?

dbeckwith commented 2 years ago

llvm-config --link-static --system-libs gives me -lz -lrt -ldl -ltinfo -lpthread -lm -lxml2

RDambrosio016 commented 2 years ago

Then I think we should remove the dynamic linking logic and just always link statically. Dynamically linking in the codegen doesn't make much sense.

dbeckwith commented 2 years ago

I edited the codegen crate build script to use --link-static and now I'm getting a different error from cc that it can't find the cuda library. I noticed in the error output that the cc flags include -L $CUDA_PATH/lib64, but my libcuda.so is in $CUDA_PATH/lib/stubs. I saw this TODO, maybe that logic needs tweaking?

Stupremee commented 2 years ago

What happens if you set LLVM_LINK_SHARED=1 when building?

I had the same issue and fixed it by setting this environment variable.

dbeckwith commented 2 years ago

Hmm that might not be right actually. After tweaking a couple other things (it couldn't find -lxml2 so I just manually set the list of libs in link_llvm_system_libs() for now), everything builds fine, but the final binary links to libcuda.so.1 which I don't actually have. My $CUDA_PATH/lib/stubs folder only has libcuda.so and not libcuda.so.1. I'm not sure why these libs are in a folder named "stubs". $CUDA_PATH/lib looks like a more normal libs folder with lots of symlinks, but doesn't contain a libcuda.so.
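
A rough sketch of how a build script could probe several directories for libcuda.so instead of assuming lib64. The directory list only mirrors the layouts mentioned in this thread, and both function names are hypothetical; this is not the search logic the crates actually use. As far as I know, the copy in a stubs folder exists only so that linking succeeds on machines without a driver, and the real libcuda.so.1 is normally provided by the NVIDIA driver installation rather than the CUDA toolkit.

```rust
use std::path::{Path, PathBuf};

/// Directories under a CUDA install root that might hold `libcuda.so`.
/// Illustrative guess based on the paths reported in this thread.
fn candidate_cuda_lib_dirs(cuda_root: &Path) -> Vec<PathBuf> {
    ["lib64", "lib64/stubs", "lib", "lib/stubs"]
        .iter()
        .map(|sub| cuda_root.join(sub))
        .collect()
}

/// Return the first candidate directory that actually contains `libcuda.so`.
fn find_libcuda_dir(cuda_root: &Path) -> Option<PathBuf> {
    candidate_cuda_lib_dirs(cuda_root)
        .into_iter()
        .find(|dir| dir.join("libcuda.so").is_file())
}
```

A build script could then print `cargo:rustc-link-search=native=<dir>` for the directory that was found, which would cover installs where the library only lives under stubs.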

dbeckwith commented 2 years ago

LLVM_LINK_SHARED=1 does work for me as well. Now the only issue I have is not finding libcuda.

Stupremee commented 2 years ago

Would you mind sharing the NixOS shell configuration you use for Rust-CUDA? Even though it doesn't work yet, I would be interested anyway.

dbeckwith commented 2 years ago

@Stupremee https://gist.github.com/dbeckwith/bc3baade147ebff905a72c434812053d

There's no OptiX package in nixpkgs so I had to include a bespoke package for it. Unfortunately I couldn't find any public download URLs for it, so you have to make an NVIDIA account, sign up for the developer program, download it manually, and add it to the Nix store (instructions are in the derivation). Also, fair warning that the CUDA package is a 3.8 GB download, so it'll hang for a while.

RDambrosio016 commented 2 years ago

0.2 should work. I switched to cuda-sys' Linux handling logic, which should be more robust, and it should not fail to find CUDA now.

dbeckwith commented 2 years ago

Thanks for the update, but I'm still seeing the following issues:

  1. The original error in this thread still occurs. I can get around it by patching detect_llvm_link to always return ("dylib", "--link-shared") so it at least gets past the build step.
  2. The built binary is still linking to libcuda.so.1 which doesn't exist in my installation. I only have $CUDA_PATH/lib64/stubs/libcuda.so. As far as I can tell the Nix package installer doesn't do anything that might remove libcuda.so.1. Does this file exist in the standard Linux installation? Is there a way to change the linker to link to just libcuda.so? It's possible this is only an issue when detect_llvm_link returns dylib, so maybe fixing 1. will fix this as well?

dbeckwith commented 2 years ago

Sorry I forgot that 1. can be fixed by setting LLVM_LINK_SHARED=1 while building, although I'm not sure that's a permanent solution, but 2. is still an issue.

1617176084 commented 2 years ago

Where should I put --link-shared and LLVM_LINK_SHARED=1? Could you tell me the steps?

dbeckwith commented 2 years ago

It's an environment variable, so for example:

$ LLVM_LINK_SHARED=1 cargo build

or:

$ export LLVM_LINK_SHARED=1
$ cargo run --bin path_tracer

Setting this environment variable will force the build script to use --link-shared in the LLVM args: https://github.com/Rust-GPU/Rust-CUDA/blob/555c53123a8f9b78244ad1d9fb96ac8b1eca859f/crates/rustc_codegen_nvvm/build.rs#L122-L130
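
For readers who don't want to follow the link, the idea can be sketched roughly as below. This assumes that any set value of LLVM_LINK_SHARED selects shared linking and that the static case is the pair ("static", "--link-static"); both are assumptions for illustration, and `llvm_link_mode` is a hypothetical name, so check the linked build.rs for the real behavior.

```rust
use std::env;

/// Pick the cargo link kind and the matching llvm-config flag.
/// `link_shared` is the value of LLVM_LINK_SHARED, if it is set.
/// Assumption: any set value means "link shared".
fn llvm_link_mode(link_shared: Option<&str>) -> (&'static str, &'static str) {
    match link_shared {
        Some(_) => ("dylib", "--link-shared"),
        None => ("static", "--link-static"),
    }
}

/// Convenience wrapper that reads the variable from the environment.
fn llvm_link_mode_from_env() -> (&'static str, &'static str) {
    let value = env::var("LLVM_LINK_SHARED").ok();
    llvm_link_mode(value.as_deref())
}
```

The second element of the pair is what gets passed to llvm-config, which is why setting the variable changes which libraries end up on the link line.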

nottug commented 2 years ago

I'm running on Arch using the AUR package llvm70 (which installs to /opt/llvm70), with LLVM_LINK_SHARED=1, and getting:

error: couldn't load codegen backend "/PATH/TO/APPLICATION/target/debug/librustc_codegen_nvvm.so": "libLLVM-7.so: cannot open shared object file: No such file or directory"

I had to run this to resolve it (rust-lang/rust #53813).

ln -s /opt/llvm70/lib/libLLVM-7.so $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib/

Could be that my llvm isn't linked properly, but my LLVM_CONFIG is exported as /opt/llvm70/bin/llvm-config.

Versions:

RDambrosio016 commented 2 years ago

I think that this is all caused by trying to link LLVM dynamically. We should probably always link statically in 0.3.

gzz2000 commented 2 years ago

I also can confirm that LLVM_LINK_SHARED=1 works.

Netherdrake commented 2 years ago

+1 LLVM_LINK_SHARED=1 works on Ubuntu 20.04 with llvm-7 installed via apt.