bazelbuild / rules_rust

Rust rules for Bazel
https://bazelbuild.github.io/rules_rust/
Apache License 2.0
654 stars 413 forks source link

Recommendation for allocators when linking with C* binary #1238

Open keith opened 2 years ago

keith commented 2 years ago

When using rules_rust to link a rust library into a final C / C++ binary, you end up hitting some linker issues around missing allocator symbols:

Undefined symbols for architecture arm64:
  "___rust_alloc", referenced from:
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h2255888a02196d18 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h8c8e34d5191bcf2c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      alloc::raw_vec::finish_grow::hf9c3a7b5d277e764 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::Context$LT$R$GT$::find_frames::h2faede176ceedb5a in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResDwarf$LT$R$GT$::parse::hde4e81cf26db861b in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::parse_lines::h989d285cff418912 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::render_file::h057f5d698fb8b6fa in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...
  "___rust_alloc_error_handler", referenced from:
      alloc::alloc::handle_alloc_error::rt_error::h0566a1d251bd0a5b in liballoc-7e50779556d46264.a(alloc-7e50779556d46264.alloc.6069a24a-cgu.0.rcgu.o)
  "___rust_alloc_zeroed", referenced from:
      _$LT$std..sys..unix..fs..File$u20$as$u20$core..fmt..Debug$GT$::fmt::h858fd589f450b4fa in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
  "___rust_dealloc", referenced from:
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h5915480aa2f4d427 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h8c8e34d5191bcf2c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::he32668d17116d3cf in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$core..option..Option$LT$std..sys..unix..process..process_common..CStringArray$GT$$GT$::hc73bf8e1be5ce67e in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$alloc..collections..btree..map..BTreeMap$LT$u64$C$gimli..read..abbrev..Abbreviation$GT$$GT$::hd615d11dfc761993 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$$LP$std..ffi..os_str..OsString$C$core..option..Option$LT$std..ffi..os_str..OsString$GT$$RP$$GT$::h53e782e9cd3898db in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      core::ptr::drop_in_place$LT$$LT$std..backtrace..Backtrace$u20$as$u20$core..fmt..Display$GT$..fmt..$u7b$$u7b$closure$u7d$$u7d$$GT$::h11130cb13523a05c in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...
  "___rust_realloc", referenced from:
      alloc::raw_vec::finish_grow::hf9c3a7b5d277e764 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::Context$LT$R$GT$::find_frames::h2faede176ceedb5a in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      addr2line::ResUnit$LT$R$GT$::parse_lines::h989d285cff418912 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::env::current_dir::h829333952c66e8ad in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_unchecked::hc7adf6773960680b in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_with_nul_unchecked::h6cd7e60a1e34c996 in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      std::ffi::c_str::CString::from_vec_with_nul::he767a74f4ba7920f in libstd-eb660d415c354e23.a(std-eb660d415c354e23.std.dfe53c44-cgu.0.rcgu.o)
      ...

After reading some various issues it sounds like the core issue is that rust mostly expects to handle linking itself, and provides these manually during the final link, but when linking it into a C binary with bazel, we don't go through that logic and therefore end up missing these symbols.

This change https://github.com/bazelbuild/rules_rust/commit/802bcf937dbdffc267d65f4a85cc1c917ac49f63 potentially improves this by allowing you to add a global allocator_library which can provide these symbols and solve this. But as far as I can tell that would currently require reproducing a lot of toolchain setup boilerplate, as well as a library containing the necessary symbols (one of which can be found here https://github.com/chromium/chromium/blob/42bd3d0802e90ca093450a2683ff2a104e10e48a/build/rust/std/remap_alloc.cc)

I'm hoping for some input on the best way to solve this, ideally for everyone, and ideally in rules_rust. One potential thing I could see working is defaulting the allocator library to a library in rules_rust containing something like chromium's solution. If that would be too invasive we could potentially expose the allocator_library attribute and pipe that through all the toolchain setup so users could specify one globally in their WORKSPACE without having to add too much boilerplate.

Please let me know what you think!

sayrer commented 2 years ago

Does the same program link on x86_64 (Mac or Linux)? It could just be a bogus platform name, we've hit that before: https://github.com/bazelbuild/rules_rust/pull/806/files

keith commented 2 years ago

Ah sorry I meant to attach my repro case too rustrepro.zip, this + bazel build main is all you need to get the above failure. This also fails on my x86_64 ubuntu machine

keith commented 2 years ago

So also interestingly, this is avoided if you use rust_static_library instead of rust_library within the same build, which isn't the recommended usage from the docs where it says it should only be used when vendoring that final library. But in my test case the original usage with an older version of rules_rust was using create_type = "staticlib", which I guess would be equivalent to still using rust_static_library?

keith commented 2 years ago

Thinking more about the rust_static_library workaround, maybe it would make sense to include a single rust_static_library in your build graph, to get the allocator infra, and then use everything else as a rust_library, similar to how you would inject a single cc_library in the build graph

UebelAndre commented 2 years ago

@hlopko wondering if you have any thoughts here.

hlopko commented 2 years ago

Hi folks,

You're right, https://github.com/bazelbuild/rules_rust/commit/802bcf937dbdffc267d65f4a85cc1c917ac49f63 was created so we have a way to provide those allocator shims to the linker somehow. Internally we use pretty much exactly what you found in chromium. rust_static_librtary/staticlib is tricky - you have to make sure you only have one staticlib in your transitive closure otherwise you have duplicated symbols error at link time.

Ideally we would get a supported solution upstream. https://github.com/rust-lang/rust/issues/73632 is the tracking bug. In the meantime we will probably continue with the workaround like chromium does.

FTR, we don't expect to support custom allocators in our Bazel setup, we will use the allocator configured for C++ through --custom_malloc.

keith commented 2 years ago

It looks like https://github.com/bazelbuild/rules_rust/pull/1298 removes the rust_static_library workaround for this case anyways as discussed in https://github.com/bazelbuild/rules_rust/issues/1063

Should we provide a default allocator in rules_rust for this case?

keith commented 2 years ago

I actually realized that even with #1298 CcInfo is still propagated, maybe that one should similarly be restricted? I noticed because we have a rust_static_library depended on by other cc_library targets and it still works just fine with that patch. cc @scentini 

scentini commented 2 years ago

I think cc_library -> rust_static_library is a valid dependency, and as such we want to emit CcInfo for rust_static_library. The rust_static_library workaround you mentioned above should still be valid. Is that not the case?

keith commented 2 years ago

It is still valid, but I'm not sure if we want it to be 😛?

scentini commented 2 years ago

Yes, there is a benefit to propagating a CcInfo from a rust_static_library in that one can depend on a rust_static_library from a cc_library without specifying additional linker arguments.

bsilver8192 commented 2 years ago

If it helps, I added a full working example in #1350. It takes a bit to set up, but I think it will work somewhat robustly this way. Rust toolchain changes may break it, but I don't think any recent ones would. A setup based on that PR works with all my toolchains, which include cross compiling with both GCC and Clang, and linking with llvm-ld and GNU ld.

keith commented 2 years ago

Nice, I also have a demo with mobile in https://github.com/keith/bazel-rust-mobile-demo/

bsilver8192 commented 2 years ago

I suspect your approach will fail when building a cc_test with more complex codebases keith. It's definitely simpler, and if the caveats are acceptable then it looks nice, but I'd like to lay out some things for you or others who try using it. Also reasons why this is not a good approach for rules_rust to adopt. If you use llvm-ld or another linker which ignores the traditional Unix linker search algorithm and just searches all objects, then none of this applies, it'll work great.

If I'm understanding it correctly, you only have a dependency from one rust_library to the allocator shim implementation, which means the shared objects built for each library in test mode (at least for cc_library, not sure what rust_library currently does) won't have those symbols, so they won't link. You could add that to the deps of each rust_library manually, but that gets pretty tedious, especially if you use external dependencies with rust_libraries you don't write (either cargo-raze or Bazel external repositories with pre-written BUILD files).

Also, if no rust_library with a dependency on alloc uses a certain function, but something else without that dependency does, then Bazel's arbitrary choice of link order determines whether those symbols get linked in or linking fails.

GregBowyer commented 2 years ago

I hit this without a custom allocator using GCC (on clang it works fine ala 802bcf9).

An alternative workaround to the one posted in #1350, is to do a stub of a rust binary that essentially calls into C++. I made the C++ have a int app_main(int argc, char** argv) and the following rust src

use std::ffi::{CString};
use std::os::raw::{c_char, c_int};

extern "C" {
    fn app_main(num_args: c_int, args: *const *const c_char) -> c_int;
}

fn main() {
    let args = std::env::args_os()
        .map(|arg| arg.to_string_lossy().to_string())
        .map(|arg| arg.into_bytes())
        .map(|arg| unsafe { CString::from_vec_unchecked(arg) })
        .collect::<Vec<_>>();

    let rc = unsafe { app_main(args.len() as c_int, args.as_ptr() as *const *const c_char) };

    std::process::exit(rc);
}

This is untested on anything other than linux and is IMHO a hack, but might be useful for people who need to tweak a binary here or there.

Of interest, this seems to only happen to me for some link targets of the form rust_library -> cc_library -> cc_library -> cc_binary I have found others link just fine rust_library -> cc_library -> cc_binary

GregBowyer commented 2 years ago

Alas this failed for me on macos, I think recommending rust_library for CcInfo provision is a little flawed, even if the older approach has its issues.

bsilver8192 commented 2 years ago

Update on this: #1490 added the rust_toolchain.allocator_library attribute which allows setting a cc_library that implements these symbols. Actually creating that library is still tricky though. You can get started by copying the one that PR includes, but it may need updates to work with different rustc versions.

bsilver8192 commented 1 year ago

For anybody looking to use a non-default global allocator with allocator_library, here's how I do it:

//! Hand-written allocator shims to use `my_allocator` with Bazel. This is used instead of
//! `#[global_allocator]`.

use std::alloc::{GlobalAlloc, Layout};

use my_allocator::MyAllocator;

static ALLOC: MyAllocator = MyAllocator;

#[no_mangle]
extern "C" fn __rust_alloc(size: usize, align: usize) -> *mut u8 {
    unsafe { ALLOC.alloc(Layout::from_size_align_unchecked(size, align)) }
}

#[no_mangle]
extern "C" fn __rust_dealloc(ptr: *mut u8, size: usize, align: usize) {
    unsafe { ALLOC.dealloc(ptr, Layout::from_size_align_unchecked(size, align)) }
}

#[no_mangle]
extern "C" fn __rust_realloc(ptr: *mut u8, old_size: usize, align: usize, new_size: usize) -> *mut u8 {
    unsafe { ALLOC.realloc(ptr, Layout::from_size_align_unchecked(old_size, align), new_size) }
}

#[no_mangle]
extern "C" fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8 {
    unsafe { ALLOC.alloc_zeroed(Layout::from_size_align_unchecked(size, align)) }
}