avr-rust / rust-legacy-fork

[deprecated; merged upstream] A fork of the Rust programming language with AVR support
Other
493 stars 14 forks source link

Can't compile when using a workspace #117

Closed Rahix closed 5 years ago

Rahix commented 5 years ago

After upgrading to the new compiler, I can no longer build my project that is part of a cargo workspace. Building outside of a workspace works. I get the following error:

rustc: /home/rahix/.../rust/src/llvm/lib/IR/Constants.cpp:1512: static llvm::Constant* llvm::ConstantExpr::getCast(unsigned int, llvm::Constant*, llvm::Type*, bool): Assertion `CastInst::castIsValid(opc, C, Ty) && "Invalid constantexpr cast!"' failed.

I get this error even with a completely stripped down file like:

#![no_std]
#![no_main]
#![feature(lang_items, panic_handler)]

#[no_mangle]
pub extern fn main() {
    loop { }
}

pub mod std {
    #[lang = "eh_personality"]
    pub unsafe extern "C" fn rust_eh_personality(
        _state: (),
        _exception_object: *mut (),
        _context: *mut (),
    ) -> () {
    }

    #[panic_handler]
    fn panic(_info: &::core::panic::PanicInfo) -> ! {
        loop { }
    }
}

The only other observation I was able to make is that is does not happen when I remove lto = true from the root Cargo.toml. For reference:

[package]
name = "..."
version = "0.0.0"
authors = ["..."]

[workspace]
members = ["...", "example"]

[profile]

[profile.release]
codegen-units = 1
debug = false
lto = true  # Commenting out this line allows building
opt-level = "s"
TimNN commented 5 years ago

Assertion `CastInst::castIsValid(opc, C, Ty) && "Invalid constantexpr cast!"' failed

Ugh, I hit that assertion myself, but for me compiling with codegen-units = 1 fixed the issue. IIRC this issue also happens when compiling with --emit=llvm-ir which makes it more annoying to debug.

My guess (and it really is only a wild guess) would be that this could be something address space related. I'll try to reproduce this again tomorrow and get at least the types involved, which should hopefully help understanding what is going on.

shepmaster commented 5 years ago

FWIW, it seems incredibly unlikely that workspaces have anything to do with the root of the problem. LTO is more likely, but I've compiled with LTO and not experienced it.

One thing that I have run into before is arbitrary reordering of pieces of the standard library cause LLVM to process things in a specific order and then choke on that. This can make reproducing a problem a very tricky business.

dylanmckay commented 5 years ago

My guess (and it really is only a wild guess) would be that this could be something address space related.

This is my hunch too. After my initial patch that added support for functions in non-default address spaces (and the subsequent commit of an adaption of it by another LLVM developer), I've hit a few casting related problems in LLVM due to AVR's program memory address space now being set to 1 and that breaking various scattered (but easy to fix) assumptions within LLVM.

The usual culprit - PointerType::getUnqual, which gets the pointer type representing a pointer in the unqualified, default address space zero. Any latent code in LLVM that calls getUnqual which happen to sometimes execute on function pointers will likely bug out in AVR. I've fixed a bunch of these (and a few more that I haven't committed in clang).

dylanmckay commented 5 years ago

@Rahix

If you have a debugger handy, in gdb you could execute rustc under gdb. LLVM assertion errors automatically break out into the debugger console.

In GDB, type

(gdb) up # go up from the assertion handler functions to the buggy code
(gdb) up
(gdb) print Ty->dump()

This will print the textual IR representation of the type being casted. If it looks like a function pointer, or it contains the text addrspace(1), or both, then @TimNN's wild guess it probably right

Rahix commented 5 years ago

IIRC this issue also happens when compiling with --emit=llvm-ir which makes it more annoying to debug.

Yes, noticed that as well :(

FWIW, it seems incredibly unlikely that workspaces have anything to do with the root of the problem. LTO is more likely, but I've compiled with LTO and not experienced it.

I was also pretty irritated when I noticed this. The thing is, it worked when I recreated the project outside the workspace. I was then able to reliably reproduce it in the workspace project and fix it by removing the workspace ...

Maybe in some way related to the fact that you have to specify profiles in the workspace root instead of the crate's Cargo.toml?

If you have a debugger handy

I will try that and report back, thanks!

TimNN commented 5 years ago

(gdb) print Ty->dump()

Note that (from my experiences in the past) you may need an unoptimized LLVM for that to work (optimized + debug info is not enough since, IIRC, local variables may have been optimized or gdb may not find methods).

TimNN commented 5 years ago

In my case:

$ frame 4
$ p Ty->dump()
void (%"str::traits::{{impl}}::index::{{closure}}"*)*
$ p C->dump()
define internal fastcc void @"core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::Range<usize>>::index::{{closure}}"(...) unnamed_addr addrspace(1) #2 { ... }
$ p opc
llvm::Instruction::BitCast
$ frame 8
$ p I.dump()
call fastcc addrspace(0) void bitcast (void (%"type 0x7fffdc629830"*)* @"core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::Range<usize>>::index::{{closure}}" to void (%0*)*)(%"str::traits::{{impl}}::index::{{closure}}"* noalias nocapture nonnull dereferenceable(6) %4) #3, !noalias !38
$ bt
#4  llvm::ConstantExpr::getCast (oc=47, C=0x7fffdc78bcb8, Ty=0x7fffdc2c0e90, OnlyIfReduced=false) 
    at [...]/src/llvm/lib/IR/Constants.cpp:1512
#5  llvm::ConstantExpr::getWithOperands (this=0x7fffdc78d288, Ops=..., Ty=0x7fffdc2c0e90, OnlyIfReduced=false, SrcTy=0x0)
    at [...]/src/llvm/lib/IR/Constants.cpp:1206
#6  (anonymous namespace)::Mapper::mapValue (this=0x7fffdc003d40, V=0x7fffdc78d288) at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:475
#7  (anonymous namespace)::Mapper::remapInstruction (this=0x7fffdc003d40, I=0x7fffdc793730)
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:873
#8  (anonymous namespace)::Mapper::remapFunction (this=0x7fffdc003d40, F=...) 
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:955
#9  (anonymous namespace)::Mapper::flush (this=0x7fffdc003d40) at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:855
#10 (anonymous namespace)::FlushingMapper::~FlushingMapper (this=0x7fffe8cd9bb0, __in_chrg=<optimized out>)
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:1073
#11 llvm::ValueMapper::mapValue (this=0x7fffe8cda280, V=...) 
    at [...]/src/llvm/lib/Transforms/Utils/ValueMapper.cpp:1098
#12 (anonymous namespace)::IRLinker::run (this=0x7fffe8cd9eb0) 
    at [...]/src/llvm/lib/Linker/IRMover.cpp:1355
#13 llvm::IRMover::move(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ArrayRef<llvm::GlobalValue*>, std::function<void (llvm::GlobalValue&, std::function<void (llvm::GlobalValue&)>)>, bool) (this=0x555555a9ab60, Src=std::unique_ptr<llvm::Module> containing 0x0, ValuesToLink=..., AddLazyFor=..., IsPerformingImport=false) 
    at [...]/src/llvm/lib/Linker/IRMover.cpp:1476
#14 (anonymous namespace)::ModuleLinker::run (this=0x7fffe8cda510) 
    at [...]/src/llvm/lib/Linker/LinkModules.cpp:557
#15 llvm::Linker::linkInModule(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, unsigned int, std::function<void (llvm::Module&, llvm::StringSet<llvm::MallocAllocator> const&)>) (this=0x555555a9ab60, Src=std::unique_ptr<llvm::Module> containing 0x0, Flags=0, InternalizeCallback=...)
    at [...]/src/llvm/lib/Linker/LinkModules.cpp:579
#16 LLVMRustLinkerAdd (L=0x555555a9ab60, BC=<optimized out>, Len=<optimized out>)
    at ../rustllvm/Linker.cpp:64
#17 rustc_codegen_llvm::back::lto::fat_lto::{{closure}} ()
TimNN commented 5 years ago

A few things to note:

TimNN commented 5 years ago

Minimized:

target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

define internal { i16, i16 } @will_be_tail_call([0 x i8]*) addrspace(1) {
start:
  store [0 x i8]* %0, [0 x i8]** undef, align 1
  ret { i16, i16 } undef
}

define void @main() addrspace(1) personality i32 (...) addrspace(1)* @rust_eh_personality {
start:
  invoke addrspace(1) { i16, i16 } @will_be_tail_call([0 x i8]* undef)
          to label %bb29 unwind label %cleanup

bb29:
  unreachable

cleanup:
  landingpad { i8*, i32 }
          cleanup
  resume { i8*, i32 } undef
}

declare i32 @rust_eh_personality(...) addrspace(1)

If you run this code through opt -O1, you'll notice that will_be_tail_call will get addrspace(0).

TimNN commented 5 years ago

The usual culprit - PointerType::getUnqual

I've been skimming the uses of getUnqual in LLVM and found some potentially interesting things (haven't had the time yet to try fixing them).

@dylanmckay: Do you have a link to an example fix patch?

dylanmckay commented 5 years ago

Great work.

One way to confirm whether or not fastcc logic is the cause; run llc -print-after-all, when will dump the in-memory program representation after every single LLVM pass. I usually find the name of the function in question and grep the llc -print-after-all logs for it. Usually you'll get ~40 matches lines, or so, then it's just a matter of scrolling up the logs to the previous " IR Dump after pass: " line

dylanmckay commented 5 years ago

@dylanmckay: Do you have a link to an example fix patch?

dylanmckay commented 5 years ago

I've been skimming the uses of getUnqual in LLVM and found some potentially interesting things (haven't had the time yet to try fixing them).

Yeah, I am a bit concerned that there's latent bugs hidden deep within LLVM caused by address space assumptions, but on the plus side, the LLVM IR verifier will always catch it and trigger an error and so it'll never silently break.

This is made a bit more likely due to AVR being an experimental backend and so only the experimental buildbots compile it. This means LLVM developers may still introduce uses of getUnqual and friends and break AVR without any feedback loop. Once the backend is no longer experimental, so long as our test suite is good enough, AVR would break the tests and would need to be fixed before the patch could be merged.

dylanmckay commented 5 years ago

N.B. The only thing stopping me from submitting an RFC to remove the experimental status now is the lack of integration test suite. Once I've finished up the local branch I linked above, I should have integration tests working.

TimNN commented 5 years ago

One way to confirm whether or not fastcc logic is the cause; run llc -print-after-all,

Thanks for reminding me of this, I hadn't thought of -print-after-all. I'll try this this evening (CET).

TimNN commented 5 years ago

-print-after-all actually pointed me at DeadArgumentEliminationPass, and I think I fixed the issue there.

I'll see if there are other situation in which addrspace(0) is generated for libcore and post a patch afterwards.

Edit: One more in ArgumentPromotion. The culprit was Function::Create both times.

TimNN commented 5 years ago

So, I can compile libcore with -O3 now without having any addrspace(0) in there.

However, both issues I've found have two things in common:

(1) They happened due to Function::Create being used to transform a function. (2) The were located somewhere in lib/Transforms/IPO.

I then did a brief search for Function::Create in lib/Transforms/IPO. Upon a cursory glance, it looks like none of them (except the once I fixed) set the Address Space correctly. I'm gonna assume that they are all wrong since, IIUC, there is no way for executable code to be in any place other than the program data address space. I'll look at them in more details and post a patch.


I took an even shorter look at other uses of Function::Create and saw more suspicious ones in, e.g. /Linker/*. I'll take a look at those as well.


Btw, the addrspace(0) free libcore fixed my repro of the originally reported issue.

TimNN commented 5 years ago

Here is the commit fixing all the uses in lib/Transforms/IPO: https://github.com/TimNN/llvm/commit/fce776910a1fe4295f05386e34683536d82d3cb1

I didn't run any tests on this (yet), but it compiles and in general makes sense, as far as I can tell.

I fixed the calls to Function::Create by doing either of the following:

(1) Copying the address space from the function that was being replaced. (2) Using the address space from the DataLayout when new function were made up.

@dylanmckay: Can you commit this directly to LLVM or should I post this to reviews.llvm.org?

Edit: I added some (GitHub) comments to the commit about things I noticed / wasn't 100% sure about.

dylanmckay commented 5 years ago

I've realised that the verifier will not always catch instances of this; it may not come up if the addrspace(0) function is only used in one place, a call or invoke. When creating call or invoke IR instructions, I believe LLVM would have no problem because the address space of the destination will just be taken from the address space of the function, which will also be addrspace(0). I think the verifier will only catch bugs when the frontend generates IR that references an addrspace(1) function in something like a function pointer, like the C++ static constructor section and the array of function pointers included within. In that case, the verifier will notice that originally-valid frontend IR is now invalid because there's now a part of the IR that specifies the functions signature as addrspace(1), and a separate part introduced by LLVM that refers to the same function as addrspace(0). In this case, the verifier fails because the same function signature in two different address spaces is actually two completely separate, non-isomorphic types.

I wrote up LLVM bug PR39573 to track the creation of a purpose-built AVR-specific pass that ensures every single function is in addrspace(1).

dylanmckay commented 5 years ago

I didn't run any tests on this (yet), but it compiles and in general makes sense, as far as I can tell.

Extremely low chance of regression; there is not a single in-tree LLVM backend that sets program address space to 1 except for AVR, so that for all official backends, it is guaranteed to use the same default behaviour of 0.

@dylanmckay: Can you commit this directly to LLVM or should I post this to reviews.llvm.org?

Either or. reviews.llvm.org is good for code reviews, but you cannot commit via it. If posted to reviews.llvm.org and accepted by somebody else or me, then you'd still need to find somebody with push access to commit it.

It looks like you've got four LLVM commits to your name; you could definitely get push access to LLVM trunk if you wanted it. I like to encourage this because it means your work will come up under your own name (as opposed to a "Patch by" line) and attached to your GitHub so you can win social points if that's your kind of thing. That way it doesn't just look like I did everybody's work at a glance.

Usually after a few LLVM patches, people send an email to Chris and he gives it. I think I got commit access after about 2-3 patches. Completely up to you though, I have no problem committing and/or reviewing the patches

$ git log | grep "Tim Neu"
    Patch by Peter Nimmervoll (@brainlag) and Tim Neumann (@TimNN)
    Patch by Tim Neumann
    Patch by Tim Neumann!
    Patch by Tim Neumann.
TimNN commented 5 years ago

Either or. reviews.llvm.org is good for code reviews, but you cannot commit via it.

I know that :) I was mostly asking: Are you comfortable with committing this patch without another review from someone else.

you could definitely get push access to LLVM trunk if you wanted it.

I'm fine with you committing this patch for now, I'll think about getting commit access.

dylanmckay commented 5 years ago

I added a few tests and upstreamed in r349469.

dylanmckay commented 5 years ago

I guess that now we just need to update LLVM or cherry-pick the patch.

dylanmckay commented 5 years ago

Cherry-picked the fix in a3b0834328d73eae4f0a27afd7668b684e4a875f.