TheDan64 / inkwell

It's a New Kind of Wrapper for Exposing LLVM (Safely)
https://thedan64.github.io/inkwell/
Apache License 2.0
2.31k stars 222 forks source link

Undefined behavior with `MemoryBuffer::create_from_memory_range` #185

Open novacrazy opened 4 years ago

novacrazy commented 4 years ago

Describe the Bug:

This code will result in a panic from LLVM saying expected top-level entity from the LLVM IR.

pub static PRELUDE_LLVM_IR: &[u8] = include_bytes!("prelude.llvm");

#[inline(never)]
pub fn build_prelude<'ctx>(ctx: &JitContext<'ctx>) -> Module<'ctx> {
    let module = ctx
        .create_module_from_ir(MemoryBuffer::create_from_memory_range(
            PRELUDE_LLVM_IR, 
            "prelude"
        ))
        .unwrap();

    module
}

However, if I were to mark build_prelude as #[inline(always)], suddenly it doesn't panic. JitContext is just a small wrapper around inkwell::context::Context.

Solution

Use MemoryBuffer::create_from_memory_range_copy. Seems create_from_memory_range leaking memory causes it to become corrupted or something funky, but if it's inline then the value is kept on the stack long enough for LLVM to parse it. I don't know. I've also only just discovered this, so it's possible create_from_memory_range_copy may also have issues, but so far it at least fixes that odd behavior.

Details

I build debug with opt-level=3, so perhaps that has something to do with it.

If you're curious, here is the current prelude.llvm. It's just some intrinsic stuff with fast call.

```llvm ; ModuleID = 'prelude' source_filename = "prelude" declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>) #0 ; Function Attrs: nounwind inlinehint define <4 x float> @raygon.fma.v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2) #1 { entry: %res = call fast <4 x float> @llvm.fma.v4f32(<4 x float> %0, <4 x float> %1, <4 x float> %2) ret <4 x float> %res } attributes #0 = { nounwind readnone speculatable willreturn } attributes #1 = { nounwind inlinehint } ```
TheDan64 commented 4 years ago

Good find, thanks!

Kixiron commented 4 years ago

Hello, the problem here is that this function is incredibly unsound. A MemoryBuffer provides "read-only access to a block of memory" (docs), which means that when one is made with create_from_memory_range it still points into the original source &[u8]. This means that if and when that original &[u8] is dropped, the underlying memory is invalidated and causes UB. This can also cause UB because the Rust owner of the byte slice can freely mutate the slice after they've created the MemoryBuffer while LLVM could be reading it, causing more UB. Additionally, a MemoryBuffer must be null terminated as stated in the description of MemoryBuffer, and inkwell currently does not check for this. The docs of create_from_memory_range also say that the function is "leaking memory" when it isn't, it's just allowing LLVM to look at a slice of Rust-owned memory in a read-only fashion.

seanyoung commented 3 years ago

@Kixiron in the example above the &[u8] is 'static, so this does not hold any water.

Kixiron commented 3 years ago

MemoryBuffer does not have any lifetime constraints and the slice it receives can have an arbitrary lifetime, which means you can trivially cause use-after-free scenarios

seanyoung commented 3 years ago

That is true but how is that relevant for this bug?

jlegare commented 3 days ago

As an additional data point, the following snippet will work fine

        let source = "target triple = \"arm64-apple-macosx14.0.0\" \
define void @some_function() #0 { ret void }\n";
        let memory_buffer = MemoryBuffer::create_from_memory_range_copy(source.as_bytes(), "source");
        context.create_module_from_ir(memory_buffer).map_err(|message| message.to_string()).unwrap()

whereas this modified version will panic on the unwrap

        let source = "target triple = \"arm64-apple-macosx14.0.0\" \
define void @some_function() #0 { ret void }\n";
        let memory_buffer = MemoryBuffer::create_from_memory_range(source.as_bytes(), "source");
        context.create_module_from_ir(memory_buffer).map_err(|message| message.to_string()).unwrap()

The difference between the two code fragments is the use of MemoryBuffer::create_from_memory_range_copy() in the first case, and MemoryBuffer::create_from_memory_range() in the second.