TheDan64 / inkwell

It's a New Kind of Wrapper for Exposing LLVM (Safely)
https://thedan64.github.io/inkwell/
Apache License 2.0
2.3k stars 220 forks source link

GEP is actually unsafe #33

Open Redrield opened 6 years ago

Redrield commented 6 years ago

In an attempt to get around #32, I've been trying to get a program working that can call printf with a string array. Looking at working IR, I see that to convert from an array to a pointer, an inbounds GEP instruction is used. This is the code I came up with to replicate that.

extern crate inkwell;
extern crate pest;

use inkwell::context::Context;
use inkwell::targets::{InitializationConfig, Target};
use inkwell::AddressSpace;
use inkwell::module::Linkage;
use inkwell::values::IntValue;

use std::path::Path;

fn main() {
    Target::initialize_native(&InitializationConfig::default()).unwrap();
    let ctx = Context::create();
    let module = ctx.create_module("program");
    let builder = ctx.create_builder();

    let str_type = ctx.i8_type().ptr_type(AddressSpace::Generic);
    let i32_type = ctx.i32_type();

    let my_message = "Hello, World!\n\0";
    let arr_type = ctx.i8_type().array_type(my_message.len() as u32);

    let mut chars = Vec::with_capacity(my_message.len());

    for ch in my_message.chars() {
        chars.push(ctx.i8_type().const_int(ch as u64, false));
    }

    let global = module.add_global(&arr_type, Some(AddressSpace::Generic), "message");
    let oof: Vec<&IntValue> = chars.iter().map(|x| x).collect();

    let str_array = arr_type.const_array(&oof[..]);

    global.set_initializer(&str_array);

    let printf_type = i32_type.fn_type(&[&str_type], true);
    let printf = module.add_function("printf", &printf_type, Some(&Linkage::ExternalLinkage));

    let main_fn_type = i32_type.fn_type(&[&i32_type, &str_type.ptr_type(AddressSpace::Generic)], false);

    let main_fn = module.add_function("main", &main_fn_type, None);
    let block = ctx.append_basic_block(&main_fn, "entry");
    builder.position_at_end(&block);

    let value = builder.build_in_bounds_gep(&global.as_pointer_value(), &oof[..], "");

    builder.build_call(&printf, &[&value], "", false);
    builder.build_return(Some(&i32_type.const_int(0, false)));

//    println!("{:?}", module.print_to_string());
    module.print_to_file(Path::new("test.ll"));
}

At the line where value is bound, the program exits with a SIGSEGV

Environment

Operating system: Arch Linux Kernel version: 4.15.3-1-ARCH LLVM version: 5.0.1-2 Inkwell version: 0.1.0#eafca272 rustc -vV output:

rustc 1.25.0-nightly (3ec5a99aa 2018-02-14)
binary: rustc
commit-hash: 3ec5a99aaa0084d97a9e845b34fdf03d1462c475
commit-date: 2018-02-14
host: x86_64-unknown-linux-gnu
release: 1.25.0-nightly
LLVM version: 6.0
Redrield commented 6 years ago

Playing around more I found that it was just me not knowing how GEP works. Equivalent code using stuff from librustc_trans segfaults as well.

nicokoch commented 6 years ago

This is still a bug though... segfaults are not allowed to happen in safe rust. So maybe inkwell is missing some sanity checks?

71 commented 6 years ago

Unfortunately, I've only scratched the surface of what LLVM has to offer but I've learned it will segfault for ANYTHING, which is why most methods do not return status codes I suppose. While I agree that Inkwell should indeed provide a completely safe wrapper around LLVM, it has to be known that this will be very hard, or maybe impossible in some cases.

TheDan64 commented 6 years ago

Yep, I agree with @6A's assessment. There are some things due to LLVM's nature that will just be too difficult to wrap and should either remain unsafe or safe with a documentation note (ie if it's just a rare edge case). I will reopen the ticket so we can at least triage and research what are our reasonable options to deal with this issue

TheDan64 commented 6 years ago

Thank you @nicokoch for bringing it up!

71 commented 6 years ago

In the meantime, we could start writing down somewhere all those "gotchas". I'll start: building instructions in a builder when no block has been created in that function will most likely result in a segfault.

TheDan64 commented 6 years ago

@6A if you can get a concrete example of this segfault, feel free to open a separate issue. I think that's the best way to track them. Maybe I will start using a "gotcha" or "LLVM gotcha" label to these type of bug issues

TheDan64 commented 6 years ago

Revisited this today and I'm not sure what can be done here... Maybe if we can encode pointer lengths into the type system, it could be smart enough to validate indexes that are OOB using something like the typenum crate or const generics might help. Going to push this until 0.2.0 since there's not much we can do about it at the moment.

Cypher1 commented 1 year ago

Does anyone have an example of safely constructing a global string with inkwell? I'm still unsure what the 'safe' way to do it is (and some of these examples no longer compile due to changes in the AddressSpace struct/enum, so I'm unsure what I'm doing wrong and if I'm making things worse as I 'fix' things)