davidlattimore / wild

Apache License 2.0
Replace raw byte instructions with a procedural macro? #31

Open marxin opened 3 weeks ago

marxin commented 3 weeks ago

I'm curious if the various code snippets like:

// nopw (%rax,%rax)
0x66, 0x66, 0x66, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0, 0, 0, 0, 0,
// mov %fs:0,%rax
0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0,
// mov %fs:0,%rax
// lea {offset}(%rax),%rax
.copy_from_slice(&[0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0, 0x48, 0x8d, 0x80]);

could be replaced with a procedural macro that produces the bytes by invoking the assembler (as). A quick prototype of the idea:

use std::{
    env,
    fs::{read, File},
    io::Write,
    process::Command,
};

use object::{Object, ObjectSection};

const TEMP_S: &str = "/tmp/asm.s";
const TEMP_O: &str = "/tmp/asm.o";

fn main() {
    let mut insn = env::args().nth(1).unwrap();
    println!("insn: {insn}");
    insn.push('\n');

    File::create(TEMP_S)
        .unwrap()
        .write_all(insn.as_bytes())
        .unwrap();
    assert!(Command::new("as")
        .arg(TEMP_S)
        .arg("-o")
        .arg(TEMP_O)
        .status()
        .unwrap()
        .success());

    let data = read(TEMP_O).unwrap();
    let file = object::File::parse(&*data).unwrap();
    let code_data = file.section_by_name(".text").unwrap().data().unwrap();
    println!("code: {code_data:0x?}");
}

❯ ./target/debug/asm-macro ".nops 8"
insn: .nops 8
code: [f, 1f, 84, 0, 0, 0, 0, 0]
❯ ./target/debug/asm-macro "mov %fs:0, %rax"
insn: mov %fs:0, %rax
code: [64, 48, 8b, 4, 25, 0, 0, 0, 0]

Would you welcome such a change, or would it not pay off given that the number of snippets is rather small?

davidlattimore commented 3 weeks ago

Neat idea. I'm open to it, but have some possible concerns.

For many of the asm snippets, I need to know an offset into the machine code at which to write some value. This is already a bit of a magic offset. If the machine code bytes aren't visible, then it's even more of a magic offset.
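To illustrate what such an offset looks like, here is a minimal sketch built on the mov/lea byte sequence quoted at the top of this issue. The names MOV_LEA, LEA_DISP_OFFSET and write_mov_lea are hypothetical and are not taken from Wild's source, and appending the four zero displacement bytes to the template is an assumption made to keep the example self-contained.

```rust
/// mov %fs:0,%rax followed by lea {offset}(%rax),%rax.
/// The trailing four zero bytes are the lea's 32-bit displacement, which is
/// patched in afterwards. (Hypothetical layout, for illustration only.)
const MOV_LEA: [u8; 16] = [
    0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0, // mov %fs:0,%rax
    0x48, 0x8d, 0x80, 0, 0, 0, 0, // lea {offset}(%rax),%rax
];

/// Byte offset of the lea displacement within MOV_LEA. This is the "magic
/// offset": it only makes sense when the bytes above are visible.
const LEA_DISP_OFFSET: usize = 12;

/// Copies the template into `out` and writes `disp` as a little-endian
/// 32-bit displacement. Panics if `out` is shorter than the template.
fn write_mov_lea(out: &mut [u8], disp: i32) {
    out[..MOV_LEA.len()].copy_from_slice(&MOV_LEA);
    out[LEA_DISP_OFFSET..LEA_DISP_OFFSET + 4].copy_from_slice(&disp.to_le_bytes());
}
```

With the bytes written out next to the constant, a reader can count to position 12 and check that it lands on the displacement. With an opaque macro, that check would require re-assembling the snippet.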

My other concern would be that it'd mean people would need to install an assembler in order to build wild. That wouldn't matter for people who download a precompiled binary from github (once those are set up), but it would matter for anyone who tries to use cargo install, which is, for better or worse, a popular way to install Rust tools. It'd also make it slightly harder for other projects to use Wild as a library, since anyone building those tools would need to install an assembler.

The machine code isn't hard to write as-is. I usually just copy the bytes from the output of linker-diff. If I'm writing assembly code that's passed to a proc macro, then I could imagine cases where it'd take a bit of work to write assembly that produces the desired byte sequence. Some of the instructions need padding bytes in order to end up the correct length. e.g. the byte sequence 0x66, 0x66, 0x66, 0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0 is mov %fs:0,%rax, but I'm betting if you put mov %fs:0,%rax into an assembler, you'd get a significantly shorter sequence of bytes.
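The padding problem can be made concrete with a small sketch. It takes the 9-byte encoding that as produced for mov %fs:0,%rax earlier in this issue and prepends 0x66 prefix bytes until it reaches the 12 bytes that the relaxation has to fill. The helper pad_with_0x66 is hypothetical; a proc macro based on an assembler would need some equivalent way to request a padded encoding.

```rust
/// mov %fs:0,%rax as emitted by the assembler (9 bytes).
const MOV_FS_RAX: [u8; 9] = [0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0];

/// Prepends 0x66 prefix bytes so that the result is exactly `total_len`
/// bytes long. Returns None if the instruction is already longer than that.
/// (Hypothetical helper, for illustration only.)
fn pad_with_0x66(insn: &[u8], total_len: usize) -> Option<Vec<u8>> {
    let pad = total_len.checked_sub(insn.len())?;
    let mut out = vec![0x66; pad];
    out.extend_from_slice(insn);
    Some(out)
}
```

Calling pad_with_0x66(&MOV_FS_RAX, 12) yields the sequence 0x66, 0x66, 0x66, 0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0, which is the one used in relaxation.rs.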

To get a feel for the current process, I recommend finding some bit of code in relaxation.rs, commenting it out, then running the tests and looking at the failure. For example, if I comment out the following code:

                    section_bytes[offset - 3..offset + 9].copy_from_slice(&[
                        0x66, 0x66, 0x66, 0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0,
                    ]);

I get a test failure containing the following:

  wild 0x00597f5e 48 8d 3d 00 00 00 00 lea 0x25,%rdi  // Lea_r64_m(0x597f65) POINTER-TO(byte in (RX))
  wild 0x00597f65 e8 00 00 00 00 call 0x000000000000002A  // Call_rel32_64(0x597f6a) POINTER-TO(byte in (RX))
  ld   0x0040188e 66 66 66 64 48 8b 04 25 00 00 00 00 mov %fs:0,%rax  // Mov_r64_rm64(0x0, 0x0) NULL NULL
  ORIG            48 8d 3d 00 00 00 00 lea 0x25,%rdi  // R_X86_64_TLSLD -> `std::panicking::panic_count::LOCAL_PANIC_COUNT::{{constant}}::{{closure}}::VAL.0` -4
  ORIG            e8 00 00 00 00 call 0x000000000000002A  // R_X86_64_PLT32 -> `__tls_get_addr` -4
  TRACE           relaxation.kind=TlsLdToLocalExec value_flags=ADDRESS | CAN_BYPASS_GOT resolution_flags=DIRECT

We can see the bytes that we need on the ld line.

If we leave the machine code there, then I should probably try to make the code a little more readable, since some of the snippets aren't well commented at the moment.

marxin commented 3 weeks ago

> For many of the asm snippets, I need to know an offset into the machine code at which to write some value. This is already a bit of a magic offset. If the machine code bytes aren't visible, then it's even more of a magic offset.

I was also thinking about this limitation. One could put a placeholder address (e.g. 0x12345678) in the assembly, locate it in the encoded byte stream, and provide a function that replaces the placeholder with the actual address. But sure, it's not ideal.
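A minimal sketch of the placeholder idea, assuming the placeholder is encoded as a little-endian 32-bit value and appears exactly once in the snippet. PLACEHOLDER, find_placeholder and patch_placeholder are hypothetical names used only for this illustration.

```rust
/// Placeholder value written into the assembly source, e.g.
/// lea 0x12345678(%rax),%rax. (Hypothetical, for illustration only.)
const PLACEHOLDER: u32 = 0x1234_5678;

/// Returns the byte offset of the first occurrence of the placeholder in
/// `code`, searching for its little-endian encoding.
fn find_placeholder(code: &[u8]) -> Option<usize> {
    let needle = PLACEHOLDER.to_le_bytes();
    code.windows(needle.len())
        .position(|window| window == needle.as_slice())
}

/// Overwrites the first placeholder occurrence with `value`.
/// Returns false, leaving `code` untouched, if no placeholder is found.
fn patch_placeholder(code: &mut [u8], value: u32) -> bool {
    match find_placeholder(code) {
        Some(pos) => {
            code[pos..pos + 4].copy_from_slice(&value.to_le_bytes());
            true
        }
        None => false,
    }
}
```

One weakness of this approach is that the search is purely byte-based, so it relies on the placeholder pattern not also occurring by chance elsewhere in the encoded instructions.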

> My other concern would be that it'd mean people would need to install an assembler in order to build wild. That wouldn't matter for people who download a precompiled binary from github (once those are set up), but it would matter for anyone who tries to use cargo install, which is, for better or worse, a popular way to install Rust tools. It'd also make it slightly harder for other projects to use Wild as a library, since anyone building those tools would need to install an assembler.

I fully agree with this; it's a real limitation. I looked for a library-only approach (without any system dependency), and the https://github.com/icedland/iced project might be an option. However, it would bring in a lot of dependencies, and while Wild also aspires to support other platforms, that project provides only x86_64 support.

> The machine code isn't hard to write as-is. I usually just copy the bytes from the output of linker-diff. If I'm writing assembly code that's passed to a proc macro, then I could imagine cases where it'd take a bit of work to write assembly that produces the desired byte sequence. Some of the instructions need padding bytes in order to end up the correct length. e.g. the byte sequence 0x66, 0x66, 0x66, 0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0 is mov %fs:0,%rax, but I'm betting if you put mov %fs:0,%rax into an assembler, you'd get a significantly shorter sequence of bytes.

I get the point. Given all of the comments above, I consider the current approach (byte sequences with comments) to be the better one.