Merge DUMPER and DEBUG_DUMPER

PoignardAzur commented 9 months ago

Nonetheless, there may still be programs that only result in a difference with the fast dump_var, but where the bug disappears when retested with the debug dump_var. In this case, we still have a reproduction and are still able to investigate the miscompilation, only with more difficulty.

Have you considered merging the two dumper functions? Something like this:

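use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::Hash;
use std::sync::atomic::{AtomicBool, Ordering};

// Set once in main at runtime, so the compiler cannot resolve the branch statically.
static DEBUG_MODE: AtomicBool = AtomicBool::new(false);
// Global hasher for the fast path (the generated programs are single-threaded).
static mut H: Option<DefaultHasher> = None;

#[inline(never)]
fn dump_var(
    f: usize,
    var0: usize, val0: impl Hash + Debug,
    var1: usize, val1: impl Hash + Debug,
    var2: usize, val2: impl Hash + Debug,
    var3: usize, val3: impl Hash + Debug,
) {
    if DEBUG_MODE.load(Ordering::Relaxed) {
        // Debug mode: print every value in full.
        println!("fn{f}:_{var0} = {val0:?}\n_{var1} = {val1:?}\n_{var2} = {val2:?}\n_{var3} = {val3:?}");
    } else {
        // Fast mode: only feed the values into the global hash.
        let h = unsafe { (*std::ptr::addr_of_mut!(H)).get_or_insert_with(DefaultHasher::new) };
        val0.hash(h);
        val1.hash(h);
        val2.hash(h);
        val3.hash(h);
    }
}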

The global variable would be set in main at runtime. Since the programs are guaranteed to be deterministic, you'd be guaranteed to hit the same bugs in both branches. Since dump_var is already marked as #[inline(never)], the compiler would never optimize the check away. The cost would be one additional, perfectly predictable branch, which doesn't sound too bad.
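A minimal sketch of the "set in main at runtime" part, assuming a hypothetical --debug command-line argument selects the debug path. DEBUG_MODE, debug_requested and dump_mode are illustrative names, not rustlantis code; dump_mode only stands in for the branch inside dump_var.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global mode flag; dump_var would read it on every call.
static DEBUG_MODE: AtomicBool = AtomicBool::new(false);

// Hypothetical helper: debug dumping is requested with a "--debug" argument.
fn debug_requested<I: IntoIterator<Item = String>>(args: I) -> bool {
    args.into_iter().any(|arg| arg == "--debug")
}

// Stand-in for the branch inside dump_var.
#[inline(never)]
fn dump_mode() -> &'static str {
    // The value depends on argv, so the optimiser cannot assume either outcome.
    if DEBUG_MODE.load(Ordering::Relaxed) { "debug" } else { "hash" }
}

fn main() {
    // Set once, before the generated program body (and any dump_var call) runs.
    DEBUG_MODE.store(debug_requested(std::env::args().skip(1)), Ordering::Relaxed);
    println!("dump mode: {}", dump_mode());
}
```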

PoignardAzur commented 9 months ago

(Loved the paper, btw. Differential fuzzing of compilers is something I have a vested interest in, so this was super valuable to me.)

cbeuw commented 9 months ago

Thanks for the suggestion. Since I'm already marking both versions of dump_var with #[inline(never)], in theory the bugs should already exist in both versions. But compilers work in mysterious ways, so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.

I have been thinking about something like the following, though. It makes getting a pure LLVM IR reproduction easier by not depending on Rust's standard library for printing (but this doesn't work in Miri):

use std::ffi::{c_char, c_int};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

fn dump_var(...) {
    // Calling a C variadic function is unsafe; the format string must be NUL-terminated.
    unsafe { printf("...", var...); }
}
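To make the printf idea concrete for a single i32 local, here is a minimal sketch, assuming a Unix-like target where Rust's std already links libc (on Windows/MSVC, printf may need extra linker setup). dump_i32 and its output shape are hypothetical, modelled on the debug dumper's "fn{f}:_{var} = {val}" lines; the c"..." literals need Rust 1.77 or later.

```rust
use std::ffi::{c_char, c_int};

extern "C" {
    // C's printf from libc.
    fn printf(fmt: *const c_char, ...) -> c_int;
}

// Hypothetical dumper for one i32 local; returns printf's byte count.
#[inline(never)]
fn dump_i32(f: c_int, var: c_int, val: c_int) -> c_int {
    // SAFETY: the literal is NUL-terminated and its three %d conversions
    // match the three c_int arguments.
    unsafe { printf(c"fn%d:_%d = %d\n".as_ptr(), f, var, val) }
}

fn main() {
    dump_i32(0, 1, 42);
}
```

printf writes through C stdio's own buffer, which is separate from Rust's stdout, so mixing it with println! in one program can reorder the output unless one side is flushed.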

RalfJung commented 9 months ago

(but this doesn't work in Miri)

One option here would be something like:

fn print_i32(x: i32) {
  extern "C" {
      fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
  }

  if cfg!(miri) {
    println!("{x}");
  } else {
    unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
  }
}

RalfJung commented 9 months ago

Or, probably better, this version avoids relying on dead code elimination:

#[cfg(not(miri))]
fn print_i32(x: i32) {
  extern "C" {
      fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
  }

  unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
}

#[cfg(miri)]
fn print_i32(x: i32) {
  println!("{x}");
}

PoignardAzur commented 9 months ago

But compilers work in mysterious ways so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.

You could also pass it a &dyn HashDebug (and create the matching trait, etc.) that would be initialized in main. At that point I really don't think ~~MIRI~~ LLVM can possibly inline anything.

Also, using a dyn trait would probably improve your build times.
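A rough sketch of the dyn approach, under my own assumptions: HashDebug, Dumper, HashDumper, DebugDumper and run are hypothetical names, and run stands in for the generated program body. One wrinkle is that Hash itself is not object-safe (Hash::hash is generic over the hasher), so this HashDebug trait pins the hasher to DefaultHasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};
use std::io::{self, Write};

// Object-safe stand-in for Hash + Debug: fixing the hasher type removes the generic.
trait HashDebug: Debug {
    fn hash_into(&self, h: &mut DefaultHasher);
}

impl<T: Hash + Debug> HashDebug for T {
    fn hash_into(&self, h: &mut DefaultHasher) {
        self.hash(h);
    }
}

// Hypothetical dumper interface, chosen once in main.
trait Dumper {
    fn dump(&mut self, f: usize, var: usize, val: &dyn HashDebug);
}

// Fast mode: fold every value into one running hash.
struct HashDumper(DefaultHasher);

impl Dumper for HashDumper {
    fn dump(&mut self, _f: usize, _var: usize, val: &dyn HashDebug) {
        val.hash_into(&mut self.0);
    }
}

// Debug mode: write every value out in full.
struct DebugDumper<W: Write>(W);

impl<W: Write> Dumper for DebugDumper<W> {
    fn dump(&mut self, f: usize, var: usize, val: &dyn HashDebug) {
        writeln!(self.0, "fn{f}:_{var} = {val:?}").unwrap();
    }
}

// Stand-in for the generated program body: every dump is a dyn call.
fn run(d: &mut dyn Dumper) {
    d.dump(0, 1, &42i32);
    d.dump(0, 2, &(true, 7u8));
}

fn main() {
    if std::env::args().any(|arg| arg == "--debug") {
        run(&mut DebugDumper(io::stdout()));
    } else {
        let mut hasher = HashDumper(DefaultHasher::new());
        run(&mut hasher);
        println!("hash: {}", hasher.0.finish());
    }
}
```

With every value passed as &dyn HashDebug, each dump method is compiled once rather than once per combination of argument types, which is presumably where a build-time saving would come from; whether it helps on real generated programs would need measuring.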

RalfJung commented 9 months ago

Miri doesn't do optimizations, those are only relevant for the LLVM backend.

FractalFir commented 7 months ago

If the option to not use the standard library is added, I could use rustlantis to fuzz my compiler backend. It targets .NET and is still very much a WIP, and standard Rust formatting does not work yet (due to codegen bugs). Currently, the biggest roadblock in development is detecting all the bugs, which rustlantis could help speed up significantly.

If you accept contributions, I could implement this printf-based formatting myself. The biggest question is what to do with ADTs. They could either never be displayed, or implement a printf-based formatting trait.
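A minimal sketch of the second option, assuming a Unix-like target where libc's printf is linked. CPrint, c_str and the Adt struct are hypothetical; the ADT impl is hand-written here, whereas rustlantis would presumably have to generate one per ADT. Each impl returns the byte count printf reports, which keeps it easy to check.

```rust
use std::ffi::{c_char, c_int, CStr};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

// Hypothetical printf-based formatting trait (name and design are illustrative).
trait CPrint {
    // Prints self and returns the number of bytes printf reports writing.
    fn c_print(&self) -> c_int;
}

// Print a C string verbatim via "%s", so a '%' inside it is harmless.
fn c_str(s: &CStr) -> c_int {
    // SAFETY: both pointers come from valid NUL-terminated C strings.
    unsafe { printf(c"%s".as_ptr(), s.as_ptr()) }
}

impl CPrint for i32 {
    fn c_print(&self) -> c_int {
        // SAFETY: one %d conversion matched by one c_int argument.
        unsafe { printf(c"%d".as_ptr(), *self as c_int) }
    }
}

impl CPrint for bool {
    fn c_print(&self) -> c_int {
        c_str(if *self { c"true" } else { c"false" })
    }
}

// Hypothetical generated ADT: printed field by field in a Debug-like shape,
// without going through core::fmt at all.
struct Adt {
    a: i32,
    b: bool,
}

impl CPrint for Adt {
    fn c_print(&self) -> c_int {
        c_str(c"Adt { a: ") + self.a.c_print() + c_str(c", b: ") + self.b.c_print() + c_str(c" }")
    }
}

fn main() {
    Adt { a: 1, b: true }.c_print();
    c_str(c"\n");
}
```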

BTW, congrats on an amazing project and paper.

cbeuw / rustlantis

Merge DUMPER and DEBUG_DUMPER #1