Open PoignardAzur opened 9 months ago
(Loved the paper, btw. Differential fuzzing of compilers is something I have a vested interest in, so this was super valuable to me.)
Thanks for the suggestion. Considering I'm already marking both versions of dump_var with #[inline(never)], theoretically, bugs should always exist in both versions already. But compilers work in mysterious ways, so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.
I have been thinking about something like this though, and it makes getting a pure LLVM IR reproduction easier by not depending on Rust's standard library for printing (but this doesn't work in Miri)
use std::ffi::{c_char, c_int};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

fn dump_var(...) {
    // Calling a foreign function is unsafe; the format string must be NUL-terminated.
    unsafe { printf("...", var...); }
}
(but this doesn't work in Miri)
One option here would be something like
fn print_i32(x: i32) {
    extern "C" {
        fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
    }
    if cfg!(miri) {
        println!("{x}");
    } else {
        unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
    }
}
Or probably this is better to avoid relying on dead code elimination:
#[cfg(not(miri))]
fn print_i32(x: i32) {
    extern "C" {
        fn printf(fmt: *const core::ffi::c_char, ...) -> core::ffi::c_int;
    }
    unsafe { printf(b"%d\n\0".as_ptr().cast(), x); }
}

#[cfg(miri)]
fn print_i32(x: i32) {
    println!("{x}");
}
But compilers work in mysterious ways so #[inline(never)] is not a bullet-proof optimisation boundary. Merging the two versions into one function probably won't solve it.
You could also pass it a &dyn HashDebug (and create the matching trait, etc.) that would be initialized in main. At that point I really don't think MIRI LLVM can possibly inline anything.
Also, using a dyn trait would probably improve your build times, I think?
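For illustration, here is a minimal sketch of what the &dyn idea could look like. Only the name HashDebug comes from the comment above; the trait's methods (hash_into, debug_string), the blanket impl, and the use of DefaultHasher are assumptions made for this example and are not taken from rustlantis. Note that dynamic dispatch makes inlining less likely but does not rule out devirtualization by LLVM.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};

/// Hypothetical trait: anything that can be hashed and debug-printed.
trait HashDebug {
    fn hash_into(&self, hasher: &mut DefaultHasher);
    fn debug_string(&self) -> String;
}

// Blanket impl for every sized type that is already `Hash + Debug`.
impl<T: Hash + Debug> HashDebug for T {
    fn hash_into(&self, hasher: &mut DefaultHasher) {
        self.hash(hasher);
    }

    fn debug_string(&self) -> String {
        format!("{self:?}")
    }
}

/// One non-generic dumper: the concrete type is erased behind `&dyn`,
/// so each call goes through a vtable instead of a monomorphized copy.
#[inline(never)]
fn dump_var(hasher: &mut DefaultHasher, var: &dyn HashDebug) {
    var.hash_into(hasher);
}

fn main() {
    let mut hasher = DefaultHasher::new();
    let a: i32 = 42;
    let b = (1u8, true);

    // The trait objects are created here, in `main`.
    dump_var(&mut hasher, &a);
    dump_var(&mut hasher, &b);

    println!("last value = {}", b.debug_string());
    println!("hash = {}", hasher.finish());
}
```

Because dump_var is no longer generic, it is compiled once rather than once per argument type, which is where a build-time improvement would come from if there is one.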
Miri doesn't do optimizations; those are only relevant for the LLVM backend.
If the option to not use the standard library is added, I could use rustlantis to fuzz my compiler backend. It targets .NET and is currently still very much WIP, and the standard Rust formatting does not work yet (due to codegen bugs). Currently, the biggest roadblock in development is detecting all the bugs, which rustlantis could help speed up significantly.
If you accept contributions, I could implement this printf-based formatting myself. The biggest question is what to do with ADTs. They could either never be displayed, or they could implement a printf-based formatting trait.
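To make the second option concrete, here is a rough sketch of a printf-based formatting trait. The trait name PrintfDebug, its method, and the Point struct are all hypothetical and exist only for this example. Each impl returns the number of bytes printf reports as written, purely so the sketch is easy to check. As noted earlier in the thread, calling printf like this does not work in Miri.

```rust
use std::ffi::{c_char, c_int};

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

/// Hypothetical trait: print a value using only `printf`,
/// without touching Rust's `core::fmt` machinery.
trait PrintfDebug {
    /// Returns the number of bytes written, as reported by `printf`.
    fn printf_debug(&self) -> c_int;
}

impl PrintfDebug for i32 {
    fn printf_debug(&self) -> c_int {
        // SAFETY: the format string is NUL-terminated and `%d` matches a C int.
        unsafe { printf(b"%d\0".as_ptr().cast(), *self as c_int) }
    }
}

/// Hypothetical ADT used only for this example.
struct Point {
    x: i32,
    y: i32,
}

// An ADT prints its name and punctuation itself and delegates each field.
impl PrintfDebug for Point {
    fn printf_debug(&self) -> c_int {
        // SAFETY: every format string is NUL-terminated and contains no `%`.
        let mut n = unsafe { printf(b"Point { x: \0".as_ptr().cast()) };
        n += self.x.printf_debug();
        n += unsafe { printf(b", y: \0".as_ptr().cast()) };
        n += self.y.printf_debug();
        n += unsafe { printf(b" }\n\0".as_ptr().cast()) };
        n
    }
}

fn main() {
    let p = Point { x: 1, y: -2 };
    p.printf_debug();
}
```

A generator could emit one such impl per generated ADT, the same way a derive would, with each field delegating to its own impl.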
BTW, congrats on an amazing project and paper.
From the thesis paper:
Have you considered merging the two dumper functions? Something like this:
The global variable would be set in main at runtime. Since the programs are guaranteed to be deterministic, you're guaranteed to get the same bugs for both branches. Since dump_var is already marked as #[inline(never)], the compiler would never optimize the checks away. The cost would be an additional always-predicted branch, which doesn't sound too bad.
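As a rough illustration of the merged dumper described here, the following sketch uses a global flag that main sets at runtime. The flag name PRINT_DEBUG, the --debug argument, and the generic signature of dump_var are assumptions for this example, not the code from rustlantis or the thesis.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fmt::Debug;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global switch, set once in `main` at runtime.
static PRINT_DEBUG: AtomicBool = AtomicBool::new(false);

/// Single dumper with both behaviours behind one runtime branch.
#[inline(never)]
fn dump_var<T: Hash + Debug>(hasher: &mut DefaultHasher, var: &T) {
    if PRINT_DEBUG.load(Ordering::Relaxed) {
        // Debugging mode: print the value itself.
        println!("{var:?}");
    } else {
        // Fuzzing mode: fold the value into the running hash.
        var.hash(hasher);
    }
}

fn main() {
    // The flag comes from the command line, so its value is unknown at compile time.
    let debug = std::env::args().any(|arg| arg == "--debug");
    PRINT_DEBUG.store(debug, Ordering::Relaxed);

    let mut hasher = DefaultHasher::new();
    dump_var(&mut hasher, &(1i32, 2u64));
    dump_var(&mut hasher, &true);

    if !debug {
        println!("hash = {}", hasher.finish());
    }
}
```

Both paths are compiled into every binary, and the only difference between runs is which side of the branch executes.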