camshaft / bolero

property testing and verification front-end for Rust
https://camshaft.github.io/bolero
MIT License
185 stars 16 forks source link

Advice on debugging a failure. #18

Closed ckaran closed 4 years ago

ckaran commented 4 years ago

I need some advice on how to track down a failure that I strongly suspect is coming from bolero. I have 3 tests that consistently fail, but they don't produce any failure output; they just get marked as failures. My suspicion is that its because they are running for more than 60 seconds, and that deep in the bowels of bolero they aren't producing enough output, so that's why the test is failing. However, I don't have a good way of verifying this, and there is too much code for me to produce a reduced example. Do you have any suggestions?

Here is an example of the cleaned up output:

cfkaran2@Hammer:~/Documents/repositories/bit_network$ RUST_BACKTRACE=full cargo test -- configuration_parsing::scenario_parser::tests::_config_tests::try_from_scenario

<Removed all the compilation results>

running 1 test
test configuration_parsing::scenario_parser::tests::_config_tests::try_from_scenario ... FAILED

failures:

failures:
    configuration_parsing::scenario_parser::tests::_config_tests::try_from_scenario

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 227 filtered out

error: test failed, to rerun pass '--lib'
cfkaran2@Hammer:~/Documents/repositories/bit_network$ 

This happens in both debug and release mode. I have all debug symbols on. There actually are two lines with failures in it, that isn't a copy/paste error.

ckaran commented 4 years ago

Well, I've modified my code so that all collection objects have something like #[generator(HashSet::gen().with().len(0usize..5))] set on them. This ensures that no test runs for over 60 seconds, but that still didn't fix the problem. And I still don't get any output from the failed tests, so I have no idea what's going on that's causing the failure. Any advice would be appreciated!

ckaran commented 4 years ago

OK, I have a suspicion, but need confirmation. I remember that bolero has to search for symbols, which is why you need all the debug symbols turned on; how does it handle two symbols that have the same name, but are in different paths? I have several structs defined in the same crate that have the same name (Message), but which are on different paths. Rust is OK with this because the namespaces are different, but I don't know if the fuzzers/bolero can correctly handle this situation. Note that I do import them into the same file in several places (using as to create new names for them). I have no idea if that is the problem, it just seemed plausible to me.

ckaran commented 4 years ago

Interesting new find; running cargo test and cargo bolero fuzz yield different output sometimes. I don't mean that bolero finds new results, I mean quite literally, the former fails with no output, while the latter will always yield some amount of output. Not sure what is going on there, and although I've now fully refactored my code in preparation for creating a reduced test crate, I haven't had a chance to do so yet. Once I'm sure I can reproduce this bug and produce a small test crate, I'll either update this issue, or close it out in favor of a better named one.

ckaran commented 4 years ago

I finally had a failure that might some shed light on this:

cargo bolero fuzz -t 100sec communications::messages::message::tests::merge
    Finished test [unoptimized + debuginfo] target(s) in 0.07s

running 1 test
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 1089555166
INFO: Loaded 1 modules   (1159931 inline 8-bit counters): 1159931 [0x5600931efe00, 0x56009330b0fb), 
INFO: Loaded 1 PC tables (1159931 PCs): 1159931 [0x56009330b100,0x5600944be0b0), 
INFO:     1663 files found in /home/cfkaran2/Documents/repositories/bit_network/bit_network_library/src/communications/messages/__fuzz__/communications__messages__message__tests__merge/corpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 1663 min: 1b max: 1617b total: 672085b rss: 84Mb
#1664   INITED cov: 7151 ft: 29966 corp: 535/164Kb exec/s: 832 rss: 95Mb
#1916   NEW    cov: 7151 ft: 29971 corp: 536/166Kb lim: 1577 exec/s: 638 rss: 95Mb L: 1511/1567 MS: 2 ChangeASCIIInt-ChangeBit-
#2008   REDUCE cov: 7151 ft: 29971 corp: 536/166Kb lim: 1577 exec/s: 669 rss: 95Mb L: 875/1567 MS: 2 ChangeBinInt-EraseBytes-
#2048   pulse  cov: 7151 ft: 29971 corp: 536/166Kb lim: 1577 exec/s: 682 rss: 95Mb
#2225   NEW    cov: 7151 ft: 29979 corp: 537/166Kb lim: 1577 exec/s: 741 rss: 95Mb L: 152/1567 MS: 2 ChangeBit-ChangeBit-
F
failures:

failures:
    communications::messages::message::tests::merge

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 353 filtered out

==30473== ERROR: libFuzzer: fuzz target exited
SUMMARY: libFuzzer: fuzz target exited
MS: 2 InsertByte-ChangeByte-; base unit: a5cab92cdd25d4d8fbe849e1c08cf3c9da97daec
artifact_prefix='/home/cfkaran2/Documents/repositories/bit_network/bit_network_library/src/communications/messages/__fuzz__/communications__messages__message__tests__merge/crashes/'; Test unit written to /home/cfkaran2/Documents/repositories/bit_network/bit_network_library/src/communications/messages/__fuzz__/communications__messages__message__tests__merge/crashes/crash-b90311f056ed895f4fd4ab9374f090e292023fa6
error: process exited with status 77

It's the exit code that I'm looking at; it's defined within LLVM itself in FuzzerFlags.def. So I suspect that either my code, or bolero, are somehow triggering this error. I'm going to create a new bug on the LLVM project, and reference this bug. Maybe we can all get to the bottom of this somehow!

ckaran commented 4 years ago

Bug is now filed at https://bugs.llvm.org/show_bug.cgi?id=46468

ckaran commented 4 years ago

I tried to create a reduced test, but now things are weirder than ever. Copying a failing test and its corpora into a different project doesn't fail, but it continues to fail in the original project, even after I've fully cleaned everything and rebuilt from scratch. This is an evil bug.

camshaft commented 4 years ago

Not really sure what's going on here. It not being reproducable would indicate some weird state the original project is in, but I'm not sure. Keep me posted on what you find!

ckaran commented 4 years ago

I suspect I've found something related to this bug, and I've created a reduced test for it. See MR #23, in the issues/reduced_bolero_test directory.

That said, I don't think that fixing that bug will solve this bug entirely; it feels like that there is more going on here.

EDIT

Be warned that it is designed to fail; thus, all of the tests on the MR have failed. Don't let that prevent you from looking at it.

Also, I know that I'm illustrating what could be argued to be a user error (having an assert!() within the TypeGenerator implementation), but as types become more complicated, panics become more likely. So having a way to get output from the TypeGenerator itself is necessary.

EDIT 2

It looks like this is the root cause of a lot of the bugs after all!:

cfkaran2@Hammer:~/Desktop/reduced_bolero_test$ cargo clean && cargo bolero fuzz tests::serialize -s address
   Compiling libc v0.2.72
   Compiling getrandom v0.1.14
   Compiling proc-macro2 v1.0.18
   Compiling cfg-if v0.1.10
   Compiling unicode-xid v0.2.1
   Compiling syn v1.0.33
   Compiling byteorder v1.3.4
   Compiling gimli v0.22.0
   Compiling ppv-lite86 v0.2.8
   Compiling anyhow v1.0.31
   Compiling adler v0.2.2
   Compiling cc v1.0.58
   Compiling rustc-demangle v0.1.16
   Compiling object v0.20.0
   Compiling either v1.5.3
   Compiling pretty-hex v0.1.1
   Compiling lazy_static v1.4.0
   Compiling miniz_oxide v0.4.0
   Compiling quote v1.0.7
   Compiling rand_core v0.5.1
   Compiling bolero-libfuzzer v0.5.0
   Compiling rand_chacha v0.2.2
   Compiling addr2line v0.13.0
   Compiling rand v0.7.3
   Compiling backtrace v0.3.50
   Compiling bolero-generator-derive v0.5.2
   Compiling bolero-generator v0.5.2
   Compiling bolero-engine v0.5.2
   Compiling bolero v0.5.2
   Compiling reduced_bolero_test v0.1.0 (/home/cfkaran2/Desktop/reduced_bolero_test)
    Finished test [unoptimized + debuginfo] target(s) in 31.34s

running 1 test
INFO: Seed: 3394544181
INFO: Loaded 1 modules   (94460 inline 8-bit counters): 94460 [0x563c04f68560, 0x563c04f7f65c), 
INFO: Loaded 1 PC tables (94460 PCs): 94460 [0x563c04f7f660,0x563c050f0620), 
INFO:        0 files found in /home/cfkaran2/Desktop/reduced_bolero_test/src/__fuzz__/tests__serialize/corpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
F
failures:

failures:
    tests::serialize

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

==4730== ERROR: libFuzzer: fuzz target exited
    #0 0x563c03f4b631  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x36f631)
    #1 0x563c0405a74f  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x47e74f)
    #2 0x563c040366cb  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x45a6cb)
    #3 0x563c040364e3  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x45a4e3)
    #4 0x7f02bdc7e0f0  (/lib/x86_64-linux-gnu/libc.so.6+0x430f0)
    #5 0x7f02bdc7e1e9  (/lib/x86_64-linux-gnu/libc.so.6+0x431e9)
    #6 0x563c04c77f46  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x109bf46)
    #7 0x563c04c66c7e  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x108ac7e)
    #8 0x563c03fe2a00  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x406a00)
    #9 0x563c03fe3f0c  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x407f0c)
    #10 0x563c03f90ef0  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x3b4ef0)
    #11 0x563c03fa2b2f  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x3c6b2f)
    #12 0x563c04c6e5d2  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x10925d2)
    #13 0x563c03fa2a4a  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x3c6a4a)
    #14 0x563c03f90f71  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x3b4f71)
    #15 0x7f02bdc5cb96  (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #16 0x563c03ec9b79  (/home/cfkaran2/Desktop/reduced_bolero_test/target/fuzz/build_859dae3fae0b1004/x86_64-unknown-linux-gnu/debug/deps/reduced_bolero_test-730d148aa8de3eef+0x2edb79)

SUMMARY: libFuzzer: fuzz target exited
MS: 0 ; base unit: 0000000000000000000000000000000000000000

artifact_prefix='/home/cfkaran2/Desktop/reduced_bolero_test/src/__fuzz__/tests__serialize/crashes/'; Test unit written to /home/cfkaran2/Desktop/reduced_bolero_test/src/__fuzz__/tests__serialize/crashes/crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64: 
error: process exited with status 77

The status code is the same as what I had described above, and the backtrace seems to show the crash occurs in libFuzzer, the same as before. So this test is more useful for this bug than I originally thought! :grinning:

ckaran commented 3 years ago

Thank you for the new release (#47), it's already allowed me to catch some new errors that were being masked (fixes #27 & #28)!