Investigate corpus loading

SiD3W4y commented 3 years ago

When loading a saved corpus from a previous fuzzing session, most inputs are detected as "not interesting". We should find out why as it hampers our capacity to resume fuzzing.

domenukk commented 2 years ago

It could be that another (potentially newer) testcase already produced this specific coverage pattern, so keeping a second input for this coverage wouldn't be useful for fuzzing. Think of it similar to afl-cmin :)

SiD3W4y commented 2 years ago

Ah yes, I worded the issue badly :). The problem is that when we load a corpus saved to disk, our coverage number (like our coverage map) seems different from the one we stopped at, which seems extremely weird. I get why the corpus number differs (the cmin stuff).

domenukk commented 2 years ago

Yes, then it sounds fishy. Maybe the coverage map doesn't get reset correctly before running/snapshotting. Or maybe the target has a different state when you rerun (unlikely for libs, I guess)

Agnoctopus commented 2 years ago

The entry point of the Tartiflette fuzzer inside the quickjs code, as defined in the snapshot, is the eval_buf function, which has the following prototype:

static int eval_buf (JSContext * ctx, const void * buf, int buf_len,
                    const char * filename, int eval_flags)

In particular, it takes these parameters:

buf: Containing the javascript code to be interpreted
buf_len: Telling the length of the javascript code

After some tests, it turns out that the javascript code in buf must be null-terminated, which makes the buf_len parameter much less useful.

For example, depending on the position of the null character, we have:

            expr = "(function (){var var1=[0x1];}());console.log(42);var1['reverse']();";
            int len = strlen("(function (){var var1=[0x1];}());");
            if (eval_buf(ctx, expr, len, "<cmdline>", 0))
                goto fail;

Gives: SyntaxError: unexpected end of string.

While:

            //                                                        X
            expr = "(function (){var var1=[0x1];}());console.log(42);\0var1['reverse']();";
            int len = strlen("(function (){var var1=[0x1];}());");
            if (eval_buf(ctx, expr, len, "<cmdline>", 0))
                goto fail;

Gives: 42

And:

            //                                        X                 X
            expr = "(function (){var var1=[0x1];}());\0console.log(42);\0var1['reverse']();";
            int len = strlen("(function (){var var1=[0x1];}());");
            if (eval_buf(ctx, expr, len, "<cmdline>", 0))
                goto fail;

Gives nothing, as intended.

After some tests on the determinism of Tartiflette, we saw that for every virtual machine used in the fuzzing sessions, the address space is correctly and entirely restored to match the original virtual machine... except for the space reserved by the user to hold the test input passed as arguments to eval_buf (buf and buf_len). This is expected, as that space is not dirtied during the virtual machine's execution and is under the full control of the user.

Thus, since we do not null-terminate each test, writing the fuzz inputs one after the other leaves the input buffer buf without a terminator at the end of the current input: bytes from a longer, earlier input remain after it, which leads to the behavior seen above.

TEST-1:  (function (){var var1=[0x1];}());console.log(42);var1['reverse']();
INPUT-1: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX;console.log(42);var1['reverse']();
TEST-2:  (function (){var var1=[0x1];}());
INPUT-2: (function (){var var1=[0x1];}());console.log(42);var1['reverse']();

To fix this determinism problem, we simply add a null byte at the end of each input to be tested.
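The stale-bytes effect and the terminator fix can be modeled in a few lines of C. This is only a sketch under assumptions, not the actual Tartiflette code: input_region, INPUT_REGION_SIZE, write_input_raw and write_input_terminated are hypothetical names, and a static array stands in for the guest memory region that is not restored between fuzz cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the memory region reserved for the input.
 * It is static, so its contents persist across calls, mimicking a region
 * that is not reset between fuzz cases. */
#define INPUT_REGION_SIZE 256
static char input_region[INPUT_REGION_SIZE];

/* Copies only the testcase bytes. Anything left behind by a longer,
 * earlier testcase stays in the region right after them. */
static size_t write_input_raw(const char *data, size_t len)
{
    if (len > INPUT_REGION_SIZE)
        return 0; /* testcase does not fit */
    memcpy(input_region, data, len);
    return len;
}

/* Copies the testcase and appends a null byte, so a parser that relies on
 * null termination stops at the end of the current testcase. */
static size_t write_input_terminated(const char *data, size_t len)
{
    if (len >= INPUT_REGION_SIZE)
        return 0; /* no room left for the terminator */
    memcpy(input_region, data, len);
    input_region[len] = '\0';
    return len;
}
```

With write_input_raw, writing TEST-1 and then the shorter TEST-2 leaves the region still reading as TEST-1, because the tail of the first input is never overwritten. With write_input_terminated, the region reads as exactly TEST-2.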

Thank you for your guidance.

Agnoctopus / Tartiflette

Investigate corpus loading #7