fuzzware-fuzzer / fuzzware

Extra basic blocks found after a code hook is added #26

Closed by B03901108 1 year ago

B03901108 commented 1 year ago

Hi, I am playing with the Python code of fuzzware/emulator/harness by adding some Unicorn hooks to the test harness.

I added a code hook in fuzzware/emulator/harness/fuzzware_harness/harness.py by putting uc.hook_add(UC_HOOK_CODE, mefoo) right next to maybe_register_global_block_hook(uc) in the configure_unicorn function. Then I ran the modified fuzzware_harness.harness on two examples, P2IM/Steering_Control and P2IM/Reflow_Oven, using the command python3 -m fuzzware_harness.harness --mmio-trace-out=tmp.mmio.trace --ram-trace-out=tmp.ram.trace --bb-trace-out=tmp.bb.trace <test-case>, with the MMIO models specified in config.yml and a high-coverage test case. Both the models and the test case came from a 6-hour Fuzzware fuzzing run on each example.

In each example, the modified test harness logged a non-existent basic block in the tmp.bb.trace, encountered the first interrupts earlier (due to the interrupt modeling policy), and thus experienced a different sequence of MMIO accesses from that of the original test harness. A bare-minimum mefoo is enough to trigger this:

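from unicorn import UC_HOOK_CODE  # the Python package name is lowercase

def mefoo(uc, addr, size, user_data):
    # Intentionally empty: registering the hook alone triggers the issue.
    return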

The extra basic block found in Steering Control is 0x826a6, and the one found in Reflow Oven is 0x8004e42. I have updated the Fuzzware code to the latest main branch, but the problem persists.
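To locate such extra blocks, the two traces can be compared directly. The following is only a minimal sketch: it assumes each trace has already been reduced to one hexadecimal block address per line, which may not match the real tmp.bb.trace layout, and the helper names and sample addresses (apart from 0x8004e42) are hypothetical.

```python
def parse_block_addresses(lines):
    """Parse one hexadecimal block address per line (assumed format)."""
    addrs = []
    for line in lines:
        token = line.strip()
        if token:
            addrs.append(int(token, 16))
    return addrs


def extra_blocks(reference, modified):
    """Addresses that occur in `modified` but never in `reference`."""
    seen = set(reference)
    return sorted({addr for addr in modified if addr not in seen})


def first_divergence(reference, modified):
    """Index of the first position where the traces differ, or None."""
    for i, (a, b) in enumerate(zip(reference, modified)):
        if a != b:
            return i
    if len(reference) != len(modified):
        return min(len(reference), len(modified))
    return None


# Synthetic traces for illustration only; not taken from a real run.
original = parse_block_addresses(["0x8004e30", "0x8004e50", "0x8004e60"])
hooked = parse_block_addresses(
    ["0x8004e30", "0x8004e42", "0x8004e50", "0x8004e60"]
)

print([hex(a) for a in extra_blocks(original, hooked)])
print(first_divergence(original, hooked))
```

With real traces, the lists would be read from the two trace files instead of being written inline.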

SWW13 commented 1 year ago

What Fuzzware calls basic blocks are actually translation blocks in Unicorn/QEMU, which do not directly correspond to most definitions of basic blocks (e.g. a new translation block is created after returning from a call instruction).

There is also a size limit on translation blocks (both in guest code instructions and in host emulation code), which can split translation blocks at seemingly random locations. Adding new hooks causes more host code to be emitted, which can lead to additional translation blocks. My guess would be that this is the case here.
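The splitting effect can be illustrated with a toy model. This is a deliberately simplified sketch, not QEMU's actual translation logic: it assumes straight-line code, a fixed host-code cost per guest instruction, and a fixed extra cost per hook call. All numbers and names are hypothetical.

```python
def split_into_blocks(num_insns, host_bytes_per_insn, hook_bytes,
                      host_budget, max_guest_insns):
    """Return the guest instruction indices at which a new block starts.

    A block is closed when the next instruction would exceed the host-code
    budget or when the guest instruction limit is reached.
    """
    starts = [0]
    used_bytes = 0
    insns_in_block = 0
    cost = host_bytes_per_insn + hook_bytes
    for i in range(num_insns):
        over_budget = used_bytes + cost > host_budget
        over_count = insns_in_block >= max_guest_insns
        if insns_in_block > 0 and (over_budget or over_count):
            starts.append(i)
            used_bytes = 0
            insns_in_block = 0
        used_bytes += cost
        insns_in_block += 1
    return starts


# Hypothetical numbers: 20 guest instructions, 10 host bytes each,
# a 250-byte host budget, and 20 extra host bytes per hook call.
plain = split_into_blocks(20, 10, 0, 250, 512)
hooked = split_into_blocks(20, 10, 20, 250, 512)

print(plain)   # block starts without the per-instruction hook
print(hooked)  # block starts with the per-instruction hook
```

In this model the unhooked code fits into a single block, while the hooked code is split twice, which is the kind of extra block start that then shows up in a trace.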

As you rightly deduced, this leads to timing differences, which explain the replayability issues.
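A toy example of the timing shift, assuming for illustration that an interrupt is raised after every fixed number of executed blocks. The real interrupt policy is configurable and may differ, and the addresses and interval below are made up.

```python
def interrupt_positions(trace, interval):
    """Trace entries at which an interrupt fires, assuming one interrupt
    after every `interval` executed blocks (illustrative policy only)."""
    return [trace[i] for i in range(interval - 1, len(trace), interval)]


# Hypothetical block traces; 0x118 is an extra block start that only
# appears once the additional hook splits a translation block.
original = [0x100, 0x110, 0x120, 0x130, 0x140, 0x150]
hooked = [0x100, 0x110, 0x118, 0x120, 0x130, 0x140, 0x150]

print([hex(a) for a in interrupt_positions(original, 4)])
print([hex(a) for a in interrupt_positions(hooked, 4)])
```

Here the extra block makes the fourth block arrive one step earlier in guest code, so the interrupt fires at 0x120 instead of 0x130 and everything after it can diverge.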

B03901108 commented 1 year ago

@SWW13 Do you have any suggestions on how to minimally modify the Fuzzware code and avoid counting the basic blocks not in valid_basic_blocks.txt (per P2IM example)? I would like to at least not count them for the interrupt modeling. I noticed that the corresponding block hooks (?) may be in the native C part, but there are quite a few hooks there interacting with each other. So, I think it's more efficient to just ask. Thank you.

SWW13 commented 1 year ago

> Do you have any suggestions on how to minimally modify the Fuzzware code and avoid counting the basic blocks not in valid_basic_blocks.txt (per P2IM example)?

I don't think that's possible without either taking a large performance hit (for looking up the validity of basic blocks during execution) or messing with reproducibility by optionally injecting hooks at the start of (only some) translation blocks.

> I would like to at least not count them for the interrupt modeling.

Do you mean the timing used for interrupts? If so, why? The definition of basic blocks in use is already rather arbitrary (e.g. compare the output of different disassemblers). The entries in valid_basic_blocks.txt are extracted from IDA and, to my understanding, do not comply with the formal definition described on Wikipedia.

Scepticz commented 1 year ago

Per-instruction hooks are what lead to the timing issues. If you add translation block hooks instead, replayability will not be affected. So if translation / basic block hooks are enough for you, you can circumvent the issue by using those instead of per-instruction CODE hooks. For these, you can use the handlers config, for which you can find some examples here: https://github.com/fuzzware-fuzzer/fuzzware-emulator/blob/main/README_config.yml#L106-L141

B03901108 commented 1 year ago

@SWW13 That's because I would like to replay and analyze individual test cases from Fuzzware + the fuzzed examples. In this case, performance is not quite a concern since I just want to re-run a few test cases using the same test harness (in terms of memory & MMIO accesses) as the fuzzing.

@Scepticz Indeed, I have already inserted some block hooks for my own use. The setup reported above is meant to reproduce the problem with minimal changes to the Fuzzware code. Unfortunately, I need per-instruction hooks, but fortunately only for replaying a few test cases.

SWW13 commented 1 year ago

If you filter for valid basic blocks during analysis, you end up with the opposite issue: fewer basic blocks are counted for interrupt timings.
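A small sketch of that opposite effect, again assuming a hypothetical policy where an interrupt is raised after a fixed number of counted blocks. The trace, the valid set, and the interval are invented for illustration.

```python
def first_interrupt_index(trace, interval, valid=None):
    """Index in `trace` where the first interrupt would fire.

    If `valid` is given, only addresses in that set are counted
    (hypothetical filtering against a list of valid basic blocks).
    """
    counted = 0
    for idx, addr in enumerate(trace):
        if valid is not None and addr not in valid:
            continue
        counted += 1
        if counted == interval:
            return idx
    return None


# Hypothetical trace as seen during fuzzing: 0x108 and 0x128 are
# translation block starts that are not in the valid block list.
trace = [0x100, 0x108, 0x110, 0x120, 0x128, 0x130]
valid = {0x100, 0x110, 0x120, 0x130}

print(first_interrupt_index(trace, 3))         # counting every block
print(first_interrupt_index(trace, 3, valid))  # counting valid blocks only
```

Counting only valid blocks delays the interrupt relative to the fuzzing run, so the replay diverges in the other direction.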

If performance is no issue for you, I'd recommend adding (empty) hooks during fuzzing at the locations where you intend to add hooks later during analysis. This should circumvent the timing issues.

B03901108 commented 1 year ago

@SWW13 That is one of my last resorts: adding the per-instruction hook as early as the fuzzing stage. However, the fuzzing would then be inefficient. Performance is not an issue in my analysis stage, but I cannot say the same for the fuzzing stage. I am still trying to circumvent the issue during analysis without changing the fuzzing behavior.

Also, many thanks for all the feedback and suggestions.

SWW13 commented 1 year ago

If you want to go down the QEMU rabbit hole, there are CF_COUNT_MASK and TCG_MAX_INSNS, which define the maximum number of guest instructions per translation block. Lowering them should decrease the chance of translation block splits. However, I'm not sure whether this will work on the old QEMU version that Fuzzware / Unicorn 1 uses; I have only looked into this in newer QEMU versions.

See https://github.com/fuzzware-fuzzer/unicorn

B03901108 commented 1 year ago

@SWW13 Thank you. I will go check these two parameters. Shouldn't I increase them since they define the MAX count?

SWW13 commented 1 year ago

No, this is a maximum target / guest instruction count. Guest instructions are lifted into TCG instructions and then lowered into host instructions. E.g. one ARM add r1, 1 may end up as something along the lines of mov rax, cpu->r1; inc rax; mov cpu->r1, rax. The translation block is limited to a specific length of host instruction bytes, and adding hooks adds additional host instructions.

When you decrease the maximum (target) instruction count, you will always create more translation blocks, so it's less likely that the additional hooks will overflow the host translation block and create extra translation blocks that appear only during analysis.
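The arithmetic behind this can be sketched as follows. It is a rough model under invented numbers: a fixed worst-case host-code size per guest instruction, a fixed hook overhead, and a fixed host budget. None of these values are QEMU's real limits, and the function names are hypothetical.

```python
def insns_per_block(host_budget, host_bytes_per_insn, hook_bytes,
                    max_guest_insns):
    """Guest instructions that fit in one block of straight-line code."""
    fit = host_budget // (host_bytes_per_insn + hook_bytes)
    return min(max_guest_insns, fit)


def safe_guest_insn_cap(host_budget, host_bytes_per_insn, hook_bytes):
    """Largest guest instruction cap for which even hooked code still
    fits in the host budget, so the cap alone decides the split."""
    return host_budget // (host_bytes_per_insn + hook_bytes)


BUDGET, PER_INSN, HOOK = 250, 10, 20  # hypothetical sizes in bytes

# With a high cap, the host budget decides and the hook changes the split.
high_cap_plain = insns_per_block(BUDGET, PER_INSN, 0, 512)
high_cap_hooked = insns_per_block(BUDGET, PER_INSN, HOOK, 512)

# With a sufficiently low cap, both runs split at the same place.
cap = safe_guest_insn_cap(BUDGET, PER_INSN, HOOK)
low_cap_plain = insns_per_block(BUDGET, PER_INSN, 0, cap)
low_cap_hooked = insns_per_block(BUDGET, PER_INSN, HOOK, cap)

print(high_cap_plain, high_cap_hooked)
print(cap, low_cap_plain, low_cap_hooked)
```

The cost is that the lower cap would have to be in place during fuzzing as well, since it changes block boundaries in both runs.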