Hamled commented 5 years ago

Summary

This PR updates Qira's build scripts to support building QEMU version 3.1.0 for that tracer.

Notes

Warning: Until the related QEMU patch PR is merged in, this will break the QEMU build script.

The tracers/qemu_build.sh script assumes that the appropriately patched QEMU source code can be found on GitHub at geohot/qemu, branch v3.1.0-qira.

Hamled commented 5 years ago

The Travis CI build is failing for this PR: https://travis-ci.org/geohot/qira/builds/511206681

I thought it would be due to the build script trying to clone a non-existent branch from geohot/qemu, but it appears that the current failure is because it cannot install the libcapstone-dev package that is required for QEMU 3.1.0.

I'm not familiar enough with the Travis CI setup to know for sure, but my best guess is that libcapstone-dev was added after 14.04 and perhaps that's what the CI is configured to use? I didn't see anything in the Travis config file that was specifying a version.

Hamled commented 5 years ago

Okay now the Travis CI tests are failing for the Right Reasons.

geohot commented 5 years ago

Ok, QEMU upstreamed. Before I merge this, we should add a correctness check that both QEMU versions produce the same log for qira_tests/bin/loop. I will try to do it tonight, but if you beat me to it that's great :)

Hamled commented 5 years ago

I'll definitely check that out, however I don't think I'll have time until next week.

Please feel free to check and merge if things are working!

Hamled commented 5 years ago

Okay, so I think I understand what you've asked me to check regarding the logs. Please let me know if I'm incorrect.

What I've done is run the Docker build script on the current master branch and then the same for this PR's branch.

Since the Dockerfile involves running the tests after everything is built, I was able to copy out the log files those tests generate (I believe it's just one trace for the qira_tests/bin/loop executable).

Comparing the generated files from both branches, there are some differences:

Trivial differences in _base and _strace
Probably trivial differences in _env
Several differences in the binary file with no suffix (I assume this is the QEMU tracer output).

Due to my inexperience with Qira and the fact that the trace file is a binary format, I don't have any sense yet whether those differences are meaningful or just vary with every trace.
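A first pass at locating those differences needs no knowledge of the trace format. The sketch below only reports which byte offsets differ between two files; it makes no assumption about how qira lays out its records, and the function name is made up for illustration.

```python
from typing import List, Tuple


def differing_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offset ranges where a and b differ.

    Only the common length is compared byte by byte; a length mismatch
    is reported as one extra final range covering the surplus bytes.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Reading both trace files with `Path(...).read_bytes()` and passing them in gives a list of offsets to inspect in a hex viewer. It says nothing about whether a difference matters, only where it is.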

Here are the two log directories for your inspection: https://www.dropbox.com/sh/s69taki0lzgg27t/AACCggiu5Ynr-9Q-vkx-c8eLa?dl=0

geohot commented 5 years ago

I want to merge this, but I'm concerned about regressions. If someone diffs the binary file and makes sure the differences aren't important, I'll merge.

Hamled commented 5 years ago

I've got some time again this week. I'll dig more into what the binary files are and see if I can reach a conclusion about the differences.

Hamled commented 5 years ago

I've been working on this over the weekend and into this week, and I just wanted to give a progress update. My suspicion is that the differences in the QEMU trace file are not important, but I'm continuing to dig into it more so I can verify that.

I believe they're not likely to be important based on creating a much simpler version of the loop program used for testing, which uses the write syscall directly and doesn't allocate any heap or stack memory.

Comparing multiple runs of that program using QEMU 3.1.0 and 2.5.1, the only differences in the trace file are a couple of spots where the changes for a given changelist are re-ordered because the TCG ops resulting from the x86 instruction inc eax have also been re-ordered between the versions.
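One way to separate "re-ordered" from "actually different" is to compare each changelist as a multiset of changes. This is only a sketch: the `(address, data, flags)` tuple shape and the function name are hypothetical and are not claimed to match qira's real record layout.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record shape (address, data, flags); not qira's real layout.
Change = Tuple[int, int, int]


def classify_changelists(old: Dict[int, List[Change]],
                         new: Dict[int, List[Change]]) -> Dict[int, str]:
    """Label each changelist number as 'same', 'reordered', or 'different'."""
    result = {}
    for clnum in sorted(set(old) | set(new)):
        a, b = old.get(clnum, []), new.get(clnum, [])
        if a == b:
            result[clnum] = "same"
        elif Counter(a) == Counter(b):
            # Same changes, same multiplicities, different order.
            result[clnum] = "reordered"
        else:
            result[clnum] = "different"
    return result
```

Anything labelled `reordered` would be consistent with the TCG op re-ordering described above, while `different` entries still need a manual look.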

My hope is that this is also the case for the dynamically linked loop program using printf, but I'm still investigating my debug output (based on the original debug lines you have in the QEMU patch). I need to do additional filtering to separate out all of the changes and changelist log lines that are from libc, which comprise > 99.9% of the instructions executed, so I can at least confirm whether the executable's resulting changes are the same.
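The filtering idea can be sketched as follows. The real filtering in this thread was done with a Ruby script; this is a Python stand-in, and the `changelist N at 0x...` marker line is a hypothetical log format, so the regular expression would need adjusting to the actual debug output. The address range passed in is likewise an assumption about where the executable's code is mapped.

```python
import re
from typing import Iterable, Iterator

# Hypothetical marker line; the real debug log format may differ.
CHANGELIST_RE = re.compile(r"^changelist \d+ at (0x[0-9a-fA-F]+)")


def keep_program_lines(lines: Iterable[str], lo: int, hi: int) -> Iterator[str]:
    """Yield only lines of changelists whose instruction address is in [lo, hi)."""
    keep = False
    for line in lines:
        m = CHANGELIST_RE.match(line)
        if m:
            # A marker line decides whether the following lines are kept.
            keep = lo <= int(m.group(1), 16) < hi
        if keep:
            yield line
```

With the executable's code range as `lo` and `hi`, everything executed inside libc or the loader drops out, and the remaining lines are small enough to diff by eye.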

Hamled commented 5 years ago

After filtering out all of the instructions from library code in the QEMU debug logs I'm working from, there are three categories of differences I see between 2.5.1 and 3.1.0:

The leave instruction at 0x400565 (changelist 129) has slightly different changes, because the TCG ops are different now. It looks like they maybe optimized it a bit to get rid of a temporary variable.
The stack pointer is consistently at a different location when _start is called by the interpreter. It's the same between all runs for the same version of QEMU, but it's always at an address 16 bytes higher in version 2.5.1. This could be the result of something else being pushed onto the stack, but it feels more likely to be due to aligning the stack differently at the beginning of execution. This causes a lot of noise in terms of changes, since everything reading from or writing to the stack is then using a different address, but I don't think it's an important difference.
There are a small number of spots where some CPU flag has a 1 read in some runs and a 0 in others. This is not consistent across runs for the same version (although two runs in 3.1.0 happened to have no such differences, and so have exactly the same trace file except for PID). The most of these that I've seen in my runs is 8.

I'll be digging more into that final case to understand better what's going on, but so far everything is confirming that this patch is consistent with the results of the 2.5.1 patch.

Please let me know if there are additional test cases to investigate, since I figure this loop code is too simple to exercise all the things that might have changed to cause regressions.

Hamled commented 5 years ago

Quick update:

The stack set up by QEMU's ELF loader has some padding in it for alignment purposes since this commit, which got merged in 2.9.0. This can be confirmed in the diff of the _env files (there are additional zero bytes between the platform name and the 16 bytes of random data, to get the random bytes on a 16-byte alignment, and again before the auxiliary, environment, and args vectors are put on the stack).
The value being read that differs between runs of the program on the same version is the cc_src2 member of CPUX86State, which is an emulator-internal variable for TCG's custom condition code implementation. I doubt that we should even be tracking this in the trace file, because it is unlikely to mean anything for the program's analysis and at best constitutes noise.
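To illustrate how alignment padding of the kind described in the first point shifts everything placed on the stack afterwards, here is a small arithmetic sketch. It is not QEMU's loader code; the starting address and the helper names are invented for the example.

```python
def align_down(addr: int, alignment: int = 16) -> int:
    """Round addr down to a multiple of alignment (must be a power of two)."""
    return addr & ~(alignment - 1)


def push_aligned(sp: int, size: int, alignment: int = 16) -> tuple:
    """Reserve size bytes below sp, then align down.

    Returns (new_sp, padding), where padding is the number of filler
    bytes introduced by the alignment step.
    """
    unaligned = sp - size
    new_sp = align_down(unaligned, alignment)
    return new_sp, unaligned - new_sp
```

For example, reserving a 7-byte string from a made-up stack top of `0x7ffe1000` leaves the pointer at `0x7ffe0ff0` with 9 bytes of padding, so every later stack address moves compared with an unpadded layout.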

I see that there are some conflicts now, I will rebase this branch to correct those, should you wish to merge this in.

janbbeck commented 4 years ago

would you be willing to help port qemu 4.1 stable to qira?

Hamled commented 4 years ago

would you be willing to help port qemu 4.1 stable to qira?

I'd be interested for sure. Going through all the stuff to verify (as much as possible, at least) the port to 3.1 helped me learn a lot more about how qemu works in user mode.

I probably won't be able to dedicate time for the next week or two, but I'd be happy to check out whatever you're working on!

janbbeck commented 4 years ago

Well, I just patched in your changes from 3.1 into 4.1 and it compiled ok and qira runs without errors, but the browser shows all fields empty. Have you seen this? Any ideas?

janbbeck commented 4 years ago

I should clarify. When running qira /bin/ls the terminal window also shows no output, i.e., the listing is not there.

janbbeck commented 4 years ago

Update: Careful melding of your changes with v4.0.0-rc0 did work fine.

janbbeck commented 4 years ago

Unclear. I had bitten off too much at once. Going from 3.1 to the lowest 4.x version number was small enough that I could make good merging decisions. I have now been able to go from 4.0.0-rc0 to 4.0.0, but going to 4.2 from there in one step did not work. There were too many code changes; they seem to have added plugin support for exactly this type of thing (qira).

I did go from 3.x to 4.x because I had a number of error/warning messages in qira and I suspected Qemu. Those particular error messages seem to have gone away with the latest port to 4.0.0, but qira still capitulates on the crackme I am trying to run it on. The PIN tracer also gets defeated by this crackme ... nanomites and some jump-into-the-middle-of-an-instruction techniques...

Since the error messages went away, I don't really plan to go to 4.1.0, but I have a patched and working 4.0.0-qira qemu on my hard drive. I suppose I should put it on github. Any interest?

Off topic: Other than Tetrane, are you aware of a qira like tool that can emulate a whole VM and then pull a single process out of the recording and display it like qira?

On Wed, Jan 29, 2020 at 1:38 PM Charles Ellis notifications@github.com wrote:

Excellent! What was the issue with the qira output?

I've basically only used qira for running small test programs when doing the qemu upgrade, so I'm not well versed with it.


Hamled commented 4 years ago

Yes I would be interested in your 4.0.0 port of the qira patch for qemu.

I'll also check out what they've done to implement the plugin system. After doing the 3.1 port, I was left with a distinct impression that a long-term improvement to qira would probably benefit from a re-architecture of the qemu integration.

Specifically, it would most likely be better to separate out the qemu patch/plugin entirely, and have a well-defined format for the trace log output, which could then be loaded by qira.
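As a rough illustration of what a "well-defined format" could mean in practice, here is a fixed-size record with an explicit encoder and decoder. The field layout below is invented for this example and is not claimed to match qira's actual on-disk trace format.

```python
import struct
from typing import Iterable, Iterator, NamedTuple

# Hypothetical layout, little-endian, no padding:
# u64 address, u64 data, u32 changelist number, u32 flags.
RECORD = struct.Struct("<QQII")


class Change(NamedTuple):
    address: int
    data: int
    changelist: int
    flags: int


def encode(changes: Iterable[Change]) -> bytes:
    """Serialize changes into a flat byte string of fixed-size records."""
    return b"".join(RECORD.pack(*c) for c in changes)


def decode(blob: bytes) -> Iterator[Change]:
    """Parse a byte string produced by encode(); rejects truncated input."""
    if len(blob) % RECORD.size:
        raise ValueError("trace length is not a multiple of the record size")
    for fields in RECORD.iter_unpack(blob):
        yield Change(*fields)
```

With an explicit layout like this, a tracer on one side and a loader on the other only need to agree on the record definition, which is the kind of separation suggested above.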

Porting the patch to 4.1+ and its new plugin system might be the ideal time to make such a change to qira's design.

As for alternatives, the only thing I've seen so far (outside of some academic work) is a project called rr, from Mozilla. It only works on Linux x86_64 (because it uses some hardware- and OS-specific features to achieve deterministic replay), but that might work for your use case. Check it out: https://rr-project.org/

It's primarily designed for debugging use by developers, rather than reverse engineering, but I've been wanting to see if it could be integrated into an RE tool. I was planning to focus on Binary Ninja and maybe REDasm.

janbbeck commented 4 years ago

Ok, figured it out: https://github.com/janbbeck/qemu/tree/v4.0.0-qira

I did play with rr a while back. It's nice, but does rely on ptrace and is thus similarly vulnerable to ptrace based protection.

janbbeck commented 4 years ago

Hamled, can you explain in more detail what you did in terms of binary regression testing?

edit: typo

Hamled commented 4 years ago

@janbbeck It's been quite a bit since I've thought about the regression testing for the binary trace format for qira, but here's what I remember so far of my work:

There are two parts to what I did:

First was to create a minimal, statically linked binary that basically does the same thing as the code in loop.c, but with no code from libc.

That code is here; it's based on linux-syscall-support (LSS), which is added to that branch in the parent commit. This means that while it will compile for any platform that LSS supports, it only works on Linux.

Honestly, the simple assembly tests could also suffice for this purpose. The whole idea is to have a trace that is as small and as deterministic as possible with qira, through whatever means.

This is important because the next step is to...
Manually compare the same trace as generated by qira/qemu-2.12 and qira/qemu-3.1.0. Comparing the binary trace files is doable but a huge pain, so I re-enabled, updated, and extended the debug logging code that @geohot had put into his qemu patch.

The code changes to support this happened in both the qira project and the qemu fork:
- In my qira fork, in the qemu-3.1.0-debug branch (relevant changes).
- In my qemu fork, in the qira branch (relevant changes).
The summary of those changes is to fix compiler errors related to the format string portability issues in the original debug logs, make QIRA_DEBUG use the qemu logging system (this may be new since 2.12 idk), and update qira's qemu driver to pass the necessary command-line options to enable that logging.

After that, I had some human-readable debug logs to actually compare. There were a lot of differences, which after quite a bit of diff-fu using Beyond Compare, led me to assign "blame" for each of the differences to particular changes in qemu's emulator implementation.

Unfortunately that part is where my memory of the specifics is pretty hazy. All I can say is that Beyond Compare's ability to re-align diffs, filter out most of the lines, and other fanciness proved crucial to being able to get more than just noise out of it (other diff tools might support this, idk).
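For anyone without Beyond Compare, a much cruder version of "filter out the noise, then diff" can be done with Python's standard `difflib`. This is a swapped-in technique, not what was used for the analysis above, and the `pid=` pattern is a hypothetical example of a token that varies between runs.

```python
import difflib
import re
from typing import List

# Hypothetical noise patterns; replace with whatever varies between runs.
NOISE = [re.compile(r"\bpid=\d+\b")]


def normalize(line: str) -> str:
    """Replace run-specific tokens with a fixed placeholder."""
    for pat in NOISE:
        line = pat.sub("<noise>", line)
    return line


def filtered_diff(old: List[str], new: List[str]) -> List[str]:
    """Unified diff of two logs after normalizing away known noise."""
    a = [normalize(line) for line in old]
    b = [normalize(line) for line in new]
    return list(difflib.unified_diff(a, b, "old", "new", lineterm=""))
```

An empty result means the two logs differ only in tokens covered by `NOISE`. It does not re-align diffs the way a dedicated tool can, so it is best suited to small logs such as the minimal test case.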

If you want to go through the above steps to generate a very minimal debug log for a qira/qemu-4.0.0 build and the same for qira/qemu-2.12, I can try to look at them again in Beyond Compare and let you know if any particular tricks come back to me.

Hamled commented 4 years ago

In case you should find it useful, I've uploaded the log files I actually used for my analysis: https://github.com/Hamled/qira/blob/qemu-3.1.0-debug-logs/logs/qemu-trace-logs.tar.gz

The comment for the commit adding that file explains its contents, but I'll paste it because it's in markdown format and will look nicer here:

Qemu trace and debug logs comparing qira patches

These are log files I used to compare the qemu traces and qira-related debug logs to determine if there were any regressions in the binary trace logs generated, since they were different between the patch for qemu 2.12 and qemu 3.1.0.

The logs are committed as an archive, because in total they're over 600 MB uncompressed. The archive has the following structure:

The root has directories for each test case run: loop and minloop. Loop is the standard loop test from test_auto/source-autogen/loop.c. Minloop is the same as loop, but avoids using anything from libc in order to achieve deterministic execution (as much as possible with qemu).

Within each of those is a directory for the platform used, either 16.04 or local. "16.04" is the docker container running on Ubuntu 16.04, and "local" is when qemu was just on my local machine, running Arch Linux updated to whatever was current in July of 2019.

Within each platform directory is the directory for the qemu version, 2.5.1 or 3.1.0, and within there are directories for traces from each run of the binary.

Each trace directory has the following files:

0 - the binary trace file
0_base - The contents of /proc/self/maps for qemu when the trace was run. The program's maps are in there, but it clearly also has maps for things that only qemu uses (or are mapped in by qemu for some reason).
0_env - Contents of the stack for the binary being traced, just before the entry point is executed.
0_qemu - Debug log from qemu, including TCG-related logging about each operation executed, as well as qira-specific logging for each change that is included in the binary trace file, etc.
0_qemu_chng - Same as above but filtered to only include the log lines that correspond to data which is output into the binary trace file (e.g. setting changelists and recording of changes within each changelist).
0_strace - strace log for the binary being traced

The traces for the loop test also include 0_qemu_proc and 0_qemu_proc_chng, which are the same logs, but filtered to not include log lines for instructions executed by library code. This cuts down on the noise significantly, but for your own sanity if you're reviewing these files, the minloop test traces are vastly smaller.
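To get an overview of the unpacked archive, the layout described above (test case, platform, qemu version, run, then the trace files) can be walked with a short script. This sketch deliberately does not assume the exact directory names at any level; it treats any directory four levels down that contains a file named `0` as a trace directory. The function name is made up.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_traces(root: Path) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (test, platform, version, run) for each directory holding a trace.

    A trace directory is recognised by containing the binary trace file "0".
    """
    for trace_file in sorted(root.glob("*/*/*/*/0")):
        if trace_file.is_file():
            test, platform, version, run = trace_file.relative_to(root).parts[:4]
            yield test, platform, version, run
```

Stray files that sit directly in a version directory, such as the `ld-2.23.so` mentioned below, are skipped because they are not directories containing a `0` file.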

There are also a couple of other random files:

loop/16.04/qemu-2.5.1/ld-2.23.so - Cannot recall why this is here anymore, maybe I needed a specific version of ld inside of the docker container?
filter_proc_changelists.rb - Ruby script that filters library code out of the log files, as mentioned above.

Hamled commented 4 years ago

As an example for why the minloop test is useful, if you compare the changes-only log output from multiple traces on the same version and platform, there are zero differences.

This deterministic basis I think is necessary for then comparing the diffs of runs from the two versions to identify changes that are only due to differences in qemu's TCG implementation.
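Checking that claim for a new build is cheap: hash the changes-only log from each run and see whether they all agree. A minimal sketch, with hypothetical helper names, assuming the files to compare (for example each run's `0_qemu_chng`) have already been collected into a list of paths:

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: Path, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large logs are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def group_by_digest(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by content hash; identical files land in the same group."""
    groups: Dict[str, List[Path]] = {}
    for p in paths:
        groups.setdefault(file_digest(p), []).append(p)
    return groups


def is_deterministic(paths: Iterable[Path]) -> bool:
    """True if every file has identical content (or there are no files)."""
    return len(group_by_digest(paths)) <= 1
```

If `is_deterministic` is false, `group_by_digest` shows which runs agree with each other, which narrows down where to start diffing.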

This doesn't apply to the full log, at least the ones I have, because of changes in where the memory is mapped due to ASLR. While qemu's user mode doesn't actually implement ASLR for the binary it is emulating, a lot of qira's debug logging (like read/write logs) includes addresses that are in the "host" address space.

I dunno why I didn't think of it at the time, but probably if you ran qemu with ASLR turned off, these would produce logs that were also exactly the same between multiple runs? It's probably worth doing either way.
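On Linux, one way to try this without changing system-wide settings is to launch the process through `setarch -R`, which disables address space randomization for that process and its children. The wrapper below only builds the command line; whether this actually makes the full logs identical is untested here, and the helper name is made up.

```python
import platform
from typing import List, Sequence


def without_aslr(cmd: Sequence[str]) -> List[str]:
    """Prefix cmd with `setarch <arch> -R` to disable ASLR for it (Linux only)."""
    return ["setarch", platform.machine(), "-R", *cmd]
```

The resulting list can be passed to `subprocess.run`, for example `subprocess.run(without_aslr(["qira", "/bin/ls"]))`. Because the personality flag is inherited across exec, a qemu process started by qira should see it as well, but that is an assumption worth verifying against `/proc/<pid>/maps`.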

The standard loop test case has changes being recorded which are just straight-up different in multiple runs from the same version; sometimes different data is being written than other times. I can't say for sure, but it might just be a result of how complicated the printf code is. For example, maybe it's doing heap allocations, and then malloc and free have to walk a data structure which isn't exactly the same on each run? Dunno.

janbbeck commented 4 years ago

Thank you very much for all that information. I'm on it.

Jan

On Mon, Mar 30, 2020 at 5:19 PM Charles Ellis notifications@github.com wrote:


geohot / qira

Use QEMU version 3.1.0 #210
