geohot / qira

QEMU Interactive Runtime Analyser
MIT License
3.92k stars 471 forks source link

Add optional qemu v5.2 tracer #253

Closed korniltsev closed 3 years ago

korniltsev commented 3 years ago

Hi. I would like to propose a qemu v5.2 port to merge as an optional tracer.

Here is the qemu branch https://github.com/korniltsev/qemu/tree/qira_5.2

As you requested in the previous PR's I've done some tests.

To compare the logs I did the following:

Here is the comparision script https://github.com/korniltsev/qira/blob/korniltsev/qemu_v5.2/qira_tests/compare_qemu/test_q2_vs_q5.sh

TLDR: there are some differences in the log file. Some of them I "fixed" and some I did not (and I think we should not).

Some more info on the log difference:

  1. The first difference I noticed after the port is caused by TranslationBlock chaining. On the screenshot you can see that jmp 0x401007 does not look like a jump to 0x401007, but in fact it jumps and executes 0x401007 tb in a single tb_exec pass. I "fixed" it by allowing jmp_opt only for rep prefixed instructions. https://github.com/korniltsev/qemu/commit/7f669d8eeb5d89353c6ff1509082d1154ea11e34 Before dirty fix: 1  tb chaining jmp After: 1  tb patch q5

The rep instructions lead me to the second difference in the log

  1. rep instructions are chained differently for unknown to me reason( I can investigate though). On q2 the chain optimization is enabled on 3-d repetition and on q5 the optimization is enabled on 2-nd repetition. I did nothing to address this behavior change yet. In order to simplify log comparision(avoid clnum shift) I passed -d nochain argument to qemu. https://github.com/korniltsev/qira/blob/korniltsev/qemu_v5.2/qira_tests/compare_qemu/test_q2_vs_q5.sh#L32 2  q2 2  q5

  2. Next clnum difference(shift) I found was caused by auxv values. There are new auxv values AT_EXECFN, AT_SECURE. Also the order of the auxv values was reversed. To simplify log comparision I temporarly(only for comparision) removed new auxv entries and reversed order back in q5 build. I also aligned AT_RANDOM value in qemu v2 to start at 16 byte boundary. And made AT_RANDOM value equal cacacacacaca ... for q5 nd q2 versions. this way stack canary will be the same between them and log comparision will be simplier. These changes are only for comparision and are provided as separate patches. https://github.com/korniltsev/qira/blob/korniltsev/qemu_v5.2/qira_tests/compare_qemu/q5.patch#L18 https://github.com/korniltsev/qira/blob/korniltsev/qemu_v5.2/qira_tests/compare_qemu/q2.patch

  3. next clnum shift was because of cpuid instruction. The example is on the picture at change 21940. I temporarly "fixed" the difference by "disabling l3 cacheline" in cpuid. https://github.com/korniltsev/qira/blob/korniltsev/qemu_v5.2/qira_tests/compare_qemu/q5.patch#L78

Screenshot 2020-12-11 at 10 08 23 Screenshot 2020-12-11 at 10 08 16 At this point I have the same amount of clnums between both qemu versions and now it is possible to use diff utility to compare logs.

  1. next reg value difference is caused by rdtsc. it is not important I did nothing to it.
  2. next I found couple instructions changed slightly
    • some multiplication instructions removed useless duplication
      
      imul rax, r9

q2: 16089: 40285C -> 4 90000000 I 16089: 48 -> 40 80000040 R 16089: 0 -> C 80000040 R 16089: 48 -> 40 80000040 R << useless 16089: 0 -> C 80000040 R << useless 16089: 0 -> 300 C0000040 W 16089: 90 -> 300 C0000040 W 16089: 98 -> 0 C0000040 W 16089: 80 -> 402860 C0000040 W 16089: A8 -> 5 C0000020 W q5: 16089: 40285C -> 4 90000000 I 16089: 48 -> 40 80000040 R 16089: 0 -> C 80000040 R 16089: 0 -> 300 C0000040 W 16089: 90 -> 300 C0000040 W 16089: 98 -> 0 C0000040 W 16089: 80 -> 402860 C0000040 W 16089: A8 -> 5 C0000020 W


- avx instructions changed register numbers

movups xmm0, xmmword ptr [0x4a8080] q2: 16163: 442117 -> 3 90000000 I 16163: 30 -> 4A8080 80000040 R 16163: 4A8080 -> 4AC980 A0000040 L 16163: 318 -> 4AC980 C0000040 W 16163: 30 -> 4A8080 80000040 R 16163: 4A8088 -> 4AC988 A0000040 L 16163: 320 -> 4AC988 C0000040 W 16163: 80 -> 44211A C0000040 W q5: 16163: 442117 -> 3 90000000 I 16163: 30 -> 4A8080 80000040 R 16163: 4A8080 -> 4AC980 A0000040 L 16163: 320 -> 4AC980 C0000040 W 16163: 30 -> 4A8080 80000040 R 16163: 4A8088 -> 4AC988 A0000040 L 16163: 328 -> 4AC988 C0000040 W 16163: 80 -> 44211A C0000040 W

-bitscan instructions changed    

bsf eax, edx q2: 20396: 43AF96 -> 3 90000000 I 20396: 10 -> 800 80000040 R 20396: 90 -> 800 C0000040 W 20396: 90 -> 800 80000040 R 20396: 90 -> 800 80000040 R <<< duplicate 20396: 0 -> 485714 80000040 R 20396: 0 -> B C0000040 W 20396: 80 -> 43AF99 C0000040 W 20396: A8 -> 18 C0000020 W q5: 20396: 43AF96 -> 3 90000000 I 20396: 10 -> 800 80000040 R 20396: 90 -> 800 C0000040 W 20396: 0 -> 485714 80000040 R 20396: 0 -> B C0000040 W 20396: 80 -> 43AF99 C0000040 W 20396: A8 -> 18 C0000020 W


- `cmpxchg dword ptr [rbp], edx`  also has a dup

q2: 26483: 416153 -> 3 90000000 I 26483: 8 -> 1 80000040 R 26483: 10 -> 4AE110 80000040 R 26483: 4AE110 -> 0 A0000020 L 26483: 0 -> 0 80000040 R 26483: 4AE110 -> 1 E0000020 S 26483: 98 -> 0 C0000040 W 26483: 90 -> 0 C0000040 W 26483: 80 -> 416156 C0000040 W 26483: A8 -> 10 C0000020 W q5: 26483: 416153 -> 3 90000000 I 26483: 10 -> 4AE110 80000040 R 26483: 4AE110 -> 0 A0000020 L 26483: 0 -> 0 80000040 R 26483: 8 -> 1 80000040 R 26483: 10 -> 4AE110 80000040 R << dup 26483: 4AE110 -> 1 E0000020 S 26483: 0 -> 0 C0000040 W << strange write 26483: 98 -> 0 C0000040 W 26483: 90 -> 0 C0000040 W 26483: 80 -> 416156 C0000040 W 26483: A8 -> 10 C0000020 W


The good news is that all the log difference is only in registers and not in memory/instruction pointer.
The only memory difference I found in the loop example is in `fstat` syscall 

diff <(cat 0.txt| grep -v " R" | grep -v " W") <(cat 1.txt| grep -v " R" | grep -v " W") 28721,28724c28721,28724 < 22494: 40007FFB80 -> 0 E0000008 S < 22494: 40007FFB81 -> 0 E0000008 S < 22494: 40007FFB82 -> 0 E0000008 S < 22494: 40007FFB83 -> 0 E0000008 S

22494: 40007FFB80 -> BB E0000008 S 22494: 40007FFB81 -> 19 E0000008 S 22494: 40007FFB82 -> 89 E0000008 S 22494: 40007FFB83 -> 22 E0000008 S 28737,28740c28737,28740 < 22494: 40007FFB90 -> 0 E0000008 S < 22494: 40007FFB91 -> 0 E0000008 S < 22494: 40007FFB92 -> 0 E0000008 S < 22494: 40007FFB93 -> 0 E0000008 S

22494: 40007FFB90 -> BB E0000008 S 22494: 40007FFB91 -> 19 E0000008 S 22494: 40007FFB92 -> 89 E0000008 S 22494: 40007FFB93 -> 22 E0000008 S 28753,28756c28753,28756 < 22494: 40007FFBA0 -> 0 E0000008 S < 22494: 40007FFBA1 -> 0 E0000008 S < 22494: 40007FFBA2 -> 0 E0000008 S < 22494: 40007FFBA3 -> 0 E0000008 S

22494: 40007FFBA0 -> BB E0000008 S 22494: 40007FFBA1 -> 19 E0000008 S 22494: 40007FFBA2 -> 89 E0000008 S 22494: 40007FFBA3 -> 22 E0000008 S


I did not investigate the cause of the difference since I think it is not important, there is probably a change in fstat syscall implementation.

The qemu v5 barnch contains one extra commit https://github.com/korniltsev/qemu/commit/33200537a1a0202f600dad93676cc8da65525b60 It is not relevant to the qemu v5 port and I should probably move it to a separate PR.

Given above differences I don't think it is possible(easy) to get a fully equal log file. I think the log differences are insignificant. The q5 tracer is optional and is hidden behind -q5 qira flag.

parsed_logs.zip

What do you think?