capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

test_mc.sh generates tons of mismatches #1355

Closed david942j closed 5 years ago

david942j commented 5 years ago

The script suite/test_mc.sh can be used to check whether Capstone's output matches llvm-mc's. However, running test_mc.sh on the master branch produces tons of mismatches. Some of the differences are tolerable, e.g. signed vs. unsigned immediate values:

Mismatch: 0x4c,0xf1,0xaa,0x28 = adc r8, r12, #2852170240
        MC = adc r8, r12, #0xaa00aa00
        CS = adc r8, r12, #-0x55ff5600
Mismatch: 0x47,0xf1,0xa5,0x39 = adc r9, r7, #2779096485
        MC = adc r9, r7, #0xa5a5a5a5
        CS = adc r9, r7, #-0x5a5a5a5b
Mismatch: 0x43,0xf1,0x07,0x45 = adc r5, r3, #2264924160
        MC = adc r5, r3, #0x87000000
        CS = adc r5, r3, #-0x79000000

This behavior is also mentioned in another pr's discussion: https://github.com/aquynh/capstone/pull/1303#discussion_r241695962
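To illustrate why these three mismatches are only cosmetic, here is a minimal sketch that reinterprets the unsigned immediates from the report as 32-bit two's-complement values. The helper name `as_signed` is hypothetical and not part of Capstone or LLVM.

```python
def as_signed(value, bits=32):
    """Reinterpret an unsigned integer as a two's-complement signed value."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >> (bits - 1):           # sign bit set: value is negative
        return value - (1 << bits)
    return value


# Immediates taken from the mismatch report above.
for imm in (2852170240, 2779096485, 2264924160):
    print(f"{imm:#x} -> {as_signed(imm):#x}")
# 0xaa00aa00 -> -0x55ff5600
# 0xa5a5a5a5 -> -0x5a5a5a5b
# 0x87000000 -> -0x79000000
```

Both spellings describe the same 32-bit pattern, so a comparison script could treat them as equal.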

Some suites, however, produce output that seems harmful:

# MC/ARM/thumb2-branches.s.cs
Mismatch: 0xff,0xe3 = b #2046
        MC = b #0x7fe
        CS = b #0x802
Mismatch: 0x00,0xf0,0x00,0xbc = b.w #2048
        MC = b.w #0x800
        CS = b.w #0x804
Mismatch: 0x66,0xf6,0x30,0xbc = b.w #-1677216
        MC = b.w #-0x1997a0
        CS = b.w #4293290084
Mismatch: 0x99,0xf1,0xcf,0xbb = b.w #1677214
        MC = b.w #0x19979e
        CS = b.w #0x1997a2

I know that test_mc.sh is currently not run on Travis CI. Does this mean test_mc.sh is not supposed to pass, or should we add it to CI and fix Capstone until these tests pass? (Or maybe llvm-mc's output is wrong and we should test against another tool, such as objdump?)

HarDToBelieve commented 5 years ago

I have already started building some test suites in C using Cmocka, and I will open a PR as soon as possible.

aquynh commented 5 years ago

@david942j I believe Capstone handles Arm/Thumb branches to an IMM target correctly (it shows the absolute address), and LLVM does it differently. So if you have any doubts, please confirm with other disassemblers, like objdump or IDA.
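The branch numbers in the report are consistent with this explanation. In Thumb state the PC reads as the instruction address plus 4, so an absolute target is the encoded offset plus 4 plus the instruction's address. The sketch below assumes the test disassembles each instruction at base address 0 and that llvm-mc prints the raw offset; both are assumptions inferred from the values, and `thumb_branch_target` is a hypothetical helper.

```python
def thumb_branch_target(offset, insn_addr=0, bits=32):
    """Absolute target of a Thumb branch, wrapped to an unsigned address.

    Assumes the Thumb convention that PC reads as insn_addr + 4.
    """
    return (insn_addr + 4 + offset) & ((1 << bits) - 1)


# (llvm-mc offset, Capstone value) pairs from thumb2-branches.s.cs above.
cases = [
    (0x7FE, 0x802),
    (0x800, 0x804),
    (-0x1997A0, 4293290084),
    (0x19979E, 0x1997A2),
]
for offset, capstone_value in cases:
    assert thumb_branch_target(offset) == capstone_value
```

The third case shows why Capstone prints a large decimal number: the negative offset wraps around when it is shown as an unsigned 32-bit address.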

david942j commented 5 years ago

@aquynh yes, you're right: for those ARM branch instructions, Capstone's output matches objdump's.

I randomly chose some mismatching tests and compared them with objdump. For most instructions Capstone gives the same result as objdump, except this one (the only one I found manually):

# MC/X86/avx512-encodings.s.cs
0x62,0xf1,0x35,0x40,0x72,0x64,0xb7,0x08,0x02
MC = vpsrad zmm25, zmmword ptr [rdi + 4*rsi + 0x200], 0x2
CS = vpsrad zmm25, zmmword ptr [rdi + xmm6*4 + 0x200], 2
OD = vpsrad zmm25,ZMMWORD PTR [rdi+rsi*4+0x200],0x2

The original purpose of this issue was to check whether Capstone has proper full testing. I suggest including suite/test_*.sh in make check's scripts. And since llvm-mc does not seem to be a good enough reference for comparison, switching to objdump might be a good option.

aquynh commented 5 years ago

Yes, integrating the test suite into CI is in our plan.

aquynh commented 5 years ago

@david942j is that MC output in the AVX512 code above from the latest LLVM?

david942j commented 5 years ago

The MC output in the AVX512 code above comes from LLVM 6.0.0. I just tried the latest LLVM release, version 7.0.1, and it gives exactly the same result. Do you need me to try the master branch? It will take me some time since I've never compiled LLVM before.

aquynh commented 5 years ago

No worries, I just wanted to confirm. This is a bug; we should fix it when syncing the x86 arch with LLVM.

HarDToBelieve commented 5 years ago

@david942j can you tell me how to use objdump to disassemble a sequence of bytes? I exported them to a file and passed it to objdump, but it said my file was truncated. Btw, it seems that objdump only handles some specific archs/targets such as i386, iamcu, ...

david942j commented 5 years ago

I used the disasm utility of pwntools. Yes, objdump only handles some archs; AFAIK there are x86, arm, mips, ppc, and avr builds of objdump.
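For reference, one way to feed a raw byte sequence to objdump directly is to write the bytes to a headerless file and pass `-b binary` together with an explicit `-m` architecture. The sketch below only builds the file contents and the command line; the helper names are hypothetical, and the default machine string assumes an x86-64-capable objdump build.

```python
import subprocess
import tempfile


def parse_byte_list(text):
    """Turn a test-suite byte list such as '0xff,0xe3' into raw bytes."""
    return bytes(int(tok.strip(), 16) for tok in text.split(",") if tok.strip())


def objdump_argv(path, machine="i386:x86-64", options="intel"):
    """Build an objdump command line for a raw (headerless) binary file."""
    # -D: disassemble everything, -b binary: no object-file header,
    # -m: architecture to assume, -M: disassembler options (e.g. syntax).
    return ["objdump", "-D", "-b", "binary", "-m", machine, "-M", options, path]


def disassemble(byte_text, **kwargs):
    """Write the bytes to a temporary file and return objdump's text output."""
    with tempfile.NamedTemporaryFile(suffix=".bin") as f:
        f.write(parse_byte_list(byte_text))
        f.flush()
        result = subprocess.run(objdump_argv(f.name, **kwargs),
                                capture_output=True, text=True)
        return result.stdout

# Example call (requires objdump on PATH):
# print(disassemble("0x62,0xf1,0x35,0x40,0x72,0x64,0xb7,0x08,0x02"))
```

For the Thumb cases, an ARM-capable objdump accepts `machine="arm"` with `options="force-thumb"`; which architectures are available depends on how the local binutils was built.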

HarDToBelieve commented 5 years ago

~Hmm, pwntools doesn't support SystemZ, and my system says 800 tests for this arch failed. Do you have any ideas for fixing these tests automatically?~ Nvm, I think I can fix them by hand; the only problems are about converting between hex and decimal numbers.

HarDToBelieve commented 5 years ago

LOL, there are still too many cases to handle, even in ARM. Do you have a complete script that uses objdump to convert LLVM's output, @david942j?

david942j commented 5 years ago

Nope, I dumped the result manually. What kind of cases are you dealing with?

HarDToBelieve commented 5 years ago

I am writing a script to help me fix the hex conversion problems, but objdump sometimes uses the "0x" prefix, sometimes doesn't, and sometimes even adds a space. Floating-point numbers are another problem.
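A rough sketch of the kind of normalization such a script could apply: rewrite every integer literal as unsigned hex of a fixed width, so that decimal, hex, and signed spellings of the same value compare equal. This is not the script from this thread; `normalize_operands` is a hypothetical name, the 32-bit width is an assumption, and bare hex digits without a "0x" prefix are not handled because they cannot be told apart from decimal.

```python
import re

# An optionally negative hex or decimal integer that is not part of an
# identifier (r8, zmm25) and not part of a floating-point literal.
_INT = re.compile(r"(?<![\w.])-?(?:0x[0-9a-f]+|\d+)\b(?!\.\d)")


def normalize_operands(text, bits=32):
    """Rewrite integer literals as unsigned hex masked to `bits` bits."""
    mask = (1 << bits) - 1

    def to_hex(match):
        token = match.group(0)
        digits = token.lstrip("-")
        value = int(digits, 16) if digits.startswith("0x") else int(digits, 10)
        if token.startswith("-"):
            value = -value
        return hex(value & mask)

    return _INT.sub(to_hex, text.lower())


mc = normalize_operands("adc r8, r12, #0xaa00aa00")
cs = normalize_operands("adc r8, r12, #-0x55ff5600")
assert mc == cs == "adc r8, r12, #0xaa00aa00"
```

Register names and floating-point literals are left untouched, so floats would still need separate handling, and values wider than `bits` would be truncated by the mask.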

david942j commented 5 years ago

Sounds terrible... To deal with objdump inconsistently prefixing hex values with '0x', would it be a good idea to modify the binutils source code directly? I only had a quick look at binutils, but the main logic for printing an instruction can be found in opcodes/<arch>-dis.c#print_insn.

aquynh commented 5 years ago

close, thanks!