capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org

Add Hexagon support #1274

Open E3V3A opened 6 years ago

E3V3A commented 6 years ago

I'd love to see support for the Qualcomm Hexagon architecture. I also see that it is already supported in the Keystone project. What is needed and how can I help add this support?

XVilka commented 6 years ago

@E3V3A you can check radare2 instead, I added proper Hexagon support a while ago: https://github.com/radare/radare2/pull/9289

E3V3A commented 6 years ago

@XVilka Awesome! Thanks. Would it be possible to use that for capstone? (Yeah I love Radare2, but I'm customizing my disassembler for some specific purposes...)

aquynh commented 6 years ago

The reason I did not add Hexagon is because the assembly of this architecture is very different from all other architectures.

E3V3A commented 6 years ago

@aquynh

did not add Hexagon is because the assembly of this architecture is very different

So what does that mean? You (?) have already implemented the assembler (in Keystone), so there must be a fairly straightforward way to go back. Unfortunately, I don't know much about the internal workings of these tools, so I probably can't help much beyond organization, Q&A, debugging and testing. There should be some way to reverse-parse what you already have. Even better, with the friendly help of @XVilka's work under LGPL, perhaps you could reuse much of that work?

aquynh commented 6 years ago

All architectures we support so far share the same output structure: a mnemonic and an operand string. Hexagon is quite different: it has its own structure, which does not really fit here.
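To make the mismatch concrete, here is a minimal C sketch. The `flat_insn` struct is modeled loosely on the `mnemonic`/`op_str` pair in Capstone's `cs_insn`. The `hex_packet` type and both print helpers are hypothetical illustrations, not Capstone API. Hexagon groups instructions into packets written in braces, and an instruction such as `r0 = add(r1,r2)` has no natural split into one mnemonic plus one operand string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flat record, loosely modeled on cs_insn's mnemonic/op_str fields. */
typedef struct {
    char mnemonic[32];
    char op_str[160];
} flat_insn;

/* Hypothetical packet record (not a Capstone type): up to four
   instructions that issue together as one unit. */
typedef struct {
    const char *insns[4];
    size_t count;
} hex_packet;

/* Render a flat instruction as "mnemonic op_str". */
static void print_flat(const flat_insn *in, char *out, size_t cap) {
    snprintf(out, cap, "%s %s", in->mnemonic, in->op_str);
}

/* Render a packet in brace syntax: "{ insn1; insn2 }".
   Returns 0 on success, -1 if the buffer is too small. */
static int print_packet(const hex_packet *p, char *out, size_t cap) {
    assert(p->count <= 4); /* precondition: a packet holds at most four entries */
    int n = snprintf(out, cap, "{ ");
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;
    for (size_t i = 0; i < p->count; i++) {
        /* separate instructions with "; ", pad the last one with a space */
        n = snprintf(out + used, cap - used, "%s%s", p->insns[i],
                     (i + 1 < p->count) ? "; " : " ");
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "}");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    return 0;
}
```

Under these assumptions, the flat form renders one line per call, while the packet form only makes sense once all of its members are known.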

Keystone is different: given the input assembly, it returns a binary string. This applies to all architectures, with no difference.

trufae commented 3 weeks ago

Any update?

Rot127 commented 3 weeks ago

The problem with Hexagon (and other VLIW archs) is that it doesn't fit the assumptions Capstone makes about atomic execution units. For most architectures, an atomic execution unit is one instruction. For VLIW architectures (Hexagon, E2K), this is no longer true.
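As a rough illustration of why the execution unit is the packet rather than the instruction, the C sketch below groups a stream of 32-bit Hexagon instruction words into packets. It assumes the parse-bit rule as I understand it from the Hexagon ISA documentation: bits 15:14 of each word are the parse field; `0b11` marks the last word of a packet, `0b00` marks a duplex word (which also ends the packet), and `0b01`/`0b10` mean more words follow. It also assumes at most four words per packet. The function names are hypothetical and not part of Capstone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: parse field is bits 15:14; 0b11 (end of packet) and
   0b00 (duplex, also last in packet) both terminate a packet. */
static int is_packet_end(uint32_t word) {
    uint32_t parse = (word >> 14) & 0x3u;
    return parse == 0x3u || parse == 0x0u;
}

/* Split a word stream into packets, writing each packet's word count
   to sizes[]. Returns the number of complete packets (a trailing,
   unterminated packet is not counted), or SIZE_MAX if four words pass
   without an end marker (assumed malformed). */
static size_t split_packets(const uint32_t *words, size_t n,
                            size_t *sizes, size_t max_packets) {
    assert(words != NULL || n == 0); /* precondition on the input buffer */
    size_t packets = 0, cur = 0;
    for (size_t i = 0; i < n && packets < max_packets; i++) {
        cur++;
        if (is_packet_end(words[i])) {
            sizes[packets++] = cur; /* close the current packet */
            cur = 0;
        } else if (cur == 4) {
            return SIZE_MAX; /* too many words for one packet */
        }
    }
    return packets;
}
```

A disassembler that decodes one instruction per call would have to carry this grouping state between calls, which is one way to see why VLIW targets sit awkwardly in the current one-instruction model.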

In general, we could try to add it with the current API, but it is not a good idea, because the API is not defined in a way that it could be used comfortably (let me know if I should elaborate).

@numas13 made some proposals for API changes in https://github.com/capstone-engine/capstone/pull/2367, https://github.com/capstone-engine/capstone/pull/2375 and https://github.com/capstone-engine/capstone/pull/2374. Those were specifically done to eventually support E2K, but @aquynh was against API changes and @numas13 lost interest.

In general, a v2 API is still on the TODO list. But there are many more things to do first (stabilize Auto-Sync, RISCV/SAIL, x86 update, 100% test coverage for details).

You can use our rz-hexagon generator. It also uses the LLVM definitions. It should be easy to generate the source for r2, because it still generates code for the RzAnalysis/RzAsm API.

I will change it, though, when we switch to RzArch (soon). I also think our implementation will differ a lot from the radare2 one (I am guessing, since I haven't checked out yours).

But it is definitely the fastest way for r2 to get Hexagon support, especially since forking is not much work. The generator has now worked fine for over three years. I had to make almost no edits for the three or four updates I did in that time.

Rot127 commented 3 weeks ago

@trufae ah, btw. I reworked rz-hexagon completely. So pretty much nothing from the first implementation is left.