NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

RFC TriCore #476

Closed mumbel closed 5 years ago

mumbel commented 5 years ago

I have been working on a TriCore processor module and have at least decent disassembly and a semi-working decompiler. I thought I would push the effort so far to GitHub. (Ghidra devs, if you don't want this sort of thing here, I understand; please close.)

I basically took the TriCore headers and source from binutils (a similar approach to r2) and scripted the generation of most of the Sleigh. This provided the majority of the disassembly, though I have found I am still missing some instructions from the manuals. I then went through and added the logic for a lot of the instructions (though I left a TODO on many where I wasn't sure whether I could macro large portions).
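The scripted generation described above might look roughly like the following Python sketch. The table layout, the field names (`op0007`, `Rd`, `Rb`) and the opcode values are invented for illustration and are not taken from binutils or the TriCore manuals. The sketch only shows the idea of emitting one Sleigh constructor stub per table entry, marked `unimpl`, so that disassembly works before any semantics are written.

```python
from typing import NamedTuple


class OpcodeEntry(NamedTuple):
    """One row of a hypothetical opcode table (not the real binutils layout)."""
    mnemonic: str
    opcode: int        # value of the primary opcode field (placeholder values)
    operands: tuple    # names of operand fields, in display order


def sleigh_stub(entry: OpcodeEntry, opcode_field: str = "op0007") -> str:
    """Emit a Sleigh constructor line with no semantics (marked unimpl)."""
    display = ", ".join(entry.operands)
    pattern = " & ".join([f"{opcode_field}=0x{entry.opcode:02x}", *entry.operands])
    head = f":{entry.mnemonic} {display}".rstrip()
    return f"{head} is {pattern} unimpl"


# Illustrative entries only; these are not real TriCore encodings.
TABLE = [
    OpcodeEntry("mov", 0x02, ("Rd", "Rb")),
    OpcodeEntry("nop", 0x00, ()),
]

if __name__ == "__main__":
    for entry in TABLE:
        print(sleigh_stub(entry))
```

Generating stubs this way keeps the auto-generated part separate from hand-written semantics, so entries can be regenerated or dropped without losing manual work.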

  1. E and P registers: E and P are the 64-bit pair equivalents of the D/A registers. For example, the manual would say E2, but in disassembly that would be the D2/D3 pair. Currently I just use E2, which may come down to personal preference for 64-bit operations, but sometimes an instruction uses each register of the pair individually, and it would be good to track that for the decompiler. Is there a way to get access to DN and DN+1 from EN?
  2. Function arguments: D4-D7 are the first 4 data arguments and A4-A7 are the first 4 pointer arguments. E/P pairs are allowed here. I have not really looked into this for the decompiler beyond a quick attempt. edit: Re-ordering the XML in the cspec has corrected this.
  3. Processor versioning: I need to work on the versioning in the spec files and create multiple slaspec files. I don't see how the manuals align with the values used in the headers for sinc generation.
  4. Java: Have not looked at this one bit.
  5. pcode: How much is too much or too little logic for each instruction? Is super verbose stuff (like CALL and RET) better left out, or maybe put into Java/XML?
  6. Upper/lower word loads: Similar to MIPS, TriCore uses movh.a and other instructions like ld.hu to get a 32-bit address.
  7. Define space for RAM and registers: Loads and stores can reference RAM or registers. Should I set high address space registers in the spec, or only define a single shared space?
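The argument convention in item 2 can be sketched as follows. This is a simplified Python model, not Ghidra's cspec logic. It assumes data and pointer arguments are assigned independently and in order, it ignores E/P pairs, and it labels anything past the fourth register of each kind as `stack`, which is a placeholder assumption since the thread does not say where overflow arguments go.

```python
DATA_REGS = ["D4", "D5", "D6", "D7"]      # first four data arguments
POINTER_REGS = ["A4", "A5", "A6", "A7"]   # first four pointer arguments


def assign_arguments(kinds):
    """Map each argument kind ('data' or 'pointer') to a storage location.

    Simplification: 64-bit E/P pair arguments are not modelled, and
    overflow arguments are labelled 'stack' as a placeholder assumption.
    """
    data_iter = iter(DATA_REGS)
    pointer_iter = iter(POINTER_REGS)
    locations = []
    for kind in kinds:
        if kind == "data":
            locations.append(next(data_iter, "stack"))
        elif kind == "pointer":
            locations.append(next(pointer_iter, "stack"))
        else:
            raise ValueError(f"unknown argument kind: {kind!r}")
    return locations


if __name__ == "__main__":
    # Corresponds to something like f(int a, char *p, int b).
    print(assign_arguments(["data", "pointer", "data"]))
```

The point of the model is that the register chosen depends on the argument's type as well as its position, which is why the ordering of entries in the cspec matters.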

Some issues I just have not looked into yet, and I'm sure there are other things I'm forgetting to mention here, but any comments or suggestions are welcome. (To Ghidra devs: at what stage of development would pull requests for something like this be appropriate? Or is that a big TBD at this point?)

src link, pdf 1.3 & 1.3.1, pdf 1.6, test binaries

emteere commented 5 years ago

We are interested in new language definitions and would welcome pull requests for them.

Most of the processor modules we've included are fairly clean and tested in a variety of situations, although they all have their issues and shortcomings. All processor modules go through a review process just as code does. I'm not sure at what stage it would be ready for a pull request.

A bit more pcode is needed. The module should work on large programs, with all the instructions used by a compiler spec'ed in pcode. Operands may also need refactoring into shared sub-constructors for shared operations like addressing modes. After that, concentrate on the instructions found in the binary under investigation, and use pseudo-ops for instructions like SIN() that are too complicated.

It is at a basic state of completion (disassembly with flow) where someone will find it useful and carry on the work. Including it as a useful prototype is a possibility as long as it doesn't cause issues with the core system, which processor modules don't tend to do on their own. If I were reviewing it, I would say it needs more work, as you mention. Since we are a bit new to pull requests, there would need to be a balance between readiness and an initial prototype. If a pull request is meant for an initial review starting very early, then yes, we could accept it as a pull request. We'll need to discuss what stage something should be in to start the pull request. IMHO it will likely depend on each individual pull request.

Extracting part of the spec from binutils, or from any other source that allows at least some automation, is a good methodology when creating a new processor spec. It looks like you have many of the basics down.

  1. Definitely overlap them correctly in the register definitions. That is important in the model and will save lots of sub-piece operations in the pcode. Be careful of endianness.
  2. Right now there is no data-defined allocation scheme for the parameters, although that could be done in Java code. You may need to use custom storage, or register allocation of parameters as in the 6502. There have been discussions of a more intelligent allocation scheme.
  3. The processor can be broken down into sub-versions as you have done, although unless there is some conflict in the various versions, instructions/memory/registers, the processor modules tend to have all the features included. This keeps the number of .SLA files down. Having the versions of a processor family spec'ed in some controllable way is always good. So if there isn't a conflict or a need to spec different addressing/memory/instructions, I'd just go with one .slaspec to start. Another strategy is to use the context to enable versions and then you just need to have separate .pspec's/ldef entries to control the version.
  4. You don't necessarily need Java unless there are relocations, ELF format differences, issues with analysis, etc., which you most likely will need to handle to import an ELF or .o.
  5. Too much pcode can make the decompiler produce over-complicated or unhelpful output. All depends on the intent of the processor model, as sleigh is a processor model that can be made to emulate with enough pcode (and possibly supporting java code). Implicit things are better left out unless the decompiler can simplify, you need to emulate, or theorem prove on the spec. It's a balance.
  6. Leave that to the decompiler/symbolic propagation. The instructions do what they do. Let higher level analysis deal with it.
  7. It's a model, so it depends. Does the processor share them as one space? Then make one space and define the registers in the .pspec, except most likely the SP register or any other implicit register.
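The overlap advice in item 1 can be illustrated with a small Python model of a little-endian register space. The offsets are assumptions chosen for the illustration (D registers packed 4 bytes apart from offset 0, with each E pair starting at the offset of its even D register); they are not values from the actual spec files.

```python
import struct

REG_SPACE = bytearray(16 * 4)   # room for D0..D15, 4 bytes each (assumed layout)


def d_offset(n: int) -> int:
    """Byte offset of Dn in the modelled register space."""
    return 4 * n


def write_e(n: int, value: int) -> None:
    """Write the 64-bit pair En (n even), which overlays Dn and Dn+1."""
    if n % 2:
        raise ValueError("E registers exist only for even n")
    struct.pack_into("<Q", REG_SPACE, d_offset(n), value)


def read_d(n: int) -> int:
    """Read the 32-bit register Dn from the same underlying bytes."""
    return struct.unpack_from("<I", REG_SPACE, d_offset(n))[0]


if __name__ == "__main__":
    write_e(2, 0x1122334455667788)
    # Little-endian layout: D2 holds the low word, D3 the high word.
    print(hex(read_d(2)), hex(read_d(3)))
```

Because E2 and D2/D3 share the same bytes, a write to the pair is visible through either half without any explicit sub-piece extraction. In a big-endian layout the two halves would sit at swapped offsets, which is why endianness matters when laying out the pair.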

mumbel commented 5 years ago

@emteere Thanks for taking a look.

  1. It looks like I did have incorrect register defines for the E/P registers (I wasn't thinking and put '_' in both the register define and the attach, where it's only needed in the attach). edit: Any suggestions on accessing N+1? For example, the instruction ld.dd E2 [A4] loads 16 bytes into the 8-byte registers E2 and E4 (or, equivalently, the 4-byte registers D2, D3, D4, D5). edit: Added 2 more tokens for even/odd, then attached the even token to the even registers and the odd token to the odd registers.
  2. It looks like fixing 1 led to some problems with my <pentry>, so hopefully I can sort that out too (for now I've removed E/P, which might be the correct thing here anyway).
  3. I generated "TRICORE_RIDER_A" defined instructions, which seem to account for a significant portion of the unimpl instructions at this point. I can't find a manual for this, so I think I'll move those to another sinc or remove them entirely (since they're auto-generated, not a real loss). This should also reduce any version complexity, since these instructions do conflict.
  4. Good to know. The only ELF loading issue so far is that analysis errors out with a pop-up: "No DWARF to Ghidra register mappings found for this program's language [tricore:LE:32:default], unable to import functions." Any suggestions there?
  5. I was thinking about making a define for verbosity and wrapping some of the pcode inside it. Now that more things are decompiling, it's more obvious what's causing too much output.
  6. Yeah, I didn't even notice at that point. The disassembly would still show high and low without an xref, but the decompiler recognized it.
  7. Still on my TODO list, but thanks for reminding me that all of this is defined and a lot of these answers are somewhere in the manual.
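The `ld.dd` case from item 1 (16 bytes landing in D2 through D5) can be modelled in a few lines of Python. This sketch assumes little-endian memory, as suggested by the `tricore:LE:32:default` language ID, and assumes the four words fill D2, D3, D4, D5 in ascending address order; the thread does not confirm that ordering.

```python
import struct


def load_dd(memory: bytes, address: int, first_reg: int) -> dict:
    """Split a 16-byte little-endian load into four consecutive D registers."""
    if first_reg % 2:
        raise ValueError("pair loads start at an even register")
    words = struct.unpack_from("<4I", memory, address)
    return {f"D{first_reg + i}": word for i, word in enumerate(words)}


def pair_value(regs: dict, n: int) -> int:
    """Combine Dn (low word) and Dn+1 (high word) into the 64-bit En value."""
    return (regs[f"D{n + 1}"] << 32) | regs[f"D{n}"]


if __name__ == "__main__":
    mem = bytes(range(16))          # 00 01 02 ... 0f
    regs = load_dd(mem, 0, 2)
    print({name: hex(value) for name, value in regs.items()})
    print(hex(pair_value(regs, 2)), hex(pair_value(regs, 4)))
```

Viewing the load as four 32-bit words makes the relationship between the E2/E4 view and the D2..D5 view explicit, which is the same information the even/odd token attachment has to encode.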

ryanmkurtz commented 5 years ago

I'm going to go ahead and close out this question. The conversation is continuing in #567.