NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Wrong variable sizes for AVR8 #2689

Open marsfan opened 3 years ago

marsfan commented 3 years ago

Describe the bug

The sizes of some of the standard variable types in Ghidra are incorrect for AVR8 microcontrollers (such as the ATmega328 used in the Arduino Uno). On AVR, the int type is 2 bytes in size, but Ghidra uses a 4-byte int. This makes assigning variable types difficult, because they do not fit the data properly.
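To make the mismatch concrete, here is a small standalone Python sketch (not Ghidra code). The byte values are invented for illustration; the only facts taken from the report are that int is 2 bytes on AVR and that the built-in type shows as 4 bytes.

```python
import struct

# Hypothetical SRAM contents: two adjacent little-endian 16-bit ints.
# The values are invented for this example.
sram = bytes([0x34, 0x12, 0x78, 0x56])

# AVR view: each int is 2 bytes, so these 4 bytes hold two variables.
first, second = struct.unpack("<hh", sram)

# 4-byte view: a single int covers both variables at once.
(merged,) = struct.unpack("<i", sram)

print(hex(first), hex(second), hex(merged))
```

Applying the 4-byte type at the first variable would consume the second one as well, which is the "will not fit" problem described above.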

To Reproduce

Steps to reproduce the behavior:

  1. Import an AVR8 binary into Ghidra (use the basic avr8 processor type, with the gcc compiler)
  2. Open an AVR8 binary in Ghidra
  3. Open the Data Type Manager
  4. Filter for data type int
  5. Mouse over the data type of int within the BuiltInTypes folder to view the size of the int.


Environment

OS: Windows 10 x64
Java Version: AdoptOpenJDK 11.0.9.1
Ghidra Version: 9.2.1

lennoxgay commented 3 years ago

Cool

On 27 Jan 2021, at 01:12, Gabe R. <notifications@github.com> wrote:


OS: Windows 10 x64
Java Version: AdoptOpenJDK 11.0.9.1
Ghidra Version: 9.2.1

GhidorahRex commented 3 years ago

I'll look into this. The avr8gcc cspec has ints defined as 2 bytes rather than 4. So Ghidra should be picking that up.

mytechnotalent commented 1 year ago

Any update on this? I am also trying to RE an atmega328p hex bin and the assembler does not sync.

emteere commented 1 year ago

Not sure what you mean by "the hex bin and the assembler does not sync". I suspect it isn't the above issue, as the data organizations appear correct in 10.2. They may have been incorrect in prior versions; I didn't check that.

If you are seeing different behavior than I describe below, what version of Ghidra and what exact processor spec are you using?

Currently the data type manager for .gdt files always shows a generic data type organization; this includes the built-in data type manager. So the types will look that way when you hover on them. Ones that have actually been used in a program should have the correct data organization as specified in the .cspec file.

If you were to drag one of the generic ones into an area of memory in a program to define data, it will be translated to the program's data organization. I just tried it on an atmega binary: an int that shows as 4 bytes in the built-ins is sized at 2 when used in the program. When I hover over the int entry in the program's data type manager, which gets added when the type is used, the size is 2.
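As a rough illustration of that translation step, here is a standalone Python sketch. It is not how Ghidra implements it; the table names and the resolved_size helper are hypothetical, and only the two int sizes come from this thread (the char and short entries are assumptions).

```python
# Hypothetical size tables, in bytes. Only the two `int` sizes come from
# this thread; the char and short entries are assumed for illustration.
GENERIC_ORG = {"char": 1, "short": 2, "int": 4}
AVR8_GCC_ORG = {"char": 1, "short": 2, "int": 2}


def resolved_size(type_name, data_org):
    """Return the size a named built-in type takes under a data organization."""
    return data_org[type_name]


# Hovering in the built-in archive: the generic organization is used.
archive_view = resolved_size("int", GENERIC_ORG)

# Applying the same named type inside an AVR8 program: the program's
# organization decides the size.
program_view = resolved_size("int", AVR8_GCC_ORG)

print(archive_view, program_view)
```

The point of the sketch is that the type is identified by name and its size is looked up late, so the same int can report different sizes depending on where you look at it.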

That said, I agree this is confusing, and there are changes in progress.

When you use data types from an archive in a program, they get their correct size when actually used in the program. This usually works fine. However, when sizeof or other calculations need the size of data types at parse time, this can cause issues. There are some changes in 10.2 to help with this that aren't in the GUI yet. You can parse using a script that supplies an initial processor and associated data organization to be used during parsing. The data types will still be in the "generic" form in the data type archive, but size dependencies during parsing shouldn't be an issue in most cases.
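Here is a standalone Python sketch of why parse-time sizes are the awkward case. The declaration and helper names are hypothetical, and this is not Ghidra's parser; it only shows that a length computed from sizeof(int) is fixed as a number once parsing is done.

```python
# Hypothetical header content being parsed:
#     struct packet { char buf[4 * sizeof(int)]; char flag; };
# The parser has to evaluate sizeof(int) to build the array type, so the
# resulting length is a plain number from then on.


def buf_length(sizeof_int, count=4):
    """Array length a parser would compute for `char buf[count * sizeof(int)]`."""
    return count * sizeof_int


def flag_offset(sizeof_int):
    """Offset of `flag`, which directly follows the char array."""
    return buf_length(sizeof_int)


generic_offset = flag_offset(4)  # parsed with a generic 4-byte int
avr_offset = flag_offset(2)      # parsed with the AVR8 2-byte int

print(generic_offset, avr_offset)
```

Unlike a plain int field, the array cannot be fixed up later just by resizing int, which is why supplying the data organization before parsing helps.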

We are changing the archives to keep the processor and data organization that they were parsed with, and to display that organization when you hover. That isn't quite ready, but some work has been done to that end.

Also, it is probably better to parse the header files for your particular platform and configuration for the best results. It is a bit easier in 10.2, with some sample scripts that show how one might parse the header files from a given toolchain.

There is a new script in 10.2, CreateAVR8GDTArchiveScript.java, that will parse header files from an avr8 gcc toolchain. It uses the avr8:LE:16:atmega256 processor data organization. The script attempts to parse each individual header file for all avr8 variants in the toolchain. It was originally done on AVR_8_bit_GNU_Toolchain_3.6.2, so there may be newer header files.

The script also does some extra post-parse processing on the defines to add equates for memory locations that are declared with #define. It doesn't handle all the special avr8isms in the header files, like memory configuration, but it does pull out some useful ones.
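To give a feel for that kind of post-processing, here is a standalone Python sketch. It is not the logic of CreateAVR8GDTArchiveScript.java; the function name and regular expression are hypothetical. It assumes avr-libc style register defines, where _SFR_IO8 takes an I/O address that is offset by 0x20 in data space and _SFR_MEM8 takes a data-space address directly.

```python
import re

# avr-libc adds this offset to I/O addresses to get data-space addresses.
SFR_OFFSET = 0x20

# Matches lines such as: #define PORTB _SFR_IO8(0x05)
DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+(\w+)\s+_SFR_(IO|MEM)(?:8|16)\s*\(\s*(0[xX][0-9A-Fa-f]+|\d+)\s*\)"
)


def equates_from_header(text):
    """Map register names to data-space addresses (hypothetical helper)."""
    equates = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if not match:
            continue  # not a register define, e.g. a plain constant
        name, kind, literal = match.groups()
        base = 16 if literal.lower().startswith("0x") else 10
        address = int(literal, base)
        if kind == "IO":
            address += SFR_OFFSET
        equates[name] = address
    return equates


# Sample lines in the style of an avr-libc ATmega328P header.
sample = """
#define PINB _SFR_IO8(0x03)
#define PORTB _SFR_IO8(0x05)
#define TIMSK0 _SFR_MEM8(0x6E)
#define F_CPU 16000000UL
"""

equates = equates_from_header(sample)
print({name: hex(addr) for name, addr in equates.items()})
```

Each resulting name and address pair is the sort of thing that could be turned into an equate or label for a memory location.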

The ATmega328P is one of the variants that is parsed.

mytechnotalent commented 1 year ago

I was able to get it to read when I selected the atmega-2* as a processor. I was expecting atmega328p or something along those lines.

emteere commented 1 year ago

It sounds like your issue was not a data type problem but a Hex Importer issue, resolved by choosing a processor with a compatible memory map. The Hex Importer will create overlays for blocks it finds that don't fit in memory because they overlap with existing memory blocks. If that occurs, you can create the correct memory block and move the bytes to that memory.
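For anyone unsure what "overlap with existing memory blocks" means in practice, here is a standalone Python sketch. It is not the Hex Importer's code; the block table and helper names are hypothetical. It parses one Intel HEX data record and checks its address range against already-defined blocks.

```python
def parse_record(line):
    """Parse one Intel HEX record into (address, record_type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    count = raw[0]
    if len(raw) != count + 5:
        raise ValueError("length field does not match record size")
    # All bytes including the trailing checksum must sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    address = int.from_bytes(raw[1:3], "big")
    record_type = raw[3]
    data = raw[4:4 + count]
    return address, record_type, data


def overlapping_blocks(start, length, blocks):
    """Names of blocks that intersect the range [start, start + length)."""
    end = start + length
    return [
        name
        for name, (block_start, block_length) in blocks.items()
        if start < block_start + block_length and block_start < end
    ]


# Hypothetical memory map (name -> (start, length)); not a real processor spec.
existing = {"existing": (0x0100, 0x0800), "other": (0x2000, 0x0100)}

# A commonly used example data record: 16 bytes at address 0x0100.
address, record_type, data = parse_record(
    ":10010000214601360121470136007EFE09D2190140"
)
hits = overlapping_blocks(address, len(data), existing)
print(hex(address), record_type, len(data), hits)
```

When a record lands on a range that is already occupied, as in this made-up map, the bytes have to go somewhere else, which is where an overlay comes in.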

The processor could really have been called atmega and not atmega-256. In general, we try for a generic name and a common superset, like the atmega which has memory and basic registers in common across a large number of variants.

The information from the header files could be worked into multiple processor variants, which would be easier but would be a bit of work to create and update.

If you try the above mentioned script, I'd be curious how it worked for you.