NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

AVR: Support for Atxmega256 #4333

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

At the moment, I am reverse engineering a firmware hex file that targets the ATxmega256.

Ghidra performs quite well for the most part, but the lack of specific support for the ATxmega256, as opposed to its 128 counterparts, results in incomplete processing of the code section. A not insignificant amount of data also remains unlabeled and unidentified. While unrelated, lowering the minimum string length to 3 characters can also be of interest in some cases.

Proper support for the 256 variants of the ATxmega would make Ghidra a prime tool for reverse engineering those platforms, as currently no other product supports them (wink), and in my humble opinion Ghidra actually fares significantly better at the task.

Thank you in advance, and I can test any proposed changes before merging, if desired.

emteere commented 2 years ago

I've worked with the AVR8 chips to some extent. In general, Ghidra tries to support the base processor rather than every specific variant of a particular processor, with its own memory-mapped registers, memory sizes, etc.

Can you give us a good rundown of the main issues with the current implementation that are giving you the most trouble?

In my general order of importance:
-Are there any instructions that we don't support?
-Are there memory addressing modes in the atxmega256 that either aren't correct, or are enhanced beyond the other AVR chips? Possibly a new page register?
-Does the atxmega256 move around the basic registers for the chip?
-Are there conventions that the current analysis isn't handling to recover references to code?

Do you have a sample binary, legally available to the public, that exhibits the things you'd like to see better supported? I've used printer firmware in the past.

I can go digging into the manual and differences, but it would help to get a summary from you since your head is currently in the issues. The problems you mention above could be caused by many things, not necessarily the variant of the chip.

ghost commented 2 years ago

I will try to give more details:

What is happening:

                             my_str_ref
   code:0001d5.1 ...        ds         "im a string without a reference"
                 ... 
                 ... 
                 ... 
                 ... 
                 ...

The actual MCU is the ATxmega256A3.

Analysis seems to abruptly end here:

     code:0052a8 08 95           ret
                             ********** FUN_code_005299 Exit ********** 
                             LAB_code_0052a9                                 XREF[1]:     code:000568(j)  
     code:0052a9 f8 94           cli
                             LAB_code_0052aa                                 XREF[1]:     code:0052aa(j)  
     code:0052aa ff cf           rjmp       LAB_code_0052aa
                             DAT_code_0052ab                                 XREF[1]:     FUN_code_000549:...(R)  
     code:0052ab 01 02           undefined2 0201h
     code:0052ac 03              ??         03h
   code:0052ac.1 04              ??         04h
     code:0052ad 05              ??         05h
...

Program information (redacted):

Project File Name:  .hex
Last Modified:  Fri Jun 10 13:14:16 CEST 2022
Readonly:   false
Program Name:   .hex
Language ID:    avr8:LE:24:xmega (1.3)
Compiler ID:    gcc
Processor:  AVR8
Endian: Little
Address Size:   24
Minimum Address:    code:000000
Maximum Address:    mem:5fff
# of Bytes: 67336
# of Memory Blocks: 4
# of Instructions:  0
# of Defined Data:  0
# of Functions: 0
# of Symbols:   1239
# of Data Types:    0
# of Data Type Categories:  1
Analyzed:   false
Created With Ghidra Version:    10.1.4
Date Created:   Fri Jun 10 13:11:47 CEST 2022
Executable Format:  Intel Hex
Executable Location:    
Executable MD5: 
Executable SHA256:  
FSRL:   

Memory map:

[Screenshot: memory map]

ghost commented 2 years ago

Responding to the questions in order:

-Are there any instructions that we don't support?

I think some registers might be out of spec for the currently available processor definitions, but opcodes should not be the culprit AFAIK.

-Are there memory addressing modes in the atxmega256 that either aren't correct, or are enhanced beyond the other avr chips?

Likely.

Here is the datasheet and some notes describing inter-model differences:

I don't have a wealth of experience with XMega series.

emteere commented 2 years ago

One initial possibility is that the RAMPX and RAMPY registers are not currently implemented correctly for addressing. That would affect memory access above 64K. These registers existed in prior variants and really should be implemented. However, your memory map shouldn't suffer from this issue.

The code above looks correct. Disassembly should not continue after the rjmp, since it jumps to itself and the bytes after it are read as data (unless that read reference is a bad one).

There are many reasons a reference might not be made to a string, but I think the most likely one is something that is being added in the next version. The issue is that CODE memory is addressed as 2-byte words; however, when the ELPM instruction accesses it, the value is a byte offset into CODE memory. So if the value is passed as a constant to a function, it is byte-oriented and must be shifted right by 1 to get the actual address in CODE memory. Neither the constant reference analysis nor the decompiler knows about this.

The good news is that there is a new PointerTypedef that can shift the value when the actual ELPM byte access isn't explicit in the pcode for the function. Unfortunately, it will produce a reference to the correct word, but the real access may be one byte off. What needs to be done in this case is to encode in the data type whether the offset is byte- or word-addressed. This hasn't been implemented yet. There is also a little more work in progress on the automated constant referencing.
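The byte/word mismatch can be illustrated with a small Python sketch. This is not Ghidra code, and the helper names are made up for illustration. ELPM forms its flash byte address from RAMPZ:Z (the same RAMP-register mechanism mentioned above for X and Y); shifting that right by 1 gives the word address in the word-addressed CODE space, and the low bit selects the byte within the word, which appears to be what the `.1` suffix in a listing address such as code:0001d5.1 denotes.

```python
from typing import Tuple


def elpm_byte_address(rampz: int, z: int) -> int:
    """24-bit flash byte address used by ELPM: RAMPZ supplies bits 16-23."""
    return ((rampz & 0xFF) << 16) | (z & 0xFFFF)


def byte_to_word(byte_addr: int) -> Tuple[int, int]:
    """Split a flash byte address into (CODE word address, byte index 0 or 1)."""
    return byte_addr >> 1, byte_addr & 1


def word_to_byte(word_addr: int, index: int = 0) -> int:
    """Inverse of byte_to_word."""
    return (word_addr << 1) | (index & 1)


# A byte pointer of 0x03ab lands in word 0x1d5, second byte (code:0001d5.1).
print(byte_to_word(0x03AB))                  # (469, 1), i.e. word 0x1d5, byte 1
print(hex(elpm_byte_address(0x01, 0x0000)))  # 0x10000, only reachable via RAMPZ
```

Under these assumptions, a constant of 0x03ab passed to a string routine would have to be shifted to 0x1d5 before any reference could land on the string shown at code:0001d5.1 earlier in the thread.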

There is also currently an issue with the hex format importer for binaries that have an address space above 0xffff, so it is possible that bytes were overwritten when you loaded your .hex file. You can try patching IntelHexMemImage.java to see if bytes are getting overwritten. The patch will put those bytes into an overlay space, and you can move them to the correct location. I discovered this with a .hex image I had.

diff --git a/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/IntelHexMemImage.java b/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/IntelHexMemImage.java
index 36459b6..c4f5d3c 100644
--- a/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/IntelHexMemImage.java
+++ b/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/IntelHexMemImage.java
@@ -96,6 +96,10 @@
                    }
                    break;
                case IntelHexRecord.END_OF_FILE_RECORD_TYPE:
+                   // if hit EOF record, everything after goes into an OTHER space
+                   // TODO maybe this should go into the data space?
+                   space = AddressSpace.OTHER_SPACE;
+                   base = space.getAddress(0);
                    // nothing to do, we're at the end (or should we ensure further parses fail?)
 //                log(line, "end of file");
                    break;
@@ -210,7 +214,8 @@
                myRangeMap.remove(range);
            }

-           String name = blockName == null ? base.getAddressSpace().getName() : blockName;
+           String name = blockName == null ? blockRange.getMinAddress().getAddressSpace().getName()
+                   : blockName;
            MemoryBlockUtils.createInitializedBlock(program, isOverlay, name,
                blockRange.getMinAddress(), new ByteArrayInputStream(data), data.length,
                "Generated by " + creator, progFile, true, !isOverlay, !isOverlay, log, monitor);

This shouldn't be a general issue with the ATXMEGA, and it also affects other AVR versions, like the xmega.
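For context on the importer issue above, here is a minimal Python sketch of how absolute addresses are formed from Intel HEX extended address records (type 02 shifts its value left by 4, type 04 supplies the upper 16 bits), and how a loader could flag bytes that get written twice. It is an illustration under those assumptions, not the logic of IntelHexMemImage; in particular it simply stops at the EOF record, whereas the patch above routes anything after EOF into an OTHER-space block. `load_intel_hex` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def load_intel_hex(lines) -> Tuple[Dict[int, int], List[int]]:
    """Parse Intel HEX records into {absolute byte address: value}.

    Also returns every address written more than once, the symptom of an
    upper-address record (type 02 or 04) being ignored or misapplied.
    """
    memory: Dict[int, int] = {}
    overwritten: List[int] = []
    base = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("not an Intel HEX record: %r" % line)
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:  # all bytes, checksum included, must sum to 0 mod 256
            raise ValueError("bad checksum: %r" % line)
        count, offset, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:  # data record
            for i, value in enumerate(data):
                addr = base + offset + i  # sketch: ignores 64K wrap inside one record
                if addr in memory:
                    overwritten.append(addr)
                memory[addr] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address: base = value * 16
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:  # extended linear address: upper 16 bits
            base = ((data[0] << 8) | data[1]) << 16
        # types 03 and 05 only carry a start address and do not affect the image
    return memory, overwritten
```

With a `:020000040001F9` record in between, a data record at offset 0 lands at 0x10000; drop that record and the same data record silently lands on top of address 0, which the second return value reports.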

ghost commented 2 years ago

Some updates:

These are my modifications for the 256A3, with no changes to the opcodes or other internals (which is likely not what we want); IDA's AVR processor module does not seem to honor them (at least for the segments):

.ATxmega256A3
SUBARCH=106
RAM=16384
ROM=262144
EEPROM=4096

area DATA IOREG     0x0000:0x1000 IO Memory
area DATA EEPROM    0x1000:0x2000 EEPROM_4k
area DATA SRAM      0x2000:0x6000 SRAM_16k

Putting it here in case it helps to do cross-verification between tools.
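As a quick cross-check of the IDA configuration above, the region bounds can be compared against the declared sizes. This Python sketch assumes the `area` end bounds are exclusive, which is what makes the 0x2000:0x6000 SRAM range come out at exactly RAM=16384 bytes; the helper names are hypothetical.

```python
# Data-space regions copied from the `area` lines above (end bound exclusive).
REGIONS = [
    ("IOREG", 0x0000, 0x1000),
    ("EEPROM", 0x1000, 0x2000),
    ("SRAM", 0x2000, 0x6000),
]
RAM, ROM, EEPROM = 16384, 262144, 4096


def region_of(addr):
    """Return the name of the data-space region containing addr, or None."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return None


def check_map():
    """Return region sizes, the last SRAM byte, and the last flash word address."""
    sizes = {name: end - start for name, start, end in REGIONS}
    if sizes["SRAM"] != RAM or sizes["EEPROM"] != EEPROM:
        raise ValueError("region sizes disagree with RAM/EEPROM: %r" % sizes)
    last_sram = REGIONS[-1][2] - 1
    last_flash_word = ROM // 2 - 1  # CODE is addressed in 2-byte words
    return sizes, last_sram, last_flash_word


sizes, last_sram, last_flash_word = check_map()
print(sizes, hex(last_sram), hex(last_flash_word))
```

The last SRAM byte comes out as 0x5fff, matching the Maximum Address of mem:5fff in the program information earlier in the thread, and the last flash word is 0x1ffff, so flash word addresses no longer fit in 16 bits on this part.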

I will test the provided diff and report back. I am also in the process of compiling a sample hex we can use as a relatively repeatable base to work with. It will be built from one of the application note sources.

ghost commented 2 years ago

Another quick update:

There is an incorrect decode here:

[Screenshot: incorrect decode]

Settings:

[Screenshot: settings]

Does Ghidra have any capacity to process the bootloader for the Xmega, or at least to know where to map data the way the bootloader does? Some of the issues also seem to be related to how data is supposed to be moved into SRAM.

ghost commented 2 years ago

Another update: isolated HEX files as test cases.

It was mildly painful to put these together as I had to go spelunking for older IAR workbench distributions and compare things.

DES_hexfiles.zip

ghost commented 2 years ago

Update: I have made a separate project folder with only the 3DES example in use, and added markers like so: (Note: hex patch not applied)


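#include <stdint.h>
#include "des_driver.h"   /* DES_3DES_Encrypt()/DES_3DES_Decrypt(); header name assumed from the DES application note */

#define DES_BLOCK_LENGTH  8
#define DES_BLOCK_COUNT   3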

/*! \brief Plaintext block used by DES and 3DES.
 *
 *  \note  The MSB in the block is byte 0, and LSB is byte 7.
 */
uint8_t data[DES_BLOCK_LENGTH] = {0xAB, 0xBA, 0x00, 0xBE, 0xEF, 0x00, 0xDE, 0xAD};

/*! \brief Variable used to store DES and tripleDES results. */
uint8_t single_ans[DES_BLOCK_LENGTH];

/*! \brief Keys used by all DES operations. (single DES only uses the first 64-bit key).
 *
 *  \note  The MSB of the 3 keys is byte 0,8 and 16. The LSB of each key is byte 7, 15 and 23.
 */
uint8_t keys[DES_BLOCK_LENGTH * DES_BLOCK_COUNT] =
                    {0x94, 0x74, 0xB8, 0xE8, 0xC7, 0x3B, 0xCA, 0x7D,
                     0x28, 0x34, 0x76, 0xAB, 0x38, 0xCF, 0x37, 0xC2,
                     0xFE, 0x98, 0x6C, 0x38, 0x23, 0xFC, 0x2D, 0x23};

/*! \brief Main example doing DES encryption/decryption.
 */
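int main( void )
{
    /* Marker stores meant to make main() easy to spot in the disassembly.
     * `marker` is never read, so an optimizing build may drop these stores
     * unless it is declared volatile. */
    unsigned long marker = 0;

    /* Example of how to use 3DES encryption and decryption functions. */
    marker = 0xdefaceddUL;
    DES_3DES_Encrypt(data, single_ans, keys);
    marker = 0xdeadbeefUL;
    DES_3DES_Decrypt(single_ans, single_ans, keys);
    marker = 0xcafebabeUL;

    return 1234;
}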

The project compiles with MPLAB (latest version), configured for the correct part number.

The disassembly skipping main():

[Screenshot: disassembly skipping main()]

Settings:

[Screenshots: settings]

Attached are both the source and the built hex files.

GccApplication1_hex.zip GccApplication1_src.zip

emteere commented 2 years ago

I'm seeing some of what you are seeing, but not all. Your main routine looks different, possibly corrupted somehow? I'm not seeing the same addresses you are. I believe the read reference above is a bad reference from something. I don't get it, but I've futzed with my code/processor, so I'll have to isolate that. Both loaded binaries look OK for me. I'll isolate any changes to things that will work without code changes.

I have modified AVR8 .sinc/.ldefs/.pspec/.cspec files, which I'll post so you can try them out for me. That will also help me with the decompilation. Actually, it's lucky you asked about this: it helps me out, as I had been working on the AVR8 as a test binary for some of the new code.

There isn't any magic loader. The code in the sample you sent just has a loop that copies some number of bytes from code memory into data memory. It looked pretty small, and you can just copy the bytes into the right place after making the memory block initialized.

If the references to code memory are passed as parameters, you will need to wait until I can get the change into the main branch. Plus, to use the new stuff you'll have to build Ghidra from source code, which isn't as bad as it sounds.
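The startup copy loop described above is simple enough to emulate outside of Ghidra. This Python sketch assumes a flat, byte-addressed flash image (as ELPM sees it) and models the usual copy-data/clear-bss pair; `emulate_startup` and its parameters are hypothetical names, and the resulting bytes are what you would place into an initialized mem block by hand.

```python
def emulate_startup(flash, data_src, data_start, data_end, bss_start, bss_end):
    """Return {SRAM address: byte} produced by a copy-data/clear-bss startup.

    flash    -- program image as bytes, indexed by flash *byte* address
    data_src -- byte address in flash holding the initial .data values
    """
    sram = {}
    # Copy loop: read flash (ELPM R0,Z+) and store to SRAM (ST X+,R0)
    # until X reaches data_end.
    for i in range(data_end - data_start):
        sram[data_start + i] = flash[data_src + i]
    # Clear loop: ST X+,R1 (avr-gcc keeps R1 at zero) until X reaches bss_end.
    for addr in range(bss_start, bss_end):
        sram[addr] = 0
    return sram


flash = bytes(range(16))
print(emulate_startup(flash, 4, 0x2000, 0x2003, 0x2003, 0x2005))
```

The only inputs are the four loop bounds and the flash source address, which can be read off the ldi/cpi/cpc immediates in the reset code.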

emteere commented 2 years ago

Here is what I get from the GccApplication1_hex above, with modifications to the processor files:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_code_00041a()
             undefined         Wlo:1          <RETURN>
                             FUN_code_00041a                                                                                      XREF[1]:      InitMemory:000111(c)  
     code:00041a 40 e2                         ldi      R20,0x20
     code:00041b 50 e2                         ldi      R21,0x20
     code:00041c 60 e7                         ldi      R22,0x70
     code:00041d 70 e2                         ldi      R23,0x20
     code:00041e 88 e3                         ldi      Wlo,0x38
     code:00041f 90 e2                         ldi      Whi,0x20
     code:000420 0e 94 17 01                   call     DES_3DES_Encrypt          undefined4 DES_3DES_Encrypt(unde
     code:000422 40 e2                         ldi      R20,0x20
     code:000423 50 e2                         ldi      R21,0x20
     code:000424 60 e7                         ldi      R22,0x70
     code:000425 70 e2                         ldi      R23,0x20
     code:000426 cb 01                         movw     W,R23R22
     code:000427 0e 94 6c 01                   call     DES_3DES_Decrypt          undefined DES_3DES_Decrypt()
     code:000429 c8 e3                         ldi      Ylo,0x38
     code:00042a d0 e2                         ldi      Yhi,0x20
     code:00042b 00 e7                         ldi      R16,0x70
     code:00042c 10 e2                         ldi      R17,0x20
     code:00042d d8 01                         movw     X,R17R16
     code:00042e fe 01                         movw     Z,Y
                             LAB_code_00042f                                                                                      XREF[1]:      code:000436(j)  
     code:00042f 91 91                         ld       Whi,Z+=>data              = ??
     code:000430 8d 91                         ld       Wlo,X+=>single_ans        = ??
     code:000431 98 13                         cpse     Whi,Wlo
     code:000432 43 c0                         rjmp     LAB_code_000476
     code:000433 80 e2                         ldi      Wlo,0x20
     code:000434 e0 34                         cpi      Zlo,0x40
     code:000435 f8 07                         cpc      Zhi,Wlo

And the decompilation, with some RE markup of functions:

void FUN_code_00041a(undefined2 R19R18,char *R17R16)

{
  char *pcVar1;
  char *pcVar2;

  R21R20 = &Keys;
  R23R22 = &single_ans;
  W = &data;
  DES_3DES_Encrypt(&data,&single_ans,&Keys);
  R21R20 = &Keys;
  DES_3DES_Decrypt(&single_ans);
  Y = &data;
  R17R16 = &single_ans;
  X = &single_ans;
  Z = &data;
  do {
    pcVar2 = Z;
    pcVar1 = X;
    Z = Z + 1;
    W._1_1_ = *pcVar2;
    X = X + 1;
    W._0_1_ = *pcVar1;
    if (W._1_1_ != (char)W) goto code_c0x000477;
  } while ((byte)Z != 0x40 || Z._1_1_ != (char)(((byte)Z < 0x40) + ' '));
  R21R20 = &Keys;
  FUN_code_0001c1();
  R21R20 = &Keys;
  FUN_code_00024a(&single_ans);
emteere commented 2 years ago

Here are some changes you can try out that should help with the decompiler and constant reference markup. They should work in 10.1.4, and most will be checked into our main branch in the near future. They won't solve all the issues, but they should help quite a bit. They are prototypes and may change slightly as they are checked into the main branch.

They replace the files in /Ghidra/Processors/Atmel/data/languages.

I've added a prototype CODEBYTE space so that the actual memory location can be tagged correctly with a reference, and so that constant memory values passed as parameters go to the right place. The word-offset reference to CODE memory will still be created, but a reference to CODEBYTE memory will be created as well. You will need to create a byte-mapped overlay memory block in the CODEBYTE address space that maps to the CODE address space. The bytes will get duplicated, but the references and decompiled code will be much more readable. If you want to make a byte-based reference by hand, you can make it to the CODEBYTE space.

To make the CODE/CODEBYTE changes work best, you will need the main branch. However, I would not build from source on the main branch yet; I would just try these patches/changes on 10.1.4 before complicating things with a main branch source build. A new PointerTypedef can be created to specify the correct address space for the decompiler. There are still some things to work out for constant references before the 10.2 release.

Also, you will need to either re-import, or re-analyze as follows.

There are some things you can do to improve the analysis in 10.1.4. The references to MEM are not occurring because parameter references are turned off by default. This is probably the "regression" you mention from 10.1.2 to 10.1.4: we turned this option off by default for Harvard architectures. You can turn it back on in Analysis->BasicConstantReferenceAnalyzer->FunctionParameter/ReturnPointerAnalysis. Be aware that this can cause bad references on 16-bit processors, but they should be to data memory. They will be tagged as PARAM type references.

Also, the mem data space should be sized for your processor; I added mem:2000-mem:5fff. There may also be external memory mapped after that location. Once those blocks are created, if you turn on the above analysis and re-run auto-analysis, the references should be created.

                             undefined FUN_code_00041a()
             undefined         Wlo:1          <RETURN>
     code:00041a 40 e2                         ldi      R20,0x20
     code:00041b 50 e2                         ldi      R21=>DAT_mem_2020,0x20    = ??
     code:00041c 60 e7                         ldi      R22,0x70
     code:00041d 70 e2                         ldi      R23=>DAT_mem_2070,0x20    = ??
     code:00041e 88 e3                         ldi      Wlo,0x38
     code:00041f 90 e2                         ldi      Whi=>DAT_mem_2038,0x20    = ??
     code:000420 0e 94 17 01                   call     FUN_code_000117           undefined FUN_code_000117()

avr8xmega.pspec.txt avr8egcc.cspec.txt avr8.sinc.txt

ghost commented 2 years ago

Much appreciated, I apologize for the delay.

I will go through the modifications and report back.

Thank you for all the work and effort in triaging these.

ghost commented 2 years ago

@emteere

I had a chance to build a 10.1.4 branch with the attached modifications (your spec/processor files plus the hex patch). I might be missing something, as I do not get the same results in either the disassembly or the decompiler:

[Screenshot from 2022-06-26 16-34-18]

I don't see any overlay segments created from the hex patch but I might be looking at the wrong spot.

I will repeat the tests on the master branch with the same changes applied (if they apply cleanly).

additive_diff.patch.txt

ghost commented 2 years ago

I can confirm that function boundaries are not being detected well with the modified files. I will see if I can make a test case for this...

emteere commented 2 years ago

You will need to create the byte-mapped memory blocks in the Memory Map by hand. We're adding that capability to the .pspec files; currently we can only create bit-mapped overlay blocks, not byte-mapped ones. Some references won't get created unless memory blocks exist for the potential address.

ghost commented 2 years ago

I have put together a more comprehensive sample that also includes some SPI functionality.

I have done some preliminary work based off: https://github.com/cloakware-ctf/rhme3-writeups/tree/master/atxmega128a4u/scripts

The following should yield similar results with a current/up-to-date IDA:

    import ida_bytes
    import idc
    import sark

    def avr_loader_emu(source_start, target_start, target_end):
        """Emulate the startup copy of .data from ROM into RAM.

        source_start is a byte offset into flash; the ROM segment is treated
        as word-addressed, hence the division by 2. Assumes an even-length range.
        """
        ram_segment = None
        rom_segment = None
        for segment in sark.segments():
            print("segment %s ea %x" % (segment.name, segment.ea))
            if segment.name == 'RAM':
                ram_segment = segment
            elif segment.name == 'ROM':
                rom_segment = segment
        if ram_segment is None or rom_segment is None:
            raise RuntimeError("both a RAM and a ROM segment are required")

        # Copy one 16-bit word per iteration.
        for offset in range(0, target_end - target_start, 2):
            rom_ea = rom_segment.ea + (source_start + offset) // 2
            val = idc.get_wide_word(rom_ea)
            print("patching ram addr %x with %x from rom %x" % (ram_segment.ea + target_start + offset, val, rom_ea))
            ida_bytes.patch_word(ram_segment.ea + target_start + offset, val)

        sark.Line(ram_segment.ea + target_start).comments.repeat = "DATA start"
        sark.Line(ram_segment.ea + target_end).comments.repeat = "DATA end"

    def avr_bss_emu(target_start, target_end):
        """Emulate the startup zeroing of .bss in RAM (assumes an even-length range)."""
        ram_segment = None
        for segment in sark.segments():
            print("segment %s ea %x" % (segment.name, segment.ea))
            if segment.name == 'RAM':
                ram_segment = segment
        if ram_segment is None:
            raise RuntimeError("a RAM segment is required")

        for offset in range(0, target_end - target_start, 2):
            print("zeroing %x (ram)" % (ram_segment.ea + target_start + offset))
            ida_bytes.patch_word(ram_segment.ea + target_start + offset, 0)

        sark.Line(ram_segment.ea + target_start).comments.repeat = "BSS start"
        sark.Line(ram_segment.ea + target_end).comments.repeat = "BSS end"

This is the loading sequence for the hex (with stable 10.1.4):

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined USARTF1_TXC()
             undefined         Wlo:1          <RETURN>
                             USARTF1_TXC                                     XREF[1]:     Entry Point(*)  
     code:0000f8 df e5           ldi        Yhi,0x5f
     code:0000f9 de bf           out        SPH,Yhi
     code:0000fa 00 e0           ldi        R16,0x0
     code:0000fb 0c bf           out        EIND,R16
     code:0000fc 11 e2           ldi        R17=>DAT_code_002100,0x21
     code:0000fd a0 e0           ldi        Xlo,0x0
     code:0000fe b0 e2           ldi        Xhi,0x20
     code:0000ff e4 ef           ldi        Zlo,0xf4
     code:000100 f1 e1           ldi        Zhi,0x11
     code:000101 00 e0           ldi        R16,0x0
     code:000102 0b bf           out        RAMPZ,R16
     code:000103 02 c0           rjmp       LAB_code_000106
                             LAB_code_000104                                 XREF[1]:     code:000108(j)  
     code:000104 07 90           elpm       R0,Z+=>DAT_code_0008fa                           = 7494h
     code:000105 0d 92           st         X+=>DAT_mem_2000,R0
                             LAB_code_000106                                 XREF[1]:     code:000103(j)  
     code:000106 aa 3c           cpi        Xlo,0xca
     code:000107 b1 07           cpc        Xhi,R17
     code:000108 d9 f7           brbc       LAB_code_000104,Zflg
     code:000109 22 e2           ldi        R18,0x22
     code:00010a aa ec           ldi        Xlo,0xca
     code:00010b b1 e2           ldi        Xhi,0x21
     code:00010c 01 c0           rjmp       LAB_code_00010e
                             LAB_code_00010d                                 XREF[1]:     code:000110(j)  
     code:00010d 1d 92           st         X+=>DAT_mem_21ca,R1
                             LAB_code_00010e                                 XREF[1]:     code:00010c(j)  
     code:00010e a8 31           cpi        Xlo,0x18
     code:00010f b2 07           cpc        Xhi,R18
     code:000110 e1 f7           brbc       LAB_code_00010d,Zflg
     code:000111 0e 94 d8 04     call       FUN_code_0004d8                                  undefined FUN_code_0004d8()
     code:000113 0c 94 f8 08     jmp        LAB_code_0008f8
                             **************************************************************
                             *                       THUNK FUNCTION                       *
                             **************************************************************
                             thunk undefined BOOT()
                               Thunked-Function: BOOT
             undefined         Wlo:1          <RETURN>
                             BOOT                                            XREF[118]:   OSC_XOSCF:000002(T), 
                                                                                          OSC_XOSCF:000002(j), 
                                                                                          PORTC_INT0:000004(T), 
                                                                                          PORTC_INT0:000004(j), 
                                                                                          PORTC_INT1:000006(T), 
                                                                                          PORTC_INT1:000006(j), 
                                                                                          PORTR_INT0:000008(T), 
                                                                                          PORTR_INT0:000008(j), 
                                                                                          PORTR_INT1:00000a(T), 
                                                                                          PORTR_INT1:00000a(j), 
                                                                                          DMA_CH0:00000c(T), 
                                                                                          DMA_CH0:00000c(j), 
                                                                                          DMA_CH1:00000e(T), 
                                                                                          DMA_CH1:00000e(j), 
                                                                                          DMA_CH2:000010(T), 
                                                                                          DMA_CH2:000010(j), 
                                                                                          DMA_CH3:000012(T), 
                                                                                          DMA_CH3:000012(j), 
                                                                                          RTC_OVF:000014(T), 
                                                                                          RTC_OVF:000014(j), [more]
     code:000115 0c 94 00 00     jmp        0x0=>BOOT

As I understand it, the loader simply copies the data starting at code:0008fa to mem:2000. I'm looking for the source/compiler "juice" that introduces this loader/prologue function, just to wrap my head around it, in case I can write a generalized script that takes care of this automatically (do tell if you already have something; it would be much appreciated).
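The loop bounds in the 10.1.4 listing above can be recovered directly from the immediates: each pointer is a register pair loaded with ldi, and each end-of-range test is a cpi/cpc pair, so every bound is (high << 8) | low. This Python sketch hard-codes the values from that listing; `imm16` and `startup_ranges` are hypothetical names, and a generalized script would have to pattern-match these instructions instead.

```python
def imm16(hi, lo):
    """Combine two 8-bit register immediates into a 16-bit value."""
    return (hi << 8) | lo


def startup_ranges():
    """Ranges implied by the ldi/cpi/cpc immediates in the reset code above."""
    rampz = 0x00                                    # ldi R16,0x0 ; out RAMPZ,R16
    data_src = (rampz << 16) | imm16(0x11, 0xF4)    # Zhi=0x11, Zlo=0xf4 (flash byte address)
    return {
        "data_src": data_src,
        "data_src_word": data_src >> 1,             # word address in Ghidra's CODE space
        "data_start": imm16(0x20, 0x00),            # Xhi=0x20, Xlo=0x00
        "data_end": imm16(0x21, 0xCA),              # cpi Xlo,0xca ; cpc Xhi,R17 (R17=0x21)
        "bss_start": imm16(0x21, 0xCA),             # Xhi=0x21, Xlo=0xca
        "bss_end": imm16(0x22, 0x18),               # cpi Xlo,0x18 ; cpc Xhi,R18 (R18=0x22)
    }


r = startup_ranges()
print({k: hex(v) for k, v in r.items()})
```

The word address 0x8fa matches the DAT_code_0008fa annotation on the elpm, and the ranges work out to 0x1ca (458) bytes copied to mem:2000 followed by 0x4e (78) bytes zeroed from mem:21ca.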

From the hex used with the original scripts I referenced:

[Screenshot]

GccApplication1.zip

ghost commented 2 years ago

A sidenote:

What is the status for signatures and AVR binaries? Is it possible to process an ELF with DWARF information to generate signature databases that Ghidra could use?

emteere commented 2 years ago

RE: processing DWARF/ELF. Funny you should mention that. Almost. I've been working on ELF relocations for .o files for just this purpose, and it should go in soon. I was just processing some ELF libraries and checking out the code. You would need to process them into FID database files. As long as the binary was compiled against the same .a/.o files as the libraries you generate the FID files from, it should probably work. YMMV.

ghost commented 2 years ago

I will give that a go as soon as it is out. A script that takes a directory and recursively consumes all the object files in that spot would be amazing to have, if it outputs directly to files. I could probably let it loose on a large archive of MPLAB + IAR versions... I also assume no copyright infringement is done by providing those files, since technically function hashes do not contain any IP. Maybe a lawyer sufficiently deprived of sugar would disagree, but I doubt a court would. So, I'm happy to help.

Some minds think alike. Most minds hate guessing libc functions.

BTW, I had moderate success with the loader loops and overlays, but as you said, it's a compound problem. Lots of references are missed because registers or other processor quirks aren't handled properly. I also went spelunking into avr-libc to find the libgcc.a function that the linker script places into .init4.

Patiently waiting for your surprises with the AVR fixes like a kid on Xmas eve.

emteere commented 2 years ago

There are some updates in the patch and master branches for AVR8 memory beyond the 16-bit address range.

I did try building a test FID file from the AVR8 GCC libraries. I was able to match routines in a 3D printer firmware image.

There will be bigger changes in the main development branch to handle the various memories and the different pointer parameters to each type of memory. For example, if 0x1234 is passed to a function, the decompiler and constant propagation do not know whether it is a constant, an SRAM pointer, a code byte pointer, a code word pointer, or an IO memory pointer.
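To make that ambiguity concrete, this Python sketch lists where the same 16-bit constant would land under each interpretation. The address formatting only mimics the listings in this thread (`code:` word addresses with a `.1` suffix for the odd byte, `mem:` for the data space, which on the xmega holds both IO and SRAM); `candidate_targets` is a hypothetical helper, not Ghidra code.

```python
def candidate_targets(value):
    """Map each possible meaning of a 16-bit constant to a listing-style address."""
    word_addr, odd = value >> 1, value & 1
    byte_target = "code:%06x" % word_addr + (".1" if odd else "")
    return {
        "constant": "0x%x" % value,
        "code word pointer": "code:%06x" % value,    # already a word address
        "code byte pointer": byte_target,            # LPM/ELPM-style byte offset
        "data pointer (SRAM or IO)": "mem:%04x" % value,
    }


for kind, target in candidate_targets(0x1234).items():
    print("%-26s %s" % (kind, target))
```

Only the pointer type at the use site (for example, a PointerTypedef that records the address space and byte/word addressing) can tell the analysis which of these rows is the right one.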

ryanmkurtz commented 2 years ago

Closed by 1df11c6cb0e2cce67a14e6892a935e3544d80f1e

emteere commented 2 years ago

You can try using this script to automatically initialize the AVR8 RAM memory.

The code is very brittle if the reset vector code or the do_copy_data code changes. It's not the cleanest code, nor does it handle all error cases. There are probably other ways of doing the same thing, but it should work.

Avr8LoadMemScript.java.txt

rbray89 commented 2 years ago

I'm seeing some similar behavior on an ATxmega256 as well... The strings in PROGMEM aren't showing up as having any cross-references. Any tips?

ghost commented 1 year ago

@rbray89 could you upload your hex/firmware file somewhere?

ghost commented 1 year ago

@emteere I'm about to test the latest release (10.2.3) as it seems to contain some bugfixes relevant to my work. Are you aware of any improvements related to the issues described here?

Also: AVR8 header parsing seems to be supported now; does this help with any of the issues here?

Also very important: what is the ideal way to leverage an .eep/.bin/.hex EEPROM file? Of course, this is a memory-mapped file situation, but I have not yet tested how Ghidra handles it in practice.

ghost commented 1 year ago

@emteere I have finished a preliminary version of a tool to automatically process AVR8 toolchains and extract the libraries, processing the object files and creating FIDB files with headless Ghidra:

[Screenshot]

However, I do not see FID popping up as an option in the analyzer. Am I missing something?

Edit: this is what the current tool looks like:

$ ls -alh fidb/
total 232K
drwxrwxr-x 2 user user   8 feb 18 13:07 .
drwxrwxr-x 9 user user  11 feb 18 08:45 ..
-rw-r----- 1 user user 52K feb 18 13:01 avr8-avr25-7.3.0.fidb
-rw-r----- 1 user user 29K feb 18 13:02 avr8-avr31-7.3.0.fidb
-rw-r----- 1 user user 29K feb 18 13:03 avr8-avr35-7.3.0.fidb
-rw-r----- 1 user user 28K feb 18 13:02 avr8-avr3-7.3.0.fidb
-rw-r----- 1 user user 34K feb 18 13:04 avr8-avr4-7.3.0.fidb
-rw-r----- 1 user user 36K feb 18 13:08 avr8-avr51-7.3.0.fidb
-rw-r----- 1 user user 30K feb 18 13:07 avr8-avr5-7.3.0.fidb
-rw-r----- 1 user user 35K feb 18 13:08 avr8-avr6-7.3.0.fidb
-rw-r----- 1 user user 15K feb 18 13:09 avr8-avrtiny-7.3.0.fidb
(...)

$ ls -alh scripts/
total 34K
drwxrwxr-x 2 user user    4 feb 18 06:36 .
drwxrwxr-x 9 user user   11 feb 18 08:45 ..
-rw-rw-r-- 1 user user  12K feb 18 07:00 common.sh
-rwxr-xr-x 1 user user 8,7K feb 18 12:59 genfidbs.sh

$ ./scripts/genfidbs.sh -h
Usage:
     -h|--help                  Displays this help
     -v|--verbose               Displays verbose output
    -nc|--no-colour             Disables colour output
    -cr|--cron                  Run silently unless we encounter an error
   -src|--toolchain-basedir     Directory to process toolchains (DB takes the names from level 1)
    -gh|--ghidra-home           Ghidra home (defaults to GHIDRA_HOME env variable)
    -gp|--ghidra-projectdir     Ghidra projects (defaults to GHIDRA_PROJECTS env variable)

It's still going through the whole lot. It automatically detects the toolchain version and separately imports each variant/family, extracts every library (.a object), aggregates the symbols in the "common" text files, and finally processes each project to create a FID database.

There is room for improvement and it can easily be used to process other toolchains/sets of libraries.

I took some inspiration from: https://github.com/threatrack/ghidra-fid-generator/

I might add the ability to build databases for AVR drivers/examples, so that some functionality can also be easily detected when it does not diverge too far from the official examples.
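The grouping step of such a tool can be sketched in Python, assuming the usual avr-gcc multilib layout (for example lib/gcc/avr/7.3.0/avr5/libgcc.a and avr/lib/avr5/libc.a) and the avr8-&lt;variant&gt;-&lt;version&gt;.fidb naming shown in the listing above. `group_libraries` and `plan_fidbs` are hypothetical names; the sketch only decides which archives would feed which database and does not run Ghidra.

```python
import os
import re
from collections import defaultdict

VARIANT_RE = re.compile(r"^(avr\d+|avrxmega\d+|avrtiny)$")
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def group_libraries(rel_paths):
    """Group .a archive paths (relative to the toolchain root) into FIDB names."""
    split = [p.replace("\\", "/").split("/") for p in rel_paths]
    # Assume one toolchain: take the first x.y.z directory seen (e.g. lib/gcc/avr/7.3.0)
    # and apply it to every archive, including avr-libc ones with no version in the path.
    version = next((part for parts in split for part in parts
                    if VERSION_RE.match(part)), "unknown")
    plan = defaultdict(list)
    for path, parts in zip(rel_paths, split):
        if not path.endswith(".a"):
            continue
        # Innermost multilib directory wins; sub-variants such as tiny-stack
        # are folded into their parent variant in this sketch.
        variant = next((p for p in reversed(parts[:-1]) if VARIANT_RE.match(p)), "default")
        plan["avr8-%s-%s.fidb" % (variant, version)].append(path)
    return dict(plan)


def plan_fidbs(toolchain_root):
    """Walk a toolchain directory and group every .a archive found in it."""
    rel_paths = []
    for dirpath, dirnames, filenames in os.walk(toolchain_root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), toolchain_root))
    return group_libraries(rel_paths)
```

Each resulting entry would then be imported into its own Ghidra project and turned into a FID database, as the script above already does.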