buserror / simavr

simavr is a lean, mean and hackable AVR simulator for linux & OSX
GNU General Public License v3.0

Support loading .obj files generated by assemblers #517

Closed AlexGuo1998 closed 11 months ago

AlexGuo1998 commented 11 months ago

If you want to source-debug an assembly program generated by avrasm2, the only way is to tell it to emit an .obj file. avra (Ro5bert/avra) is also able to generate this .obj file.

The file format is very simple, and contains both binary code and a source mapping (somewhat like ELF). The file structure information below is summarized from the avra source code and cross-checked against avrasm2 output.

File structure (all fields are big-endian; `04:06` means 2 bytes starting at offset 4):

| File offset | Content |
|--------------|---------|
| 00:04 | End-of-code address (`eoca`) |
| 04:08 | Signature (00 00 00 1A) |
| 08:09 | How many bytes for one instruction entry (`entry_size`) |
| 09:0A | Source file count |
| 0A:1A | Comment (see below) |
| 1A:`eoca` | Instructions, k * `entry_size` (see below) |
| `eoca`:`EOF` | 0-terminated source filename array, ends with one more `'\0'` |

There are 2 variants, possibly identified by `entry_size` and the comment. In the default setting, `avrasm2` chooses the v2 format only if there are source files with more than 65534 lines. `avra` can only produce the v1 format.

v1:

* Comment is: `"AVR Object File\0"`
* 9 bytes for 1 instruction (`entry_size` = 9)
* Layout for each instruction entry:

  | Offset | Content |
  |--------|---------|
  | 00:03 | Word address |
  | 03:05 | Instruction (byte-swapped) |
  | 05:06 | Source file index |
  | 06:08 | 1-based source line number, 16-bit |
  | 08:09 | Is this generated by a macro expansion? |

* Lines > 65534 get capped to 65535 and are not source-debuggable

v2:

* Comment is: `"AVR Obj File v2\0"`
* 10 (0x0A) bytes for 1 instruction (`entry_size` = 10)
* Layout for each instruction entry:

  | Offset | Content |
  |--------|---------|
  | 00:03 | Word address |
  | 03:05 | Instruction (byte-swapped) |
  | 05:06 | Source file index |
  | 06:09 | 1-based source line number, 24-bit |
  | 09:0A | Is this generated by a macro expansion? |

* Theoretically supports source files up to 16777215 lines (untested)
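The layout described above is small enough to sketch a reader for. The Python below is only an illustration of that layout, not code from simavr or avra. The names `ObjEntry` and `parse_obj` are hypothetical, filenames are assumed to decode as Latin-1, and the instruction word is read as a big-endian integer on the assumption that "byte-swapped" is meant relative to the little-endian order used in flash.

```python
from dataclasses import dataclass
from typing import List, Tuple

HEADER_SIZE = 0x1A                # instruction entries start at offset 0x1A
SIGNATURE = b"\x00\x00\x00\x1a"   # bytes 04:08 of the header


@dataclass
class ObjEntry:                   # hypothetical name, for illustration only
    word_addr: int                # flash address in 16-bit words
    opcode: int                   # instruction word, read big-endian (assumption)
    file_index: int               # index into the source filename array
    line: int                     # 1-based source line number
    from_macro: bool              # True if produced by a macro expansion


def parse_obj(data: bytes) -> Tuple[List[ObjEntry], List[str]]:
    """Parse a v1 or v2 .obj image into (entries, source filenames)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for an .obj header")
    eoca = int.from_bytes(data[0:4], "big")
    if data[4:8] != SIGNATURE:
        raise ValueError("bad signature")
    entry_size = data[8]
    file_count = data[9]
    if entry_size not in (9, 10):
        raise ValueError("unsupported entry size %d" % entry_size)
    if not HEADER_SIZE <= eoca <= len(data) or (eoca - HEADER_SIZE) % entry_size:
        raise ValueError("inconsistent end-of-code address")

    line_bytes = entry_size - 7   # 2 bytes in v1, 3 bytes in v2
    entries = []
    for off in range(HEADER_SIZE, eoca, entry_size):
        rec = data[off:off + entry_size]
        entries.append(ObjEntry(
            word_addr=int.from_bytes(rec[0:3], "big"),
            opcode=int.from_bytes(rec[3:5], "big"),
            file_index=rec[5],
            line=int.from_bytes(rec[6:6 + line_bytes], "big"),
            from_macro=rec[-1] != 0,
        ))

    # Filenames are NUL-terminated; the extra trailing NUL yields empty
    # pieces, which are dropped here.
    names = [n.decode("latin-1") for n in data[eoca:].split(b"\0") if n]
    return entries, names[:file_count]
```

If the byte-order assumption holds, `entry.opcode.to_bytes(2, "little")` would give the two bytes as they sit in flash at byte address `entry.word_addr * 2`.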

I have a draft implementation at https://github.com/AlexGuo1998/simavr/commit/345bce8577acb4631629044a39f5fba037963aff

However, only binary loading is implemented. I'm unsure how to load the source mapping. Do I need to hack the ELF parser?

I'd like to polish it and file a pull request if you like. Let me know if there are any ideas or suggestions.

gatk555 commented 11 months ago

It ought to be fairly straightforward. The information should be placed in the data_names and codeline structure members defined in sim_avr.h, similarly to code in sim_elf.c and sim_dwarf.c. Those files call the matching libraries to extract address/length/name information and initialise those array members.

But this will not get you source debugging, at least not with avr-gdb. When using gdb with an ELF file, almost all the symbolic information is found by gdb reading the file itself. Only one obscure command, "info registers", uses the information inside simavr, which is mainly there to support the tracing feature.

Perhaps you may need to consider converting these .obj files to ELF, or modifying the assembler to output ELF. If so, I think libbfd may be your friend. But the AVRA page says symbolic debugging works in the IDE, so something more must be happening.

AlexGuo1998 commented 11 months ago

> When using gdb with an ELF file, almost all the symbolic information is found by gdb reading the file itself.

Thanks for pointing this out. Makes sense.

> But the AVRA page says symbolic debugging works in the IDE, so something more must be happening.

My guess is that Atmel Studio doesn't use avr-gdb (at least not for symbol parsing) when debugging assembly projects. I debugged with Atmel Studio and the "AVR Dragon" JTAG debugger some years ago. It didn't use any ELF files.

After all, I think the only ways to source debug are:

  1. Convert .obj to elf, possibly writing a converter, and debug with avr-gdb.
  2. Write a custom debugger frontend that is able to parse .obj, and connect to simavr with the gdb serial protocol.

(BTW I do not intend to use gdb directly, but write a debug adapter for VSCode instead. So I'll possibly go with route 2.)
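For route 2, the frontend has to speak the gdb remote serial protocol itself. As a rough sketch of the framing only: each packet is `$`, the payload, `#`, and a two-digit hex checksum equal to the sum of the payload bytes modulo 256. The helper names below are hypothetical. The breakpoint helper also assumes that the stub expects flash byte addresses (word address * 2) and a breakpoint kind of 2, which would need checking against simavr's gdb stub. Escaping of special bytes in binary payloads and the `+`/`-` acknowledgements are not handled.

```python
def rsp_checksum(payload: bytes) -> int:
    """Checksum used by the gdb remote serial protocol: byte sum modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_unframe(packet: bytes) -> bytes:
    """Validate the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload


def breakpoint_payload(word_addr: int) -> bytes:
    """Payload for a software breakpoint (Z0) at an .obj word address.

    Assumption: the stub addresses flash in bytes, hence word_addr * 2,
    and accepts a kind of 2 for AVR.
    """
    return b"Z0,%x,2" % (word_addr * 2)
```

A frontend would look up the word address for a source line in the parsed .obj entries, send `rsp_frame(breakpoint_payload(addr))` over the socket, and then resume the target with the `c` packet.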

I'm closing this because being able to load .obj files is not useful for anyone else wanting to source debug these files, at least not without other heavy modifications.

gatk555 commented 11 months ago

This page suggests that Atmel (now renamed Microchip) Studio is using gdb. It may be worth finding out how it is called (command arguments) and what commands it is sent, perhaps by enabling logging in ~/.gdbinit.