buserror / simavr

simavr is a lean, mean and hackable AVR simulator for linux & OSX
GNU General Public License v3.0
1.55k stars 365 forks source link

Uninitialized dwarf_file in elf_firmware_t #522

Closed benedekt closed 5 months ago

benedekt commented 5 months ago

My setup is a fairly up-to-date Manjaro with gcc (GCC) 13.2.1 20230801, and I was not able to load ELF firmwares to a simavr host application. After some careful poking-around with a debugger on a compiled example I found that the dwarf loading fails in sim_elf.c since dwarf_file is not populated in the passed-in structure elf_firmware_t.

No matter how I try to initialize that struct in my client application I cannot get the field to be false, to go on this path in elf_read_firmware:

if (!firmware->dwarf_file)
        firmware->dwarf_file = strdup(file);    // Parse later.

I patched the function by forcing it to null just before the check, and it works now.

This is how I prepare my core:

elf_firmware_t f;
memset(&f, 0x00, sizeof(elf_firmware_t));

const char *fname = "../xxx.elf";

printf("firmware pathname is %s\n", fname);
elf_read_firmware(fname, &f);

printf("firmware %s f=%d mmcu=%s\n", fname, (int)f.frequency, f.mmcu);

Any tips on what's going on?

gatk555 commented 5 months ago

I think you have not provided enough information for anyone to comment usefully. Obvious questions: how did you build the firmware files and how did loading fail? Also the usual environmental information (OS version etc.) and the origin of the simavr build may be helpful.

benedekt commented 5 months ago

The firmware file is not important in my opinion. Any .elf loading with elf_read_firmware will fail done through the previous snippet, since the issue is on the simavr level not at the .elf creation. I also experience the same problem with the examples in the repo.

I actually used your fork, @gatk555 on an arch-based distro, and I fully expect that the problem has something to do with my setup. Simavr was built with stock default options (make all on a repository cloned from you, without any patches), and my host application is built like so:

gcc -ggdb -o test.out ctrl_sim.c -I/usr/local/include/simavr -lsimavr -lelf

The output test.out segfaults. I traced the problem with the debugger. It looks like that the f struct is uninitialized when passed to elf_firmware_read so in that function firmware->dwarf_file is not set to a correct path, just remains a bogus pointer, and that's while later the loading fails while trying to use garbage memory as a filename.

So how come that f is not initialized to zeroes? I also tried the old = {0}; trick but it didn't help either.

Just before call:

(gdb) list
29  
30      elf_firmware_t f = {0};
31  
32  
33      printf("loading AVR firmware: '%s'\n", firmware_elf);
34      elf_read_firmware(firmware_elf, &f);
35  
36      printf("detected MCU %s, F_CPU=%d\n", f.mmcu, (int)f.frequency);
37  
38      avr = avr_make_mcu_by_name(f.mmcu);
(gdb) print f
$1 = {mmcu = '\000' <repeats 63 times>, frequency = 0, vcc = 0, avcc = 0, aref = 0,
  tracename = '\000' <repeats 127 times>, traceperiod = 0, tracecount = 0, trace = {{kind = 0 '\000',
      mask = 0 '\000', addr = 0, name = '\000' <repeats 63 times>} <repeats 32 times>},
  command_register_addr = 0, console_register_addr = 0, flashbase = 0, flash = 0x0, flashsize = 0,
  datasize = 0, bsssize = 0, eeprom = 0x0, eesize = 0, fuse = 0x0, fusesize = 0, lockbits = 0x0,
  symbol = 0x0, symbolcount = 0, highest_data_symbol = 0, dwarf_file = 0x0}

Stepping through the reading function:

(gdb) s
elf_read_firmware (file=0x555555558004 "../hzCTRL/build/hzCTRL.elf", firmware=0x7fffffffca20)
    at sim/sim_elf.c:428
428 {
(gdb) list
423 
424 int
425 elf_read_firmware(
426     const char * file,
427     elf_firmware_t * firmware)
428 {
429     Elf32_Ehdr  elf_header;         /* ELF header */
430     Elf        *elf = NULL;         /* Our Elf pointer for libelf */
431     Elf32_Phdr *php;                /* Program header. */
432     Elf_Scn    *scn = NULL;         /* Section Descriptor */
(gdb) print *firmware
$1 = {mmcu = '\000' <repeats 63 times>, frequency = 0, vcc = 0, avcc = 0, aref = 0,
  tracename = '\000' <repeats 127 times>, traceperiod = 0, tracecount = 0, trace = {{kind = 0 '\000',
      mask = 0 '\000', addr = 0, name = '\000' <repeats 63 times>} <repeats 32 times>},
  external_state = {{port = 0 '\000', mask = 0 '\000', value = 0 '\000'}, {port = 0 '\000',
      mask = 0 '\000', value = 0 '\000'}, {port = 0 '\000', mask = 0 '\000', value = 0 '\000'}, {
      port = 0 '\000', mask = 0 '\000', value = 0 '\000'}, {port = 0 '\000', mask = 0 '\000',
      value = 0 '\000'}, {port = 0 '\000', mask = 0 '\000', value = 0 '\000'}, {port = 0 '\000',
      mask = 0 '\000', value = 0 '\000'}, {port = 0 '\000', mask = 0 '\000', value = 0 '\000'}},
  command_register_addr = 0, console_register_addr = 0, flashbase = 0, flash = 0x0, flashsize = 0,
  datasize = 0, bsssize = 0, eeprom = 0x0, eesize = 0, fuse = 0x0, fusesize = 0, lockbits = 0x0,
  symbol = 0x6e197b20c61b4300, symbolcount = 1, highest_data_symbol = 0,
  dwarf_file = 0x7ffff7cb9cd0 "\211\307\350\031\221\001"}

It looks like somehow just the end gets corrupted.

benedekt commented 5 months ago

I also verified that I have libelf, and the simavr build system knows about that - proven by an error pragma inside conditional section in the typedef for elf_firmware_t .

gatk555 commented 5 months ago

So all the included tests run OK, but a program you built yourself does not?

The gcc line for building your program has no library search path, so it is presumably getting libsimavr.so from some system directory, with a possibility of mis-matched versions. There is also a possibility of getting an unexpected version at run time, unless LD_LIBRARY path is used to force a particular file. I usually build simavr applications in the examples subtree with a cloned Makefile, because that takes care to avoid such events, embedding an explicit path to the current libsimavr.so in the binary.

Otherwise, the firmware build sequence may matter because this could be a case of a different method exposing a weakness in the elf/dwarf support. Do have libdwarf installed?

I am still in the dark about exactly how it fails (a gdb traceback would be helpful if it crashes) and how you worked around it:

"I patched the function by forcing it to null just before the check, and it works now.".

benedekt commented 5 months ago

The lib which is found by GCC is correct, since there are only two occurences of the libsimavr.so on the system, in my personal project path where it was built, and the global directory where GCC finds it.

I attached a minimal example with source, executable and coredump. mvce.zip The firmware was stolen from a built example application. The simulation file can be built as per the previous comment.

$ ./test.out 
firmware path: atmega48_ledramp.axf
Loaded 1836 Flash bytes at 0
Loaded 32 Initialised data bytes at 800100
firmware atmega48_ledramp.axf f=8000000 mmcu=atmega48
double free or corruption (out)
zsh: IOT instruction (core dumped)  ./test.out atmega48_ledramp.axf

The debugger shows that avr_load_firmware (avr=0x565099516f70, firmware=0x7fff08032970) at sim/sim_elf.c:165 failed, and you can trace it back to elf_read to find that the dwarf file path is not set, since it is not properly initialized.

Here is how I patch my instance as a temporary fix:

--- sim_elf.c   2024-02-07 12:42:40.941971314 -0500
+++ sim_elf_alt.c   2024-02-07 13:09:39.926290719 -0500
@@ -419,6 +419,7 @@
    const char * file,
    elf_firmware_t * firmware)
 {
+   memset(firmware, 0x00, sizeof(elf_firmware_t));
    Elf32_Ehdr  elf_header;         /* ELF header */
    Elf        *elf = NULL;         /* Our Elf pointer for libelf */
    Elf32_Phdr *php;                /* Program header. */
gatk555 commented 5 months ago

The explanation seems very likely to be in the gdb output that you showed above. Although they are the same data. "f" and "firmware" are not the same structure, as the second has a "external_state" structure missing in the first. It looks like "f" is the version from my fork, and firmware is using upstream. And because the sizes differ, the library code "sees" magical corruption at the tail of the structure.

So, yes. version skew. And those pesky pull-ups strike back from beyond the grave!

benedekt commented 5 months ago

Well, shit. Thank you, I will now purge all systemwide simavr components, and go with the repository cloned from you as a submodule. Thanks.