davea42 / libdwarf-code

Contains source for libdwarf, a library for reading DWARF2 and later DWARF. Contains source to create dwarfdump, a program which prints DWARF2 and later DWARF in readable format. Has a very limited DWARF writer set of functions in libdwarfp (producer library). Builds using GNU configure, meson, or cmake.

Crash in dwarfdump loclist parsing found via fuzzing #266

Closed core-explorer closed 2 weeks ago

core-explorer commented 3 weeks ago

I built libdwarf-code using the afl++ fuzzer as the compiler and used a set of minimal C files, compiled with gcc and clang, as seed inputs. My version is origin/main (459c9153, 10 commits after v0.9.2).

This is the backtrace:

lldb -- ./dwarfdump --check-loc elf.dbg 
(lldb) target create "./dwarfdump"
Current executable set to '/target/x86_64-Debug/libdwarf-code/src/bin/dwarfdump/dwarfdump' (x86_64).
(lldb) settings set -- target.run-args  "--check-loc" "elf.dbg"
(lldb) r
Process 1447488 launched: '/target/x86_64-Debug/libdwarf-code/src/bin/dwarfdump/dwarfdump' (x86_64)
Process 1447488 stopped
* thread #1, name = 'dwarfdump', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x555455699c64)
    frame #0: 0x00005555555e17dd dwarfdump`read_single_lle_entry(dbg=0x00005555556915d0, data="", dataoffset=18446744069414584356, enddata="", address_size=8, bytes_count_out=0x00007fffffffbb4c, entry_kind=0x00007fffffffbb48, entry_operand1=0x00007fffffffbb40, entry_operand2=0x00007fffffffbb38, opsblocksize=0x00007fffffffbb28, opsoffset=0x00007fffffffbb20, ops=0x00007fffffffbb18, error=0x00007fffffffd2b8) at dwarf_loclists.c:157:12
   154          return DW_DLV_ERROR;
   155      }
   156      startdata = data;
-> 157      code = *data;
   158      ++data;
   159      ++count;
   160      switch(code) {
(lldb) bt
* thread #1, name = 'dwarfdump', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x555455699c64)
  * frame #0: 0x00005555555e17dd dwarfdump`read_single_lle_entry(dbg=0x00005555556915d0, data="", dataoffset=18446744069414584356, enddata="", address_size=8, bytes_count_out=0x00007fffffffbb4c, entry_kind=0x00007fffffffbb48, entry_operand1=0x00007fffffffbb40, entry_operand2=0x00007fffffffbb38, opsblocksize=0x00007fffffffbb28, opsoffset=0x00007fffffffbb20, ops=0x00007fffffffbb18, error=0x00007fffffffd2b8) at dwarf_loclists.c:157:12
    frame #1: 0x00005555555e3fc0 dwarfdump`build_array_of_lle(dbg=0x00005555556915d0, rctx=0x00005555556b8ea0, error=0x00007fffffffd2b8) at dwarf_loclists.c:1040:15
    frame #2: 0x00005555555e4907 dwarfdump`_dwarf_loclists_fill_in_lle_head(dbg=0x00005555556915d0, attr=0x00005555556b8d40, llhead=0x00005555556b8ea0, error=0x00007fffffffd2b8) at dwarf_loclists.c:1249:11
    frame #3: 0x00005555555e0f80 dwarfdump`dwarf_get_loclist_c(attr=0x00005555556b8d40, ll_header_out=0x00007fffffffbec0, listlen_out=0x00007fffffffbec8, error=0x00007fffffffd2b8) at dwarf_loc.c:1716:17
    frame #4: 0x000055555557e37f dwarfdump`print_location_list(dbg=0x00005555556915d0, die=0x00005555556ba510, attr=0x00005555556b8d40, checking=1, die_indent_level=2, no_end_newline=0, details=0x00007fffffffc920, llerr=0x00007fffffffd2b8) at print_die.c:6779:16
    frame #5: 0x0000555555578e4d dwarfdump`print_location_description(dbg=0x00005555556915d0, attrib=0x00005555556b8d40, die=0x00005555556ba510, checking=1, attr=2, die_indent_level=2, base=0x00007fffffffc940, details=0x00007fffffffc920, err=0x00007fffffffd2b8) at print_die.c:4401:16
    frame #6: 0x000055555557a214 dwarfdump`print_attribute(dbg=0x00005555556915d0, die=0x00005555556ba510, dieprint_cu_goffset=146, attr=2, attr_in=0x00005555556b8d40, print_else_name_match=1, die_indent_level=2, srcfiles=0x0000555555699ca0, srcfiles_cnt=9, lohipc=0x00007fffffffcb70, attr_duplication=0x00007fffffffcb48, err=0x00007fffffffd2b8) at print_die.c:4930:19
    frame #7: 0x0000555555575958 dwarfdump`print_one_die(dbg=0x00005555556915d0, die=0x00005555556ba510, dieprint_cu_goffset=146, print_else_name_match=1, die_indent_level=2, srcfiles=0x0000555555699ca0, srcfcnt=9, an_attr_matched_io=0x00007fffffffcc74, ignore_die_stack=0, err=0x00007fffffffd2b8) at print_die.c:2866:25
    frame #8: 0x0000555555573111 dwarfdump`dd_print_die_and_die_stack(dbg=0x00005555556915d0, in_die=0x00005555556ba510, dieprint_cu_goffset=146, srcfiles=0x0000555555699ca0, srcfilescount=9, err=0x00007fffffffd2b8) at print_die.c:1781:13
    frame #9: 0x00005555555738be dwarfdump`print_die_and_children_internal(dbg=0x00005555556915d0, in_die_in=0x00005555556ba510, dieprint_cu_goffset=146, is_info=1, srcfiles=0x0000555555699ca0, srcfilescount=9, sibling_off_array=0x0000000000000000, sibling_off_count=0, err=0x00007fffffffd2b8) at print_die.c:2047:15
    frame #10: 0x0000555555573b42 dwarfdump`print_die_and_children_internal(dbg=0x00005555556915d0, in_die_in=0x0000555555699c10, dieprint_cu_goffset=146, is_info=1, srcfiles=0x0000555555699ca0, srcfilescount=9, sibling_off_array=0x0000000000000000, sibling_off_count=0, err=0x00007fffffffd2b8) at print_die.c:2113:28
    frame #11: 0x0000555555573b42 dwarfdump`print_die_and_children_internal(dbg=0x00005555556915d0, in_die_in=0x000055555568e310, dieprint_cu_goffset=146, is_info=1, srcfiles=0x0000555555699ca0, srcfilescount=9, sibling_off_array=0x0000000000000000, sibling_off_count=0, err=0x00007fffffffd2b8) at print_die.c:2113:28
    frame #12: 0x00005555555719ef dwarfdump`print_die_and_children(dbg=0x00005555556915d0, in_die_in=0x000055555568e310, dieprint_cu_goffset=146, is_info=1, srcfiles=0x0000555555699ca0, srcfiles_count=9, err=0x00007fffffffd2b8) at print_die.c:1031:11
    frame #13: 0x00005555555725b3 dwarfdump`print_one_die_section(dbg=0x00005555556915d0, is_info=1, pod_err=0x00007fffffffd2b8) at print_die.c:1411:28
    frame #14: 0x0000555555570b6a dwarfdump`print_infos(dbg=0x00005555556915d0, is_info=1, pi_err=0x00007fffffffd2b8) at print_die.c:636:12
    frame #15: 0x000055555555f790 dwarfdump`process_one_file(file_name="elf.dbg", tied_file_name="", temp_path_buf="elf.dbg", temp_path_buf_len=223, l_config_file_data=0x0000555555679de0) at dwarfdump.c:1240:15
    frame #16: 0x000055555555e4c4 dwarfdump`main(argc=3, argv=0x00007fffffffd678) at dwarfdump.c:609:9

I have not investigated further.

The input file is attached (elf.dbg.gz). I can share my fuzzing setup if there is interest.

davea42 commented 3 weeks ago

Valgrind finds no problems with dwarfdump --check-loc elf.dbg

dwarfdump says: /tmp/dwarfdump ERROR: ERROR: dwarf_get_loclist_c fails: DW_DLE_LOCLISTS_ERROR: An lle entry begins past the end of its allowed space. Corrupt DWARF.. Attempting to continue.

CU Name = (indexed string: 0x00000001)main.cpp CU Producer = (indexed string: 0x00000000)Ubuntu clang version 18.1.3 (1ubuntu1) DIE OFF = 0x00000072 GOFF = 0x000000ec, Low PC = 0x00001140, High PC = 0x00001157

/tmp/dwarfdump ERROR: ERROR: Cannot get location list data: DW_DLE_LOCLISTS_ERROR: An lle entry begins past the end of its allowed space. Corrupt DWARF.. Attempting to continue.

CU Name = (indexed string: 0x00000001)main.cpp CU Producer = (indexed string: 0x00000000)Ubuntu clang version 18.1.3 (1ubuntu1) DIE OFF = 0x00000072 GOFF = 0x000000ec, Low PC = 0x00001140, High PC = 0x00001157

/tmp/dwarfdump ERROR: Cannot get location data, attr (with -M also form) follow: DW_DLE_LOCLISTS_ERROR: An lle entry begins past the end of its allowed space. Corrupt DWARF.. Attempting to continue.

CU Name = (indexed string: 0x00000001)main.cpp CU Producer = (indexed string: 0x00000000)Ubuntu clang version 18.1.3 (1ubuntu1) DIE OFF = 0x00000072 GOFF = 0x000000ec, Low PC = 0x00001140, High PC = 0x00001157

There were 3 DWARF errors reported: see ERROR above.

In other words, it's a fuzzed binary but dwarfdump behaves normally.

I do not have lldb available at this time.

The code in the current version reads:

    179     if (data >= enddata) {
    180         _dwarf_error_string(dbg,error,DW_DLE_LOCLISTS_ERROR,
    181             "DW_DLE_LOCLISTS_ERROR: "
    182             "An lle entry begins past the end of "
    183             "its allowed space. Corrupt DWARF.");
    184         return DW_DLV_ERROR;
    185     }
    186     startdata = data;
    187     code = *data;
    188     ++data;
    189     ++count;
    190     switch(code) {

So I presume you are not testing a current dwarfdump?

core-explorer commented 3 weeks ago

Thank you for your swift analysis. You are correct: I was not using the latest version. I have now updated, and the issue persists. This is what I see:

The statement code = *data; in read_single_lle_entry() at line 187 causes a segmentation fault because the value of data does not point into any mapped memory region. I used reverse debugging to track down where this value comes from:

There is a validity check ensuring data < enddata immediately before line 187.

data is passed in from build_array_of_lle() at line 1142; it is computed there as data = rctx->ll_llepointer; at line 1121.

ll_llepointer is computed as

llhead->ll_llepointer = lle_global_offset + dbg->de_debug_loclists.dss_data;

in _dwarf_loclists_fill_in_lle_head() at line 1378.

And lle_global_offset is an attacker-controlled value read a couple of lines earlier.

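A suitably chosen value for lle_global_offset makes this addition overflow and wrap around, leaving data below the start of the section. The data < enddata check then passes even though data points outside the section.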
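To make the wraparound concrete, here is a minimal sketch that models the pointer addition as unsigned 64-bit integer arithmetic, which mirrors what x86_64 does with the pointer sum (in C itself, forming such a pointer is undefined behavior, so the sketch stays with integers). The offset 0xFFFFFFFF00000024 is the dataoffset value shown in frame #0 of the lldb backtrace (18446744069414584356), assuming data was computed as base plus that offset. The section base 0x555555699c40 and the 0x100-byte section size are assumptions, with the base chosen so the result equals the fault address 0x555455699c64 that lldb reported. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of data = dss_data + lle_global_offset on x86_64.
 * Unsigned 64-bit addition wraps modulo 2^64, mirroring the pointer sum.
 * (Actually forming a pointer outside the section is undefined behavior
 * in C, so integers are used here instead.) */
uint64_t wrapped_lle_address(uint64_t section_base, uint64_t offset)
{
    return section_base + offset;
}

/* Models the existing guard before line 187: it rejects only addresses
 * at or past the end of the section and never checks the lower bound. */
int passes_end_check(uint64_t addr, uint64_t section_end)
{
    return addr < section_end;
}
```

Because a wrapped sum always lands below the section base, the only guard in place, data >= enddata, never fires for such an offset.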

I cannot reproduce the crash when running via valgrind, I presume that is because valgrind maps memory at different addresses and does not trigger integer overflow in the address calculation.

davea42 commented 3 weeks ago

After cleaning up some things for a full retest, I now find that a -fsanitize=address build finds the bug just as you reported:

> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==2917684==ERROR: AddressSanitizer: SEGV on unknown address 0x505f00000524 (pc 0x56e55efed17b bp 0x7ffc3c2ef420 sp 0x7ffc3c2ef0b0 T0)
> ==2917684==The signal is caused by a READ memory access.
>     #0 0x56e55efed17b in read_single_lle_entry ../../../../home/davea/dwarf/code/src/lib/libdwarf/dwarf_loclists.c:187
>     #1 0x56e55eff74b8 in build_array_of_lle ../../../../home/davea/dwarf/code/src/lib/libdwarf/dwarf_loclists.c:1142
>     #2 0x56e55eff74b8 in _dwarf_loclists_fill_in_lle_head ../../../../home/davea/dwarf/code/src/lib/libdwarf/dwarf_loclists.c:1381
>     #3 0x56e55efdc767 in dwarf_get_loclist_c ../../../../home/davea/dwarf/code/src/lib/libdwarf/dwarf_loc.c:1719
>     #4 0x56e55ee96e57 in print_location_list ../../../../home/davea/dwarf/code/src/bin/dwarfdump/print_die.c:6659
>     #5 0x56e55eea59d3 in print_location_description ../../../../home/davea/dwarf/code/src/bin/dwarfdump/print_die.c:4482
>     #6 0x56e55eea59d3 in print_attribute ../../../../home/davea/dwarf/code/src/bin/dwarfdump/print_die.c:5031
>     #7 0x56e55eeaef24 in print_one_die ../../../../home/davea/dwarf/code/src/bin/dwarfdump/pr

Why this failed to reproduce earlier is a mystery. I don't like mysteries like this.

davea42 commented 3 weeks ago

Yes: in _dwarf_loclists_fill_in_lle_head() we read in a loclistx value and fail to check it for sanity. Thank you for finding this.
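One way to close this kind of hole, sketched under assumptions (this is not the patch that was eventually pushed, and the function and enum names are hypothetical), is to compare the untrusted offset against the section size before any pointer is formed, rather than testing the resulting pointer afterward.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes in the spirit of libdwarf's DW_DLV_OK / DW_DLV_ERROR. */
enum { SKETCH_OK = 0, SKETCH_ERROR = 1 };

/* Validate an offset read from the file against the section size
 * *before* doing pointer arithmetic, so the pointer can only ever be
 * formed inside [section, section + section_size). */
int lle_pointer_from_offset(const unsigned char *section,
    uint64_t section_size,
    uint64_t untrusted_offset,
    const unsigned char **out)
{
    if (untrusted_offset >= section_size) {
        return SKETCH_ERROR; /* corrupt DWARF: offset outside the section */
    }
    *out = section + untrusted_offset;
    return SKETCH_OK;
}
```

With the offset from this report, 0xFFFFFFFF00000024, the call returns SKETCH_ERROR and leaves *out untouched, so no wrapped pointer is ever created.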

core-explorer commented 3 weeks ago

I found this by accident. The fuzzer I used is also used by the ossfuzz project for libdwarf-code. I had a look at libdwarf-binary-samples, and I don't think that is a good set of binaries for fuzzing libdwarf.

davea42 commented 2 weeks ago

Very interesting observations. I will be thinking about this and about what might be done. As you observe, all the test cases and the fuzzing harness were created by Google folks. I did not think of them as 'for beginners' (though I do think that's apt) but as 'how to violate the rules of using the library without literally violating them', and it was quite effective. I did fix some of the example source, since passing random pointers (uninitialized fields) to the library made the test results unreliable to reproduce.

It will be a few days before I can do anything about the clear bug you found: there is a confusing issue related to the non-standard DWARF5 GNU split-dwarf extension that I need to work on first. It may be a few (more) days before I get back to this issue, but I will. Let's leave this open.

Thank you for your work on this.

davea42 commented 2 weeks ago

FYI

I have found four places, between dwarf_loclists.c and dwarf_rnglists.c, where a value that is used later is not checked for sanity. The READ_UNALIGNED_CK() macro itself is fine, but not every use of it is followed by the proper sanity test (some values, such as version and address_size, are checked later).
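The distinction here is that a successful bounds-checked read only proves the bytes were present; the value itself still comes from the file and can be anything. A minimal sketch of read-then-validate follows. It uses a hypothetical read_u32_le() in place of READ_UNALIGNED_CK() and, as an example field, a 4-byte entry count such as offset_entry_count in a DWARF5 .debug_loclists header. It is not claimed to be one of the four places fixed in libdwarf.

```c
#include <stdint.h>

enum { SKETCH_OK = 0, SKETCH_ERROR = 1 };

/* Hypothetical stand-in for a bounds-checked read: reads a 4-byte
 * little-endian value, failing if fewer than 4 bytes remain. */
static int read_u32_le(const unsigned char *p, const unsigned char *end,
    uint32_t *out)
{
    if (end - p < 4) {
        return SKETCH_ERROR;
    }
    *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return SKETCH_OK;
}

/* Read an entry count, then sanity-check the *value*: the table of
 * count entries of entry_size bytes must fit in the bytes after it.
 * Comparing against remaining / entry_size avoids computing
 * count * entry_size, which could overflow in 32-bit arithmetic. */
int read_entry_count(const unsigned char *p, const unsigned char *end,
    uint32_t entry_size, uint32_t *count_out)
{
    uint32_t count = 0;
    uint64_t remaining;

    if (entry_size == 0 || read_u32_le(p, end, &count) != SKETCH_OK) {
        return SKETCH_ERROR;
    }
    remaining = (uint64_t)(end - p) - 4;
    if (count > remaining / entry_size) {
        return SKETCH_ERROR; /* bogus value: table would run off the end */
    }
    *count_out = count;
    return SKETCH_OK;
}
```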

core-explorer commented 2 weeks ago

I meant that the C source files in binary-samples-v2/src are taken from https://github.com/DarrenRainey/C-Examples which literally claims "Example code written in C for beginners". These examples do not include a linked list.

I've been parsing the debug information for glibc internals, and that is very different C source code (plus the occasional assembly). But maybe that doesn't actually matter for libdwarf; it's just more tags and attributes. You probably want to cover recursive and nested data structures as well as function inlining, and lots of references between things.
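As an illustration of the kind of seed source suggested here, the following hypothetical C file (not part of binary-samples-v2) defines a self-referential list type, a struct nested inside a union inside a struct, and a static inline helper. Compiled with -g and optimization (for example -O2), the helper is likely, though not guaranteed, to be inlined, which would add DW_TAG_inlined_subroutine entries alongside the type references.

```c
#include <stddef.h>

/* Self-referential type: a structure whose member points back at itself,
 * giving a reference cycle between the pointer and structure types. */
struct node {
    int value;
    struct node *next;
};

/* Nested aggregates: a struct containing a union, which in turn
 * contains an unnamed struct type. */
struct shape {
    int kind; /* 0 = rectangle, otherwise square */
    union {
        struct { int w, h; } rect;
        int side;
    } u;
};

/* Small static inline helper; a likely inlining candidate at -O2. */
static inline int area(const struct shape *s)
{
    return s->kind == 0 ? s->u.rect.w * s->u.rect.h
                        : s->u.side * s->u.side;
}

int list_sum(const struct node *head)
{
    int total = 0;
    for (; head != NULL; head = head->next) {
        total += head->value;
    }
    return total;
}

int total_area(const struct shape *shapes, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += area(&shapes[i]);
    }
    return total;
}
```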

I would want to see a lambda expression in the C++ source; that language is full of opportunities where the compiler needs to do a lot of magic that has to be captured in the debug information.

davea42 commented 2 weeks ago

I just pushed updates with four new checks, in the rangelists and loclists code, for bad input that libdwarf reads from disk.

Now your test case generates an error, as we would hope.

I believe this is now fixed.

davea42 commented 2 weeks ago

Pushed the fix, so closing this.