eliben / pyelftools

Parsing ELF and DWARF in Python

Not compatible with XC16-compiled ELF files #473

Closed yashagarwal-314 closed 1 year ago

yashagarwal-314 commented 1 year ago

I am trying to use pyelftools to get the data types of variables from a dsPIC33CK ELF file, and it doesn't work.

sevaa commented 1 year ago

Can you share one of the offending files? If they are sensitive, can you build a dummy project with the same toolchain and settings and, if the problem reproduces, share that?

Also, please share what exactly you are doing with pyelftools. Does the file fail to open, or does something go wrong later, during debug info parsing/navigation?

yashagarwal-314 commented 1 year ago

Hello,

Thank you for your email. The file opens, but later I cannot access the debug information: I get an assertion error. I am attaching the ELF file of my project; I hope that helps. Thank you for your help :)

I wish you a great day and look forward to hearing from you soon. Kind regards, Yash Agarwal

(Replying by email to Seva Alekseyev's comment of Thu, Jun 15, 2023 at 5:50 PM.)

sevaa commented 1 year ago

The attachment didn't go through. Please navigate to the issue at Github and attach it there.

yashagarwal-314 commented 1 year ago

https://drive.google.com/file/d/1oTaSIvRCXfMlEzVjOPPxdk4UA4pS4Bm_/view?usp=drive_link

Hello @sevaa,

Thank you for your message. I have uploaded it to Google Drive, so you should now be able to access it.

Thank you for your support.

sevaa commented 1 year ago

No immediate access. Requested.

sevaa commented 1 year ago

I see it now. The DWARF format seems broken. GNU readelf chokes on that file too:

readelf: Warning: Corrupt unit length (0x300ac) found in section .debug_info
readelf: Warning: Corrupt unit length (0x300ac) found in section .debug_info

The bytes at the very top of .debug_info don't look like a valid CU header. They read:

uint32 unit_length: AC 00 03 00
uint16 version: 00 00
uint32 abbrev_offset: 00 00 02 00
uint8 address_size: 00

Zero is not a valid value for either the version or the address size.
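To make that check concrete, here is a minimal sketch (not pyelftools code) that unpacks the first 11 bytes of .debug_info using the standard 32-bit DWARF 2-4 compile unit header layout and applies a simple sanity check. The function names and the set of accepted address sizes are assumptions for illustration; DWARF 5 uses a different field order after the version and is out of scope here.

```python
import struct
from typing import NamedTuple


class CUHeader(NamedTuple):
    unit_length: int
    version: int
    debug_abbrev_offset: int
    address_size: int


def parse_cu_header_v2_4(data: bytes, offset: int = 0) -> CUHeader:
    # 32-bit DWARF 2-4 layout, little-endian: uint32, uint16, uint32, uint8.
    # The leading '<' also disables padding, so exactly 11 bytes are read.
    return CUHeader(*struct.unpack_from("<IHIB", data, offset))


def looks_valid(h: CUHeader) -> bool:
    # Heuristic sanity check only, not a full validation.
    return 2 <= h.version <= 4 and h.address_size in (1, 2, 4, 8)
```

Fed the bytes quoted above, this yields unit_length 0x300AC (the value readelf complains about), version 0 and address_size 0, so the heuristic rejects the header.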

yashagarwal-314 commented 1 year ago

Hey Sevaa,

Unfortunately, I have no control over the ELF file, but this project is very important to me. It would be great if you could guide me through a workaround, or make some changes to the library so that pyelftools works for this kind of file as well.

thank you!

sevaa commented 1 year ago

Is it possible that the file is intentionally obfuscated to prevent the kind of analysis you are trying to do?

yashagarwal-314 commented 1 year ago

Hey Sevaa,

Thank you for your message!

Please let me know if you need any further information. Thank you!

sevaa commented 1 year ago

If you have a version of readelf that can parse and dump the debug info, I suggest that you dump the DWARF info into a text file (use readelf -wi) and parse that. That should be enough to recover variable data types, although you'll have to do some DIE reference chasing (following each variable's DW_AT_type to its type DIE, and so on down the chain).
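A rough sketch of that approach, assuming GNU readelf's usual -wi text layout: DIE lines such as " <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)", attribute lines such as "    <35>   DW_AT_type        : <0x2d>", and type references printed as section offsets that match the DIE offsets. All helper names are hypothetical, and readelf's output can vary between versions, so treat the regular expressions as a starting point rather than a finished parser.

```python
import re
from typing import Dict, Optional

# DIE header line, e.g. " <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)"
DIE_RE = re.compile(r"^\s*<(\d+)><([0-9a-f]+)>: Abbrev Number: (\d+)(?: \((\w+)\))?")
# Attribute line, e.g. "    <35>   DW_AT_type        : <0x2d>"
ATTR_RE = re.compile(r"^\s*<[0-9a-f]+>\s+(DW_AT_\w+)\s*: (.*)$")

# Modifier DIEs that point further down the chain via DW_AT_type.
MODIFIERS = {
    "DW_TAG_pointer_type": "*",
    "DW_TAG_const_type": "const",
    "DW_TAG_volatile_type": "volatile",
    "DW_TAG_array_type": "[]",
}


def parse_readelf_wi(text: str) -> Dict[int, dict]:
    """Map each DIE's offset to {'tag': ..., 'attrs': {...}}."""
    dies: Dict[int, dict] = {}
    current: Optional[dict] = None
    for line in text.splitlines():
        m = DIE_RE.match(line)
        if m:
            if int(m.group(3)) == 0:  # null entry closes a sibling list
                current = None
            else:
                current = {"tag": m.group(4) or "?", "attrs": {}}
                dies[int(m.group(2), 16)] = current
            continue
        m = ATTR_RE.match(line)
        if m and current is not None:
            current["attrs"][m.group(1)] = m.group(2).strip()
    return dies


def _ref(value: Optional[str]) -> Optional[int]:
    # "<0x2d>" -> 45; int() accepts the 0x prefix when base is 16.
    return int(value.strip("<>"), 16) if value else None


def _name(attrs: dict) -> Optional[str]:
    value = attrs.get("DW_AT_name")
    if value is not None and value.startswith("(indirect"):
        # "(indirect string, offset: 0x10): buf" -> "buf"
        value = value.rsplit(": ", 1)[-1]
    return value


def type_name(dies: Dict[int, dict], ref: Optional[int]) -> str:
    """Follow DW_AT_type links until a named type; build a C-like spelling."""
    parts = []
    seen = set()
    while True:
        if ref is None:
            parts.append("void")  # chain ended without DW_AT_type
            break
        if ref in seen or ref not in dies:
            parts.append("?")  # cycle, or reference outside the dump
            break
        seen.add(ref)
        die = dies[ref]
        modifier = MODIFIERS.get(die["tag"])
        if modifier is None:  # base type, typedef, struct, ...
            parts.append(_name(die["attrs"]) or die["tag"])
            break
        parts.append(modifier)
        ref = _ref(die["attrs"].get("DW_AT_type"))
    return " ".join(reversed(parts))


def variable_types(dies: Dict[int, dict]) -> Dict[str, str]:
    """Return {variable name: type spelling} for every named DW_TAG_variable."""
    result = {}
    for die in dies.values():
        if die["tag"] == "DW_TAG_variable":
            name = _name(die["attrs"])
            ref = _ref(die["attrs"].get("DW_AT_type"))
            if name is not None and ref is not None:
                result[name] = type_name(dies, ref)
    return result
```

Typical use would be variable_types(parse_readelf_wi(open("dump.txt").read())), where dump.txt is a hypothetical file holding the readelf -wi output. Variables with the same name in different scopes overwrite each other in this simplified version.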

While it might be a fascinating project to figure out this flavor of DWARF, I don't think I can commit to it without knowing the scope. Someone else might, but I don't see much enthusiasm here.

eliben commented 1 year ago

Closing this: parsing malformed DWARF that GNU tooling chokes on is not a task pyelftools is designed for.

sevaa commented 1 year ago

Okay, some data points.

When the compiler vendors say theirs is a 16-bit machine, they take it seriously :) It looks like in this binary's flavor of DWARF, the standard [U]LEB128 integer encoding has been replaced with fixed-width uint16 where possible.
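To see why a standard parser derails on this, compare the two encodings. Below is a minimal sketch of a standard ULEB128 decoder next to the fixed-width little-endian uint16 read that this binary appears to use instead; the function names are illustrative, not pyelftools APIs, and little-endian byte order is an assumption consistent with readelf reading AC 00 03 00 as 0x300ac.

```python
import struct
from typing import Tuple


def read_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Standard DWARF ULEB128: 7 value bits per byte, high bit = more bytes follow."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def read_u16le(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Fixed-width little-endian uint16, as hypothesized for this DWARF flavor."""
    return struct.unpack_from("<H", data, offset)[0], offset + 2
```

The bytes 01 00 decode as ULEB128 value 1 after consuming only one byte, so a standard reader would take the trailing 00 as the next field and every later field would be misaligned; read as uint16, the same two bytes form a single value 1.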

My starting point was the abbreviation table. Normally it consists mostly of ULEB128 numbers: an abbreviation record has a header with the code, the tag, and a uint8 has-children flag, followed by (attribute, form) pairs terminated by a null pair. If the contents of the abbrev section in this binary are interpreted as all uint16s instead (even the has-children flag), the top of it reads as a sensible abbreviation:

Code 1
DW_TAG_compile_unit
Has children: yes
DW_AT_producer DW_FORM_string
DW_AT_language DW_FORM_data1 (but it's uint16 in the DIE anyway)
DW_AT_name DW_FORM_string
DW_AT_comp_dir DW_FORM_string
DW_AT_low_pc DW_FORM_addr
DW_AT_high_pc DW_FORM_addr
DW_AT_stmt_list DW_FORM_data4
0 0
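The all-uint16 reading of the abbreviation table can be sketched as follows. It assumes, per the observation above, that every field (code, tag, has-children flag, and each attribute and form number) is a little-endian uint16; the function names are made up for this example, and real files may have wrinkles this ignores.

```python
import struct
from typing import Dict, List, NamedTuple, Tuple


class Abbrev(NamedTuple):
    code: int
    tag: int
    has_children: bool
    attrs: List[Tuple[int, int]]  # (DW_AT_*, DW_FORM_*) pairs


def read_u16(data: bytes, offset: int) -> Tuple[int, int]:
    return struct.unpack_from("<H", data, offset)[0], offset + 2


def parse_abbrev_u16(data: bytes, offset: int) -> Tuple[Abbrev, int]:
    """Parse one abbreviation declaration, reading every field as uint16."""
    code, offset = read_u16(data, offset)
    tag, offset = read_u16(data, offset)
    children, offset = read_u16(data, offset)  # uint16 here, not the usual uint8
    attrs = []
    while True:
        name, offset = read_u16(data, offset)
        form, offset = read_u16(data, offset)
        if name == 0 and form == 0:  # null pair ends the declaration
            break
        attrs.append((name, form))
    return Abbrev(code, tag, bool(children), attrs), offset


def parse_abbrev_table_u16(data: bytes, offset: int = 0) -> Dict[int, Abbrev]:
    """Parse declarations until a zero code or the end of the data."""
    table: Dict[int, Abbrev] = {}
    while offset + 2 <= len(data):
        code, _ = read_u16(data, offset)
        if code == 0:
            break
        abbrev, offset = parse_abbrev_u16(data, offset)
        table[abbrev.code] = abbrev
    return table
```

For instance, the Code 1 entry listed above would come back as tag 0x11 (DW_TAG_compile_unit) with seven (attribute, form) pairs, the first being (0x25, 0x08), i.e. DW_AT_producer with DW_FORM_string.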

Meanwhile, in the info section there is a UTF-16 string starting with "GNU C..." at offset 0x18, clearly the value of the producer attribute. Notably, it's preceded by uint16 0x1, which looks a lot like the abbrev code, and followed by uint16 0x1 (DW_LANG_C89), then another UTF-16 string that looks like a filename. Those are clearly DIE attribute values, in the order the abbrev above lists them.
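Under the same all-uint16 hypothesis, the start of that compile unit DIE can be read with two small helpers: a uint16 reader and a reader for null-terminated UTF-16LE strings (DW_FORM_string in this flavor). This is a minimal sketch with hypothetical names: it decodes only the abbrev code, producer, language and name, because the widths of the remaining attributes (low_pc, high_pc, stmt_list) in this flavor are not established here, and the little-endian byte order is an assumption.

```python
import struct
from typing import Dict, Tuple


def read_u16(data: bytes, offset: int) -> Tuple[int, int]:
    return struct.unpack_from("<H", data, offset)[0], offset + 2


def read_utf16_cstring(data: bytes, offset: int) -> Tuple[str, int]:
    """Read UTF-16LE code units up to a 00 00 terminator; return text and next offset."""
    end = offset
    while data[end:end + 2] != b"\x00\x00":
        if end + 2 > len(data):
            raise ValueError("unterminated string at offset %#x" % offset)
        end += 2
    return data[offset:end].decode("utf-16-le"), end + 2


def read_cu_die_prefix(data: bytes, offset: int) -> Tuple[Dict[str, object], int]:
    """Decode the first four attributes of the compile unit DIE (abbrev code 1)."""
    code, offset = read_u16(data, offset)
    producer, offset = read_utf16_cstring(data, offset)  # DW_AT_producer
    language, offset = read_u16(data, offset)            # DW_AT_language
    name, offset = read_utf16_cstring(data, offset)      # DW_AT_name
    return {"abbrev_code": code, "producer": producer,
            "language": language, "name": name}, offset
```

On this issue's binary that would presumably mean starting at 0x16, two bytes before the producer string at 0x18; that offset is inferred from the description above, not verified.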

EDIT: the CU header structure is unusual, too. The CU length is 8 bytes arranged as four uint16s. From the corpus I'm seeing, the formula for the length is word0*2 + word1*512; words 2 and 3 are zero throughout.
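The empirical length formula can be written down directly. A minimal sketch, assuming the four words are little-endian uint16s; the function name is hypothetical, and raising on nonzero words 2 and 3 simply encodes the observation that they have always been zero so far.

```python
import struct


def xc16_cu_length(data: bytes, offset: int = 0) -> int:
    """Apply the observed formula word0*2 + word1*512 to the 8-byte length field."""
    w0, w1, w2, w3 = struct.unpack_from("<4H", data, offset)
    if w2 or w3:
        # Every header seen so far had words 2 and 3 equal to zero.
        raise ValueError("unexpected nonzero words in CU length field")
    # Equivalent to 2 * (w0 + 256 * w1).
    return w0 * 2 + w1 * 512
```

Applied to the first eight bytes quoted earlier in the thread (AC 00 03 00 00 00 00 00), it gives 0xAC*2 + 3*512 = 1880.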


The ELF machine code is "Microchip Technology dsPIC30F", internally EM_DSPIC30F.
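If you want to recognize such files before handing them to a DWARF parser, the machine code can be read from the raw ELF header. A minimal sketch: e_machine is the uint16 at offset 18, its byte order is given by EI_DATA (e_ident[5]), and EM_DSPIC30F is 118 in the standard ELF machine list. The function names are hypothetical.

```python
import struct

EM_DSPIC30F = 118  # "Microchip Technology dsPIC30F" in the ELF machine list


def elf_machine(header: bytes) -> int:
    """Return e_machine from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byte_order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA: 1 = LSB, 2 = MSB
    if byte_order is None:
        raise ValueError("unknown EI_DATA value")
    return struct.unpack_from(byte_order + "H", header, 18)[0]


def is_dspic30f(header: bytes) -> bool:
    return elf_machine(header) == EM_DSPIC30F
```

Usage would look like is_dspic30f(open(path, "rb").read(20)), with path standing for the ELF file in question.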


In conclusion, parsing this is definitely not a job for pyelftools :)

In theory, armed with this knowledge and with knowledge of DWARF proper, I could slap together an ad hoc parser that would dump the DIE tree for the OP. But I'm now wondering how far the OP is in their quest to recover the variable datatype from the readelf output, as I've suggested.

sevaa commented 10 months ago

@yashagarwal-314 see the latest on #518.


EDIT: all odd nonzero bytes in this issue's binary follow the same pattern. They are all 0x80, the second byte of the logically 2-byte (physically 4-byte) encoding of the DW_AT_language attribute in the top DIE of the sources that are assembled with GNU AS. Ostensibly, the value of the attribute is 1, which stands for ANSI C, but AS is not a C compiler. I guess it's the assembler's way of marking its compile units. Or it could be a bug in the way AS emits DW_AT_language :) Anyway, introducing special-case handling just for that doesn't make a lot of sense. So, barring other wrinkles, I think you can consider the monkeypatch from #518 workable for this issue's binary too.
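The odd-byte observation is easy to reproduce on other binaries. A minimal diagnostic sketch with hypothetical helper names (this is not the #518 monkeypatch): list every nonzero byte at an odd offset of a section's raw data and tally the values. With pyelftools, ELFFile(f).get_section_by_name(".debug_info").data() returns those raw bytes.

```python
from collections import Counter
from typing import List, Tuple


def odd_nonzero_bytes(data: bytes) -> List[Tuple[int, int]]:
    """(offset, value) of every nonzero byte at an odd offset."""
    return [(i, data[i]) for i in range(1, len(data), 2) if data[i] != 0]


def odd_byte_histogram(data: bytes) -> Counter:
    """Tally the values of the nonzero odd-offset bytes."""
    return Counter(value for _, value in odd_nonzero_bytes(data))
```

If a binary's histogram contains anything other than 0x80, that would be one of the "other wrinkles" worth examining before relying on the workaround.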