eliben / pyelftools

Parsing ELF and DWARF in Python

More register names #488

Closed sevaa closed 1 year ago

sevaa commented 1 year ago

For now, just 32-bit ARM and MIPS (the latter might be incomplete). The ARM mapping is hairy, with several reserved gaps and numerous kernel-level registers, but the potentially relevant Neon (SIMD) registers D0..D31 are toward the end of the list; to include those, it made sense to include the middle of the list as well.

Any other architectures we might want to get in there? PowerPC, RISC-V, LoongArch maybe? Android is, supposedly, coming to RISC-V.

sevaa commented 1 year ago

Tried to use it in describe_reg_name, but our flavor of readelf apparently doesn't know about friendly register names on ARM/MIPS. So this is kind of pointless, now that I think of it. If we get more binaries and update readelf, this might become relevant, but for now it's not.

eliben commented 1 year ago

Understood, thanks.

Sounds like maybe another reason it's time to bump readelf again? You did it last year (to 2.38) and it wasn't too horrible IIRC; do you think it's more challenging with the newer versions?

sevaa commented 1 year ago

I could try. Which OS do you usually get your readelf from? My Debian is conservative; its readelf is at 2.31.

eliben commented 1 year ago

I usually build it from source, from the latest branch in binutils; the instructions are in test/external_tools/README.txt.

Distributions are mostly conservative and carry older versions. Our current bundled version is 2.38, and it seems like the latest stable binutils is 2.41.

sevaa commented 1 year ago

Okay, let me try and see what breaks.

OBTW, the real motivation for this PR is that I have a piece of logic in DWEX for decoding DWARF register numbers to friendly names, and I import the arrays of Intel and ARM64 register names from pyelftools, but the ARM32 names aren't there. That's not sufficiently neat for my taste :) MIPS is kind of an afterthought even from my own standpoint, now that Android has all but ditched it.

sevaa commented 1 year ago

So, readelf 2.41. They've changed the hex number format in more places to what readelf.py calls "alternative": prefixed with 0x only when nonzero. Past that, I've found that they've redone the rnglists section dump, but in a way that makes sense: the in-section CU headers are now dumped. Working on that for now.
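
The "alternative" format described above matches the C `printf` `#` flag, where `%#x` adds the 0x prefix only to nonzero values. A minimal sketch of such a formatter follows. The function name is hypothetical and is not readelf.py's actual helper. Python's own `#x` always adds the prefix, so zero has to be special-cased:

```python
def format_hex_alternate(value: int) -> str:
    """Mimic C printf's "%#x": 0x prefix only for nonzero values (hypothetical helper)."""
    # Python's f"{0:#x}" yields "0x0", unlike C's "%#x", which yields "0".
    return f"{value:#x}" if value != 0 else "0"
```

For example, `format_hex_alternate(0)` gives `"0"` and `format_hex_alternate(255)` gives `"0xff"`.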

eliben commented 1 year ago

It may be worth creating a separate issue for the readelf update, where all the relevant details can be captured?

sevaa commented 1 year ago

There will be a separate PR eventually.

sevaa commented 1 year ago

Great, their ranges section(s) dump is buggy. Same bug in the current master: https://sourceware.org/bugzilla/show_bug.cgi?id=30781

@eliben, how do you want to proceed? Go out of our way to reproduce the buggy behavior, or wait and hope? Another option is to exclude ranges from the autotest until readelf gets its act together. I suspect they have a bug of a similar nature in loclists too.

EDIT: my current copy passes the readelf autotest with the Ranges option commented out... :)
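
The "exclude ranges from the autotest" option could look roughly like this sketch. The option strings are real `readelf --debug-dump` values, but the list, the `KNOWN_BROKEN` table, and `options_to_test` are hypothetical and are not the actual contents of pyelftools' test runner. Keeping the upstream bug link next to each skipped option makes it easy to re-enable once readelf is fixed:

```python
from typing import Dict, List, Tuple

# Illustrative subset of readelf dump options compared against readelf.py.
READELF_OPTIONS: List[str] = [
    "--debug-dump=info",
    "--debug-dump=loc",
    "--debug-dump=Ranges",
    "--debug-dump=frames",
]

# Options to skip until upstream readelf is fixed, each with its reason.
KNOWN_BROKEN: Dict[str, str] = {
    "--debug-dump=Ranges": "https://sourceware.org/bugzilla/show_bug.cgi?id=30781",
}


def options_to_test(
    options: List[str], known_broken: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Split options into the ones to compare and the ones skipped (with reasons)."""
    active = [opt for opt in options if opt not in known_broken]
    skipped = {opt: known_broken[opt] for opt in options if opt in known_broken}
    return active, skipped
```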

eliben commented 1 year ago

I guess it's really a matter of how terrible the hack would be. If it's only for a single binary, we could maybe work around it temporarily until this is fixed upstream.

sevaa commented 1 year ago

It's definitely not a single binary. At least two.

The hack would involve adding an explicit, optional "readelf compatible" parameter, false by default, to a couple of core class methods. Or rudely reaching into the internals of those classes from readelf.py.

I also want to see how they react to the bug report.

sevaa commented 1 year ago

Anyway... I am not sure what to do with this anymore. :) On one hand, it's unfair that assembler-level friendly names are exposed for some register sets but not for others. On the other hand, that seems to be readelf's behavior: it doesn't even recognize that R14/R15 on ARM32 are LR/PC, and the raison d'être of descriptions.py is more or less mimicking readelf. Another DWARF visualizer app, one that isn't bound to readelf's idiosyncrasies, might see it differently. That said, the register name sets were never meant to be part of the API (note the leading underscore), even though they aren't package-private the way a less Pythonic language would enforce.

If this PR is rejected, I won't be emotionally distraught. :)

eliben commented 1 year ago

Not sure I understand the current state of things well enough: these values still appear to be unused. You're saying that readelf isn't using them, but can the API expose them somehow? If not, I don't see a reason to commit unused "private" stuff.

sevaa commented 1 year ago

Readelf is not using those. At least on ARM32 and MIPS, it explicitly doesn't. When a register reference is dumped (e.g. as part of a location expression), readelf applies conditional logic depending on whether a friendly name exists: it prints either something like r0 (eax) or just r0. For ARM32 and MIPS, this logic behaves as if no friendly names exist. We have binaries that exercise that scenario.
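
The conditional formatting described above can be sketched as follows. The output shape `rN (friendly)` versus `rN` follows the description in this comment, not readelf's exact output. `format_reg` and `ARM32_FRIENDLY` are hypothetical names. The aliases used here are the standard ARM names for r13-r15, which, per this thread, readelf does not apply on ARM32:

```python
from typing import Mapping, Optional

# Standard ARM32 aliases for r13-r15; readelf reportedly does not use them.
ARM32_FRIENDLY = {13: "sp", 14: "lr", 15: "pc"}


def format_reg(regnum: int, friendly: Optional[Mapping[int, str]] = None) -> str:
    """Return "rN (name)" when a friendly name is known, else just "rN"."""
    base = f"r{regnum}"
    name = friendly.get(regnum) if friendly else None
    return f"{base} ({name})" if name else base
```

A readelf-compatible caller would pass no table for ARM32 (`format_reg(14)` gives `"r14"`). A consumer that isn't bound to readelf could pass one (`format_reg(14, ARM32_FRIENDLY)` gives `"r14 (lr)"`).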

I was thinking those enums would serve as an API for explaining registers in apps where readelf compatibility is not needed. API functions don't expose them, but nothing keeps consumers from importing the enums directly.

eliben commented 1 year ago

Right, but these are private names (leading _), right?

sevaa commented 1 year ago

Right :(

eliben commented 1 year ago

In general, descriptions.py was originally designed just for readelf's purpose. It's not 100% consistent across pyelftools, of course. Some things are parsed directly into enums on the structs level. Would exposing these in https://github.com/eliben/pyelftools/blob/master/elftools/dwarf/enums.py somehow make sense? I'm not sure...

sevaa commented 1 year ago

If you are not sure, go ahead and kill it. It's not a hill I'm willing to die on.

eliben commented 1 year ago

I'll close the PR for now. We can always revive it if readelf starts reporting these and we want to align our output.