NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
52.08k stars 5.9k forks source link

PowerPC int to float conversions decompile poorly #2850

Open Pokechu22 opened 3 years ago

Pokechu22 commented 3 years ago

Describe the bug int to float or double conversions decompile weirdly with PowerPC.

To Reproduce Decompile the attached binaries, using PowerPC:BE:32:default:default as the language. Observe that (e.g.) int2float_O3_g.o's int2float and int2double look like this:

float int2float(int32_t i)

{
  return (float)((double)CONCAT44(0x43300000,i ^ 0x80000000) - 4503601774854144.0);
}

double int2double(int32_t i)

{
  return (double)CONCAT44(0x43300000,i ^ 0x80000000) - 4503601774854144.0;
}

Note that this works because 0x4330'0000'8000'0000 is 4503601774854144.

Expected behavior Something like this:

float int2float(int32_t i)

{
  return (float)i;
}

double int2double(int32_t i)

{
  return (double)i;
}

Attachments int2float.zip — contains .o and .elf files at -O0 through -O3 with and without -g compiled using powerpc-eabi-gcc provided via devkitPPC. (I release this code under CC0.)

Environment (please complete the following information):

Additional context The 64-bit versions (also included in the attachment) call out to glibc soft float implementations, which use similar trickery. float2int and double2int already generate casts normally.

Pokechu22 commented 3 years ago

Note that this conversion (which is described on §3.3.8.3 of The PowerPC Compiler Writer's Guide (page 83/PDF page 103)) involves writing two 32-bit values to the stack, and then reading them back as a double. It's odd that Ghidra seems to be treating that as a regular cast to double, when it should be reinterpreting the value.

It also looks like 64-bit implementations of PowerPC have the fcfid instruction which does this conversion, and ghidra does have an implementation of fcfid already; this doesn't help me since the code I'm analyzing targets a 32-bit PowerPC architecture.

nbouteme commented 3 years ago

I think you can fix this by implementing a callfixup extension, as referenced here: https://fossies.org/linux/ghidra/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerOptions.html#ExtensionOptions

Neui commented 1 year ago

I have created a rule to the decompiler that makes it a proper cast at my branch decompiler-pcc-int2float. It almost matches the expected output with the difference being that it still thinks int2float() returns a double (which seems to be a general issue however). Note that only the 32-bit (u)int to float is implemented.

I am not sure if I should send a PR just with that or not because it seems it is used in one architecture (32-bit PowerPC) and thus might be better in an arch-depended file like maybe that ("unused" and disabled) declarative experimental_rules feature. Also I am not sure if I should maybe check for types (int vs. uint, although output seems to be just OK). (And maybe style convention as I didn't really followed it compared to the other rules in the file).

Since the public Ghidra distribution ships the decompiler source files, you can try this out without building Ghidra from scratch (I did that during development):

  1. cd ghidra_10.3.2_PUBLIC/Ghidra/Features/Decompiler/src/decompile/cpp
  2. Download the patch
  3. patch -p7 -i /path/to/downloaded.diff
  4. make -j8 ghidra_opt
  5. killall decompile
  6. cp ghidra_opt ../../../os/linux_x86_64/decompile
  7. Try to trigger the decompiler twice in Ghidra (once to notice that the decompiler died and second time to actually start the decompiler with the new binary).
RootCubed commented 1 year ago

FWIW, I also made this kind of simplification rule some time ago, but it's a lot more messy than yours because I tried to make it as general as possible by not hardcoding the constants and supporting different orderings of the operations: https://github.com/RootCubed/ghidra/commit/e3597f722f2b6f1f948c8802bf1c80e8330009ef

I'm also unsure whether this is something that the team is willing to add to their codebase - for one, because it's only used for PowerPC, but also because the simplification code is quite complex; then again, it massively improves the decompilation output of code that makes use of it...

Pokechu22 commented 1 year ago

It almost matches the expected output with the difference being that it still thinks int2float() returns a double (which seems to be a general issue however).

My understanding is that PowerPC normally only has double-precision floating point registers. ppc_750cl.pdf (which is the closest public documentation to what the GameCube/Wii use) mentions on page 442 that the lfs instruction normally loads a single-precision float from memory, then converts it to a double-precision float and stores that in the register, and on page 404 that fadds adds the registers and then rounds to single-precision (though all floating-point operations need to round and e.g. fadd on the page before does the same thing, only rounding the final result to double-precision).

There are also the paired single instructions (which are an extension that these hardware variants use), where each floating-point register stores two single-precision floats instead of one double-precision float. Ghidra does not natively implement these, but ppc_gekko_broadway.slaspec in Ghidra-GameCube-Loader implements them (see e.g. psq_lx and ps_add).

Pokechu22 commented 1 year ago

I took a quick look at your code, and it mostly seems good. I did realize that I made a typo; the relevant info is §3.3.8.2 of the PDF, not §3.3.8.3.

One other thing I realized is that this conversion will always produce a double; the PDF says "The conversion of a 32-bit integer to a floating-point value is always exact." which is only possible with doubles (you can't fit all 32-bit integers into 32-bit floats; some aren't representable). Furthermore, the conversion requires using a double to store the magic 0x4330000080000000 or 0x4330000000000000 values, so it's not possible for paired-single mode to be in use (since those magic values can't fit into a single-precision float).

Uint versus non-uint conversions are also possibly incorrect, as int2float is specifically for signed integers. This would affect using ghidra pcode to emulate instructions, but for decompiled output I don't think it's going to matter at all. The strictly correct thing to do would be to zero-extend the value and then convert to a float, but that would probably make things more confusing/harder to read for no real gain. Or it would be possible to add a uint2float pcode op, but I don't know how many other things would use it; you can see #2810 for how I did that (it requires changing a lot of things though).