beardypig / ghidra-emotionengine

Ghidra Processor for the PlayStation 2's Emotion Engine MIPS-based CPU
Apache License 2.0

Vector/matrix decompilation ergonomics #45

Open PFedak opened 3 years ago

PFedak commented 3 years ago

I've been looking at some functions that do relatively large amounts of vector math (~2000 total instructions, about 300 of them v* instructions). On my system, some of these take 50-60 seconds to decompile, and even then several of the matrix multiplies are easier to read in the assembly than in the decompiler output, since Ghidra splits them into sets of 16 parallel equations.

On the timing point, I'm very eager to hear suggestions for profiling and investigating further. In my desperate attempts to improve the situation, I found that replacing the injected Java VU commands with pure p-code versions (with heavy use of macros, including MAC and status logic) cut the time in half. I'm happy to make a PR with my changes, but that may not be a direction you want to move in. I can send one of the functions if that would make testing easier, but I'm not sure of the best format. Just raw binary?

For the output, I'm assuming the original code used some form of macros. I'd love to see blocks of the form

    lqc2 vf24, 0x0(a0)
    lqc2 vf25, 0x10(a0)
    lqc2 vf26, 0x20(a0)
    lqc2 vf27, 0x30(a0)

and

    vmulax.xyzw ACC, vf24, vf28x
    vmadday.xyzw ACC, vf25, vf28y
    vmaddaz.xyzw ACC, vf26, vf28z
    vmaddw.xyzw vf28, vf27, vf28w

condensed into some more simplified form, but I'll believe you if this just isn't feasible (or desired).
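
For reference, here is a minimal Python sketch of what the two blocks above compute, assuming the four quadwords loaded from `a0` are the columns of a 4x4 matrix and `vf28` holds the input vector. It models only the arithmetic and ignores MAC/status flags and the VU's non-IEEE float behaviour. The function name and list representation are made up for illustration.

```python
def vu_matrix_vector(columns, v):
    """Model the vmulax / vmadday / vmaddaz / vmaddw sequence.

    columns: four 4-element lists standing in for vf24..vf27 (hypothetical layout)
    v:       4-element list standing in for vf28 (x, y, z, w)
    """
    # vmulax.xyzw ACC, vf24, vf28x   -> ACC = vf24 * vf28.x
    acc = [c * v[0] for c in columns[0]]
    # vmadday.xyzw ACC, vf25, vf28y  -> ACC += vf25 * vf28.y
    acc = [a + c * v[1] for a, c in zip(acc, columns[1])]
    # vmaddaz.xyzw ACC, vf26, vf28z  -> ACC += vf26 * vf28.z
    acc = [a + c * v[2] for a, c in zip(acc, columns[2])]
    # vmaddw.xyzw vf28, vf27, vf28w  -> vf28 = ACC + vf27 * vf28.w
    return [a + c * v[3] for a, c in zip(acc, columns[3])]


# Example: a translation by (5, 6, 7) stored as four columns.
translation = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
]
point = [1.0, 2.0, 3.0, 1.0]
result = vu_matrix_vector(translation, point)
```

In other words, the eight instructions amount to a single matrix-times-vector product, which is the kind of condensed form the decompiler output would ideally show.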

astrelsky commented 3 years ago

It would definitely simplify things. I've wanted to write a Python script to generate the SLEIGH but never got around to it. I'm not a fan of the Java code for this myself, as all it is really doing is dynamically generating the SLEIGH and compiling it.
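
As a rough illustration of the generator idea, the sketch below emits one p-code-style statement per destination lane for a broadcast multiply or multiply-accumulate into ACC. Everything here is hypothetical: the function name and the `ACC_x` / `fs_x` / `ft_x` operand names are placeholders and are not taken from the repository's `.sinc` files. MAC and status flag updates are left out entirely.

```python
LANES = "xyzw"


def broadcast_mac_body(dest_mask, bc, accumulate):
    """Return p-code-like statements for the lanes named in dest_mask.

    dest_mask:  lanes to write, e.g. "xyzw" or "xz"
    bc:         broadcast lane of the ft operand, e.g. "x"
    accumulate: True for a vmadda-style op, False for a vmula-style op
    Operand names are hypothetical placeholders.
    """
    lines = []
    for lane in LANES:
        if lane not in dest_mask:
            continue  # lane is masked off, so it gets no statement
        product = f"(fs_{lane} f* ft_{bc})"
        rhs = f"ACC_{lane} f+ {product}" if accumulate else product
        lines.append(f"ACC_{lane} = {rhs};")
    return lines


# Bodies for a vmulax.xyzw-style and a vmadday.xz-style instruction.
mula_x = broadcast_mac_body("xyzw", "x", accumulate=False)
madda_y = broadcast_mac_body("xz", "y", accumulate=True)
```

A real generator would also need to emit the constructor patterns and the flag logic. The point is only that the per-lane, per-broadcast variants are regular enough to template.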

It may be worth seeing if adding some sort of caching mechanism would help.

You can profile the Java portion using VisualVM.