Col-E / Recaf

The modern Java bytecode editor
https://recaf.coley.software
MIT License
5.95k stars 461 forks source link

(re)Assembler renumbers `var` variables #484

Closed werame closed 4 months ago

werame commented 2 years ago

I consider this almost a bug from a code comprehensibility prospective, but since the (re)assembled class seems correct, I file this as an enhancement request.

Basically, if the var-named variables aren't sequential in the original disassembly after some modifications, e.g. there's a var18 that appears after a var6, the Recaf re-assembles these by renumbering them, e.g. it turned var18 into var7 and similarly for quite a number of others. This makes it hard to follow if the changes are correct because it introduced numerous gratuitous name changes.

I'm pretty sure that if someone submits a PR for one of your programs, you wouldn't accept it if half the variables were gratuitously renamed... Same thing here with the assembler renaming stuff.

Also, it widened the declared types of those variables it moved around to Ljava/lang/Object, which is generally ok, but I think was incorrect in one case where it upgraded a byte[] to Object. Jogging my memory on how Java stores the primitive arrays (by ref), that's still correct, but fairly counterintuitive that the Recaf assembler did that.


And the java.lang.Objs are alas not harmless. I'm getting something like

java.lang.VerifyError: Bad type on operand stack
Type 'java/lang/Object' (current frame, stack[3]) is not assignable to 'byK'

which I'm pretty sure is caused by that liberty Recaf/ASM takes. I guess it's back to using krakatau, despite it being not really maintained anymore.

Col-E commented 2 years ago

The verify error is usually caused moreso by the exporting process being unable to compute the common types of items, and thus falling back onto java/lang/Object as a default. Variable debug information doesn't actually play a part in that step, its purely visual and for convenience.

As for the renaming, yeah, the 2X assembler has some problems with variable scoping so the current solution is to make each variable unique. I've not seen it re-scramble names like this though in the samples I use for testing. But generally speaking the 2X assembler does have some problems which are seemingly mostly addressed in the 3X WIP branch. 3X still makes each variable unique but does so in a more reliable and reproducible fashion.

werame commented 2 years ago

This code I'm dealing with alas uses invokedynamic so (apparently) I have to deal with stack maps.

Col-E commented 2 years ago

You always have to deal with stack frames, they've been required since Java 7.

ASM, the bytecode library Recaf uses, automatically generates them for modified classes on export. Like I mentioned above, Recaf needs to be able to compute the common types of items (List/Set --> Collection) and it it doesn't it will try to use Object.

This is typically avoided by the usage of JPhantom which Recaf runs in the background to fill in missing type information. It sometimes fails on larger inputs though and can struggle with certain forms of obfuscation.

To give a more detailed explanation of what's happening with Type 'java/lang/Object' (current frame, stack[3]) is not assignable to 'byK' and where the stack computation breaks down I'd have to have a sample, which I understand you cannot provide. My obfuscated samples have not made this problem occur in my usage.

werame commented 2 years ago

It was alas one large class (>100Kb bytecode) that was causing problems for Recaf/ASM. I got it instrumented correctly with Krakatau. I had to do some manual maxlolcals and maxstack adjustments though, but I'm not sure if that's the problem with ASM. The trouble with it doing full rewrites is that's almost impossible to debug bugs in ASM itself.

Alas this big file is proprietary... and not malware. But it had malware hooks in it. The more sophisticated malware itself that was in (much) smaller classes. Recaf/ASM had no actual problem instrumenting those. I could share these... but I'm not seeing the point as Recalf didn't have issue with them.

Col-E commented 4 months ago

Closing as 4.X is the only branch we're working on going forward. The semantics of the assembler are a bit different now so this is not really relevant in the new version.