Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0
1.95k stars 220 forks source link

assert during decompilation #52

Closed skochinsky closed 8 years ago

skochinsky commented 8 years ago

I'm getting an assert(out) triggering in SSA_Graph/abstractInterpert(). The class in question is pretty short so I'll paste the assembly here:

.class super com/intel/util/PlatformIdImpl
.super com/intel/util/PlatformId

.field private m_idType I

.method  <init> : (I)V
       .limit stack 2
       .limit locals 2

;        Bytecode disassembly:
         aload_0 
         invokespecial com/intel/util/PlatformId/<init>()V
         aload_0 
         iconst_0 
         putfield PlatformIdImpl/m_idType I
         aload_0 
         iload_1 
         putfield PlatformIdImpl/m_idType I
         return 
.end method

.method public getType : ()I
       .limit stack 2
       .limit locals 1

;        Bytecode disassembly:
         iconst_0 
         aload_0 
         getfield PlatformIdImpl/m_idType I
         if_icmpne LABEL_00B1
         new com/intel/util/UtilException
         dup 
         invokespecial com/intel/util/UtilException/<init>()V
         athrow 
         LABEL_00B1:
         aload_0 
         getfield PlatformIdImpl/m_idType I
         ireturn 
.end method

.method public getValue : ()[B
       .limit stack 3
       .limit locals 2

;        Bytecode disassembly:
         iconst_0 
         aload_0 
         getfield PlatformIdImpl/m_idType I
         if_icmpne LABEL_00D3
         new com/intel/util/UtilException
         dup 
         invokespecial com/intel/util/UtilException/<init>()V
         athrow 
         LABEL_00D3:
         iconst_4 
         newarray byte
         astore_1 
         aload_0 
         getfield PlatformIdImpl/m_idType I
         aload_1 
         iconst_0 
         invokestatic com/intel/util/UtilNative/getOemId(I[BI)I
         pop 
         aload_1 
         areturn 
.end method
.end class

Any idea what's the problem and how to fix it?

Storyyeller commented 8 years ago

Could you post the full stack trace please?

Also, please post the jar it came from if you can.

Storyyeller commented 8 years ago

P.S. What disassembler did you use to produce that?

skochinsky commented 8 years ago

Here's the jar (renamed to .docx to bypass github upload restriction). The assembly comes from the disassembly of Management Engine's internal JEFF file - I wrote my own disassembler that converts JEFF into Jasmin-style code accepted by Krakatau (I could not figure out how to add JEFF parsing to Krakatau). Full stack trace:

Traceback (most recent call last):
  File "Krakatau\decompile.py", line 144, in <module>
    decompileClass(path, targets, args.out, args.skip)
  File "Krakatau\decompile.py", line 91, in decompileClass
    source = printer.visit(javaclass.generateAST(c, makeGraph, skip_errors, add_throws=add_throws))
  File "Krakatau\Krakatau\java\javaclass.py", line 67, in generateAST
    method_defs = [_getMethod(m, cb, forbidden_identifiers, skip_errors) for m in methods]
  File "Krakatau\Krakatau\java\javaclass.py", line 37, in _getMethod
    graph = cb(method) if method.code is not None else None
  File "Krakatau\decompile.py", line 52, in makeGraph
    s.abstractInterpert()
  File "Krakatau\Krakatau\ssa\graph.py", line 247, in abstractInterpert
    assert(out)
AssertionError
Storyyeller commented 8 years ago

I'll look into the decompilation issue.

Regarding the disassembler, the reason I asked is because I'm currently working on a complete rewrite of the assembler and diassembler, and I'm also planning to change the assembler format to simplify things and enable support for round-tripping and Java 8. Unfortunately, this means that I'll have to make some backward-incompatible changes to the format. I was wondering if you were interested in having any input on the new format.

On Sun, Nov 29, 2015 at 1:58 AM, Igor Skochinsky notifications@github.com wrote:

Here's the jar https://github.com/Storyyeller/Krakatau/files/46560/jom.docx (renamed to .docx to bypass github upload restriction). The assembly comes from the disassembly of Management Engine's internal JEFF file - I wrote my own disassembler that converts JEFF into Jasmin-style code accepted by Krakatau (I could not figure out how to add JEFF parsing to Krakatau). Full stack trace:

Traceback (most recent call last): File "Krakatau\decompile.py", line 144, in decompileClass(path, targets, args.out, args.skip) File "Krakatau\decompile.py", line 91, in decompileClass source = printer.visit(javaclass.generateAST(c, makeGraph, skip_errors, add_throws=add_throws)) File "Krakatau\Krakatau\java\javaclass.py", line 67, in generateAST method_defs = [_getMethod(m, cb, forbidden_identifiers, skip_errors) for m in methods] File "Krakatau\Krakatau\java\javaclass.py", line 37, in _getMethod graph = cb(method) if method.code is not None else None File "Krakatau\decompile.py", line 52, in makeGraph s.abstractInterpert() File "Krakatau\Krakatau\ssa\graph.py", line 247, in abstractInterpert assert(out) AssertionError

— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-160395573 .

Storyyeller commented 8 years ago

Does this code actually run? It looks to me like invalid bytecode.

The problem is in com/intel/util/PlatformId getOemIds

anewarray [Lcom/intel/util/PlatformIdImpl;

This is creating a two dimensional array, when a one dimensional array was intended. The part after anewarray is the component type.

skochinsky commented 8 years ago

Thanks, it was my mistake. JEFF uses the same opcode (jeff_newarray) for both primitives (byte/char/int/etc) and objects, so I have to convert it to anewarray when an object array is created, but I screwed up by forgetting to remove the array indicator in case of objects ( I do remove it for primitives).

Regarding new assembler: I'm far from Java expert but I can certainly provide feedback if necessary. One thing that would be very useful IMO is an official language grammar - it took me several tries to figure out where I can put .const definitions and what types of consts are supported and what should I use (e.g. what's the difference between String and Utf8?).

Native support for dex/smali would be cool btw.

Storyyeller commented 8 years ago

As far as String vs Utf8, String is used for string literals. It's just a reference to a Utf8, which holds the underlying data and is used for (nearly) everything that requires string data in the classfile. You can read more about the details in the JVM specification.

Storyyeller commented 8 years ago

I finished writing a draft specification for the new assembly format. What do you think?

https://docs.google.com/document/d/1O5_sbluDW5pPkzvXRKXhOm5MpWriKE3Bsj8QAreXs0U/edit?usp=sharing

skochinsky commented 8 years ago

I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.

For attributes, I think it's better to stay close to the jvm spec, so .exceptions rather than .throws.

Is there a reason you require labels to start with L? (not a big deal, just curious)

It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).

Storyyeller commented 8 years ago

I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.

C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.

And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.

On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky notifications@github.com wrote:

I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.

For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.

Is there a reason you require labels to start with L? (not a big deal, just curious)

It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).

— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .

Storyyeller commented 8 years ago

The main argument for .throws vs .exceptions is that it resembles a throws declaration in Java, which might make it easier to understand. Also, Jasmin uses .throws, with some minor differences.

P.S. What do you think about the syntax for multiline field declarations? That one was tough and I couldn't come up with anything I really liked.

On Fri, Dec 4, 2015 at 6:54 AM, Robert Grosse n210241048576@gmail.com wrote:

I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.

C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.

And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.

On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky notifications@github.com wrote:

I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.

For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.

Is there a reason you require labels to start with L? (not a big deal, just curious)

It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).

— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .

Storyyeller commented 8 years ago

P.S. One other thing I'm torn about is whether the fmimref and invdynref rules should be modified to make them LL(1). There's a big dilemma between simplicity, elegance, and backwards compatibility.

On Fri, Dec 4, 2015 at 7:15 AM, Robert Grosse n210241048576@gmail.com wrote:

The main argument for .throws vs .exceptions is that it resembles a throws declaration in Java, which might make it easier to understand. Also, Jasmin uses .throws, with some minor differences.

P.S. What do you think about the syntax for multiline field declarations? That one was tough and I couldn't come up with anything I really liked.

On Fri, Dec 4, 2015 at 6:54 AM, Robert Grosse n210241048576@gmail.com wrote:

I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.

C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.

And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.

On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky <notifications@github.com

wrote:

I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.

For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.

Is there a reason you require labels to start with L? (not a big deal, just curious)

It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).

— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .

Storyyeller commented 8 years ago

Another issue I ran into was how to distinguish Methodrefs and InterfaceMethodrefs for invoke instructions. One idea I had was to use tagged const, but that's a little verbose. What do you think?

example: invokestatic Method java/util/Arrays fill ([[Ljava/lang/Object;[Ljava/lang/Object;)V invokestatic InterfaceMethod java/util/Arrays fill ([[Ljava/lang/Object;[Ljava/lang/Object;)V

Storyyeller commented 8 years ago

The new assembler and diassembler is now available for you to try. What do you think?