Closed skochinsky closed 8 years ago
Could you post the full stack trace please?
Also, please post the jar it came from if you can.
P.S. What disassembler did you use to produce that?
Here's the jar (renamed to .docx to bypass github upload restriction). The assembly comes from the disassembly of Management Engine's internal JEFF file - I wrote my own disassembler that converts JEFF into Jasmin-style code accepted by Krakatau (I could not figure out how to add JEFF parsing to Krakatau). Full stack trace:
Traceback (most recent call last):
File "Krakatau\decompile.py", line 144, in <module>
decompileClass(path, targets, args.out, args.skip)
File "Krakatau\decompile.py", line 91, in decompileClass
source = printer.visit(javaclass.generateAST(c, makeGraph, skip_errors, add_throws=add_throws))
File "Krakatau\Krakatau\java\javaclass.py", line 67, in generateAST
method_defs = [_getMethod(m, cb, forbidden_identifiers, skip_errors) for m in methods]
File "Krakatau\Krakatau\java\javaclass.py", line 37, in _getMethod
graph = cb(method) if method.code is not None else None
File "Krakatau\decompile.py", line 52, in makeGraph
s.abstractInterpert()
File "Krakatau\Krakatau\ssa\graph.py", line 247, in abstractInterpert
assert(out)
AssertionError
I'll look into the decompilation issue.
Regarding the disassembler, the reason I asked is because I'm currently working on a complete rewrite of the assembler and diassembler, and I'm also planning to change the assembler format to simplify things and enable support for round-tripping and Java 8. Unfortunately, this means that I'll have to make some backward-incompatible changes to the format. I was wondering if you were interested in having any input on the new format.
On Sun, Nov 29, 2015 at 1:58 AM, Igor Skochinsky notifications@github.com wrote:
Here's the jar https://github.com/Storyyeller/Krakatau/files/46560/jom.docx (renamed to .docx to bypass github upload restriction). The assembly comes from the disassembly of Management Engine's internal JEFF file - I wrote my own disassembler that converts JEFF into Jasmin-style code accepted by Krakatau (I could not figure out how to add JEFF parsing to Krakatau). Full stack trace:
Traceback (most recent call last): File "Krakatau\decompile.py", line 144, in
decompileClass(path, targets, args.out, args.skip) File "Krakatau\decompile.py", line 91, in decompileClass source = printer.visit(javaclass.generateAST(c, makeGraph, skip_errors, add_throws=add_throws)) File "Krakatau\Krakatau\java\javaclass.py", line 67, in generateAST method_defs = [_getMethod(m, cb, forbidden_identifiers, skip_errors) for m in methods] File "Krakatau\Krakatau\java\javaclass.py", line 37, in _getMethod graph = cb(method) if method.code is not None else None File "Krakatau\decompile.py", line 52, in makeGraph s.abstractInterpert() File "Krakatau\Krakatau\ssa\graph.py", line 247, in abstractInterpert assert(out) AssertionError — Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-160395573 .
Does this code actually run? It looks to me like invalid bytecode.
The problem is in com/intel/util/PlatformId getOemIds
anewarray [Lcom/intel/util/PlatformIdImpl;
This is creating a two dimensional array, when a one dimensional array was intended. The part after anewarray is the component type.
Thanks, it was my mistake. JEFF uses the same opcode (jeff_newarray
) for both primitives (byte/char/int/etc) and objects, so I have to convert it to anewarray
when an object array is created, but I screwed up by forgetting to remove the array indicator in case of objects ( I do remove it for primitives).
Regarding new assembler: I'm far from Java expert but I can certainly provide feedback if necessary. One thing that would be very useful IMO is an official language grammar - it took me several tries to figure out where I can put .const definitions and what types of consts are supported and what should I use (e.g. what's the difference between String and Utf8?).
Native support for dex/smali would be cool btw.
As far as String vs Utf8, String is used for string literals. It's just a reference to a Utf8, which holds the underlying data and is used for (nearly) everything that requires string data in the classfile. You can read more about the details in the JVM specification.
I finished writing a draft specification for the new assembly format. What do you think?
https://docs.google.com/document/d/1O5_sbluDW5pPkzvXRKXhOm5MpWriKE3Bsj8QAreXs0U/edit?usp=sharing
I would suggest borrowing hex floats of C99 (e.g 0x0.3p10
) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.
For attributes, I think it's better to stay close to the jvm spec, so .exceptions
rather than .throws
.
Is there a reason you require labels to start with L? (not a big deal, just curious)
It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).
I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.
C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.
And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.
On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky notifications@github.com wrote:
I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.
For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.
Is there a reason you require labels to start with L? (not a big deal, just curious)
It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).
— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .
The main argument for .throws vs .exceptions is that it resembles a throws declaration in Java, which might make it easier to understand. Also, Jasmin uses .throws, with some minor differences.
P.S. What do you think about the syntax for multiline field declarations? That one was tough and I couldn't come up with anything I really liked.
On Fri, Dec 4, 2015 at 6:54 AM, Robert Grosse n210241048576@gmail.com wrote:
I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.
C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.
And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.
On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky notifications@github.com wrote:
I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.
For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.
Is there a reason you require labels to start with L? (not a big deal, just curious)
It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).
— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .
P.S. One other thing I'm torn about is whether the fmimref and invdynref rules should be modified to make them LL(1). There's a big dilemma between simplicity, elegance, and backwards compatibility.
On Fri, Dec 4, 2015 at 7:15 AM, Robert Grosse n210241048576@gmail.com wrote:
The main argument for .throws vs .exceptions is that it resembles a throws declaration in Java, which might make it easier to understand. Also, Jasmin uses .throws, with some minor differences.
P.S. What do you think about the syntax for multiline field declarations? That one was tough and I couldn't come up with anything I really liked.
On Fri, Dec 4, 2015 at 6:54 AM, Robert Grosse n210241048576@gmail.com wrote:
I added the restriction that labels start with L mainly to allow better error messages. Currently in an instruction body whenever you make a mistake, it will try to interpret the instruction name as a label and keep going.
C99 hex floats look like they are the same as Java hex floats, which is what I'm already using. And they can't represent infinity or NaN either. It's unspecified which bit pattern a standard NaN assembles to. If you care about the binary representation, you should specify it yourself.
And const references deliberately can be used before definition. As for constant pools being class specific, I figured that would be obvious, but I guess I'll add a note to that effect.
On Fri, Dec 4, 2015 at 1:10 AM, Igor Skochinsky <notifications@github.com
wrote:
I would suggest borrowing hex floats of C99 (e.g 0x0.3p10) which can be used to represent non-standard NaNs/denormals. Also, maybe you should mention exactly what bit pattern do "standard" NaN/Inf expand to.
For attributes, I think it's better to stay close to the jvm spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, so .exceptions rather than .throws.
Is there a reason you require labels to start with L? (not a big deal, just curious)
It seems .const definitions can be interleaved with methods. You should probably mention that a const reference may only be used after the definition, and only within the same class(?).
— Reply to this email directly or view it on GitHub https://github.com/Storyyeller/Krakatau/issues/52#issuecomment-161914808 .
Another issue I ran into was how to distinguish Methodrefs and InterfaceMethodrefs for invoke instructions. One idea I had was to use tagged const, but that's a little verbose. What do you think?
example: invokestatic Method java/util/Arrays fill ([[Ljava/lang/Object;[Ljava/lang/Object;)V invokestatic InterfaceMethod java/util/Arrays fill ([[Ljava/lang/Object;[Ljava/lang/Object;)V
The new assembler and diassembler is now available for you to try. What do you think?
I'm getting an assert(out) triggering in SSA_Graph/abstractInterpert(). The class in question is pretty short so I'll paste the assembly here:
Any idea what's the problem and how to fix it?