Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0
1.95k stars 220 forks source link

Classes with identical simple names not distinguished #16

Closed ysangkok closed 2 years ago

ysangkok commented 11 years ago

Original source:

class Integer {
    Integer(boolean a) {
    }
}

class Test {
    Integer fortytwo = new Integer(false);
    java.lang.Integer fortythree = 43;
}

Test setup:

% tree
.
├── decompiled
└── Test.java

1 directory, 1 file
% javac Test.java                                                                              
% jar cf Test.jar *.class                                                                            
% cd decompiled 
% python /var/www/temp/Krakatau/decompile.py -path /usr/lib/jvm/java-7-oracle/jre/lib/rt.jar ../Test.jar
[...]
Class written to /home/janus/Skrivebord/test/decompiled/Integer.java
[...]
Class written to /home/janus/Skrivebord/test/decompiled/Test.java

Decompiled source:

class Test {
    Integer fortytwo;
    Integer fortythree;

    Test()
    {
        super();
        Integer a = new Integer(false);
        this.fortytwo = a;
        Integer a0 = Integer.valueOf(43);
        this.fortythree = a0;
    }
}

and

class Integer {
    Integer(boolean b)
    {
        super();
    }
}

Expected: Distinguished class names in Test.java.

Comment: This may look contrived, but to encounter a real world issue, simply swap "Integer" with "Polygon", make your own Polygon class and use AWT (it has a polygon too).

Storyyeller commented 11 years ago

Ah, it looks like the problem is that Krakatau tries to be helpful by removing the java.lang prefix from class names. But I'm hesitant to add it back in since in general it will lead to messier code. What do you suggest?

Anyway, AWT shouldn't be a problem because this is only done for java.lang.

ysangkok commented 11 years ago

The package java.lang could be traversed for classes. If a java.lang class name is shadowed, the verbose behavior could be triggered. I don't know if this would require an extra run-through. Another option is a command line switch.

I think it should definitely be possible to get the correct behavior automatically, since it is possible to construct bytecode that shadows java.lang's classes on purpose (as an obfuscation method).

Also, it is not guaranteed that the incorrect code will not compile!

For example, if the constructor argument for the shadowing Integer class was a String, the invalid code would even compile, since the signature would match java.lang.Integer's String constructor! That would be an absolute nightmare to debug.

I think it is more important than the other issues I reported, since it can lead to compiling incorrect code.

Thanks for the quick reply.

Storyyeller commented 11 years ago

There are tons of ways to create bytecode that is impossible to represent in Java. The reason Krakatau is useful is because most obfuscators don't.

Anyway, I was hoping to avoid exhaustive directory searches to enumerate classes, but I think I may try adding it as a commandline switch.

Lanchon commented 10 years ago

don't detect shadowing, it is too strict a condition to disable the full qualification of java.lang classes. thought the output would always be correct, sometimes it would not be clear enough for humans.

i propose: compile the set of all simple names of all referenced types in the current outer class and its nested children, except for java.lang types. when the name of java.lang type has to be output, use the fully qualified name if its simple name is in the set. this forces all fq names if there's a corresponding name clash anywhere in the current source file, which is a better policy for humans imho.

in fact this can be generalized: compile a map of all referenced types where simple name is key and fq name is value. while adding each type to the map, check for simple name clashes by detecting a would-be change of the existing value of the corresponding key, and if there is a clash replace the existing value with a sentinel. afterwards, use all non-sentinel entries to create type import statements. finally, whenever the name of type has to be output, use the fq name if the value mapped to its simple name is a sentinel.

still, two different types in the same VM can have the same fq name if they are loaded by different classloaders. the implications of this can be ignored until someone creates a very very nasty obfuscator.

Janmm14 commented 8 years ago

A flag to stop removal of java.lang.?

Storyyeller commented 8 years ago

I'm not sure that's worth creating a flag for. If you want to patch it locally, it's only a 1 or 2 line change.

ysangkok commented 2 years ago

Closing this since I stopped using Krakatau and I don't know whether it is still a bug.