Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0

Support for "synchronized" #152

Closed andrewleech closed 6 years ago

andrewleech commented 6 years ago

Hi, it appears that the Java synchronized keyword (smali monitor-enter/monitor-exit) is not currently supported?

I'm investigating some smali which includes:

    iget-object v2, v1, LConnectionService;->listenerLock:Ljava/lang/Object;

    monitor-enter v2

    .line 276
    :try_start_0
    iget-object v1, p0, LConnectionService$1;->this$0:LConnectionService;

    # getter for: LConnectionService;->mListeners:Ljava/util/List;
    invoke-static {v1}, LConnectionService;->access$200(LConnectionService;)Ljava/util/List;

    move-result-object v1

    invoke-interface {v1, p1}, Ljava/util/List;->add(Ljava/lang/Object;)Z

    .line 277
    iget-object v1, p0, LConnectionService$1;->this$0:LConnectionService;

    # invokes: LConnectionService;->createListenerList()V
    invoke-static {v1}, LConnectionService;->access$300(LConnectionService;)V

    .line 278
    monitor-exit v2
    :try_end_0
    .catchall {:try_start_0 .. :try_end_0} :catchall_0

In jadx this decompiles to:

synchronized (ConnectionService.this.listenerLock) {
    ConnectionService.this.mListeners.add(listener);
    ConnectionService.this.createListenerList();
}

However, in Krakatau I get:

Object a0 = this.this$0.listenerLock;
/*monenter(a0)*/;
try {
    ConnectionService.access$200(this.this$0).add(a);
    ConnectionService.access$300(this.this$0);
    /*monexit(a0)*/;
} catch(Throwable a1) {
    Throwable a2 = a1;
    while(true) {
        try {
            /*monexit(a0)*/;
        } catch(IllegalMonitorStateException | NullPointerException a3) {
            Throwable a4 = a3;
            a2 = a4;
            continue;
        }
        throw a2;
    }
}

Would it take much to support the synchronized keyword directly? The current Krakatau decompilation loses the thread safety provided in the original code. I'd be happy to take a look myself, but I'm not sure where to start.

Janmm14 commented 6 years ago

The problem with monenter and monexit is that mapping them to synchronized blocks is probably impossible in general: heavily obfuscated code can contain illegal monexits and similar constructs.

My suggestion would be to add an option that inserts Java Unsafe calls in place of the current monenter and monexit comments, surrounded by comments warning about a potential change in behaviour.

Storyyeller commented 6 years ago

The fundamental problem is that Java has no equivalent for monenter/monexit, so there is no way to decompile it in general. Therefore, the best we can do is to emit comments in the generated code as you've seen.
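To make the mismatch concrete: a synchronized block always releases the monitor it entered most recently, whereas bytecode may enter and exit monitors in any order. The sketch below is only an illustration and is not part of Krakatau. The function name `is_structured` and the `(op, lock)` event format are made up for this example, and it looks at a single linear trace rather than real control flow.

```python
def is_structured(events):
    """Return True if the monitor events could come from nested synchronized blocks.

    `events` is a list of (op, lock) pairs, where op is "enter" or "exit".
    Nested blocks imply stack discipline: each exit must release the most
    recently entered lock, and nothing may still be held at the end.
    """
    held = []  # stack of currently held locks, innermost last
    for op, lock in events:
        if op == "enter":
            held.append(lock)
        elif op == "exit":
            if not held or held[-1] != lock:
                return False  # exit without a matching innermost enter
            held.pop()
        else:
            raise ValueError("unknown op: %r" % (op,))
    return not held


nested = [("enter", "a"), ("enter", "b"), ("exit", "b"), ("exit", "a")]
interleaved = [("enter", "a"), ("enter", "b"), ("exit", "a"), ("exit", "b")]

print(is_structured(nested))       # expressible as nested synchronized blocks
print(is_structured(interleaved))  # no synchronized equivalent
```

The interleaved trace is the kind of sequence that has no direct counterpart in Java source, which is why a comment is the safest thing to emit.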

andrewleech commented 6 years ago

Thanks for the feedback.

I've ended up post-processing the decompilation with a set of regex replacements that fix most of the cases in the codebase I'm working with. I'll put a copy of my cleanup script here in case anyone else wants to do something similar in future:

import os
import re
from pathlib import Path

# Root of the decompiled source tree; matching files are rewritten in place.
srcdir = "/path/to/decompiled/source"

for root, dirs, files in os.walk(srcdir):
    for f in sorted(files):
        t = (Path(root) / f).read_bytes()
        if b'/*monenter' in t:
            print((Path(root) / f))
            # Collect the indentation of every monenter comment; the patterns below are anchored to it.
            ls = re.findall(rb'([ \t]*)/\*monenter\(', t)
            for l in ls:
                print((Path(root) / f))
                def rep(match):

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    return first + l + b'synchronized('+lock+b') {\n' + match[3] + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n'+l+rb'try \{\r?\n(.*?\r?\n)'+l+rb'\} catch.*?\n'+l+rb'\}',
                           rep, t, flags=re.DOTALL|re.MULTILINE)

                def rep2(match):
                    print([match[1], match[2]])

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    return first + l + b'synchronized('+lock+b') {\n' + match[3] + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n'+l+rb'label\d+\: \{\r?\n +Throwable.*?;\r?\n(.*?\r?\n)'+l+rb'\}',
                           rep2, t, flags=re.DOTALL|re.MULTILINE)

                def rep3(match):

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    return first + l + b'synchronized('+lock+b') {\n    ' + match[3] + match[4] + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n([\S ]*;\r?\n)'+l+rb'try \{\r?\n(.*?\r?\n)'+l+rb'\} catch.*?\n'+l+rb'\}',
                           rep3, t, flags=re.DOTALL|re.MULTILINE)

                def rep4(match):
                    print([match[1], match[2]])

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    return first + l + match[3] + b' synchronized('+lock+b') {\n' + match[4] + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n'+l+rb'(label\d+\:) try \{\r?\n(.*?\r?\n)'+l+rb'\} catch.*?\n'+l+rb'\}',
                           rep4, t, flags=re.DOTALL|re.MULTILINE)

                def rep5(match):
                    print([match[1], match[2]])

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    body = match[4]
                    # Escape the lock text so characters such as '$' or '.' are matched literally.
                    nbody = re.sub(rb'/\*monexit\('+re.escape(lock)+rb'\)\*/;\r?\n +\} catch\(Throwable.*?\}', b'}', body, flags=re.DOTALL|re.MULTILINE)
                    if nbody != body:
                        if len(re.findall(b' try ', nbody)) == 1:
                            nbody = nbody.replace(b' try ', b'')

                    return first + l + match[3] + b'synchronized('+lock+b') {\n' + nbody + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n'+l+rb'(label\d+\: )\{\r?\n(.*?\r?\n)'+l+rb'\}\r?\n'+l+rb'while\(true\).*?\n'+l+rb'\}',
                           rep5, t, flags=re.DOTALL|re.MULTILINE)

                def rep6(match):
                    print([match[1], match[2]])

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    body = match[4]
                    # Escape the lock text so characters such as '$' or '.' are matched literally.
                    nbody = re.sub(rb'/\*monexit\('+re.escape(lock)+rb'\)\*/;\r?\n +\} catch\(Throwable.*?\}', b'}', body, flags=re.DOTALL|re.MULTILINE)
                    if nbody != body:
                        if len(re.findall(b' try ', nbody)) == 1:
                            nbody = nbody.replace(b' try ', b'')

                    return first + l + match[3] + b'synchronized('+lock+b') {\n' + nbody + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n'+l+rb'(label\d+\: )\{\r?\n(.*?\r?\n)'+l+rb'\}',
                           rep6, t, flags=re.DOTALL|re.MULTILINE)

                def rep7(match):
                    print([match[1], match[2]])

                    if (b' %s = ' % match[2] in match[1]):
                        first = b''
                        lock = match[1].split(b'=')[-1].strip().rstrip(b';')
                    else:
                        first = match[1]
                        lock = match[2]

                    return first + l + match[4] + b'synchronized('+lock+b') {\n' + match[3] + match[5] + l+b'}'
                t = re.sub(rb'(\n[\S ]*\r?\n)'+l+rb'/\*monenter\((\S*?)\)\*/;\r?\n([\S ]*;\r?\n)'+l+rb'(label\d+\: )\{\r?\n(.*?\r?\n)'+l+rb'\}',
                           rep7, t, flags=re.DOTALL|re.MULTILINE)

            (Path(root) / f).write_bytes(t)
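
For anyone who only wants the general idea, here is a minimal sketch of the simplest case (a monenter comment followed directly by a try/catch-all), run against the snippet from the top of this issue. It is a simplified stand-in for the script above, not a replacement for it: it works on str rather than bytes, it does not inline the lock assignment, and the names `PATTERN`, `to_synchronized` and `SAMPLE` are made up for this example. Like the full script, it assumes the input is ordinary compiler output with well-nested monitors, so the result should be reviewed by hand.

```python
import re

# A monenter comment, then a try block at the same indentation, then a
# catch(Throwable ...) handler that closes at that indentation again.
PATTERN = re.compile(
    r"^(?P<indent>[ \t]*)/\*monenter\((?P<lock>\w+)\)\*/;\n"
    r"(?P=indent)try \{\n"
    r"(?P<body>.*?\n)"
    r"(?P=indent)\} catch\(Throwable \w+\) \{\n"
    r".*?\n"
    r"(?P=indent)\}$",
    re.DOTALL | re.MULTILINE,
)


def to_synchronized(src):
    """Rewrite simple monenter/try/catch-all groups as synchronized blocks."""
    def rewrite(m):
        indent, lock = m["indent"], m["lock"]
        # Drop the monexit comments for this lock from the guarded body.
        body = re.sub(
            r"^[ \t]*/\*monexit\(" + re.escape(lock) + r"\)\*/;\n",
            "",
            m["body"],
            flags=re.MULTILINE,
        )
        return f"{indent}synchronized({lock}) {{\n{body}{indent}}}"

    return PATTERN.sub(rewrite, src)


SAMPLE = "\n".join([
    "Object a0 = this.this$0.listenerLock;",
    "/*monenter(a0)*/;",
    "try {",
    "    ConnectionService.access$200(this.this$0).add(a);",
    "    ConnectionService.access$300(this.this$0);",
    "    /*monexit(a0)*/;",
    "} catch(Throwable a1) {",
    "    Throwable a2 = a1;",
    "    while(true) {",
    "        try {",
    "            /*monexit(a0)*/;",
    "        } catch(IllegalMonitorStateException | NullPointerException a3) {",
    "            Throwable a4 = a3;",
    "            a2 = a4;",
    "            continue;",
    "        }",
    "        throw a2;",
    "    }",
    "}",
])

print(to_synchronized(SAMPLE))
```

Anchoring every part of the pattern to the indentation of the monenter comment is what keeps the lazy matches from stopping at a closing brace of an inner block.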