AlexeySoshin / smali2java

Recreate Java code from Smali

Enhancement - use ANTLR to build out parser from grammar / lexer file #11

Open 8secz-johndpope opened 5 years ago

8secz-johndpope commented 5 years ago

To stabilise the parser, I suggest rebuilding some of the code to leverage the ANTLR grammar (.g4) files here: https://github.com/psygate/smali-antlr4-grammar

If you download the ANTLR tool with

wget https://www.antlr.org/download/antlr-4.7.2-complete.jar

you can then run

java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Go -visitor -o gen SmaliLexer.g4
java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Go -visitor -o gen SmaliParser.g4

This will spit out the following files / code:

https://gist.github.com/8secz-johndpope/30868ccd59f211f0000b90e6176dead7

You should then be able to walk through the smali file, which may reduce the out-of-bounds crashes people (including myself) have been experiencing.

For illustration, I have successfully used grammar files to build out parsers / lexers for hundreds of languages with Swift: https://github.com/johndpope/ANTLR-Swift-Target https://github.com/johndpope/Antlr-Swift-runtime

I forget the entry-point rule for this grammar; it changes for each grammar.

Here is the Swift code for reading a Java file, which you can find in the above repo.


    let textFileName = "Test.java"

    if let textFilePath = Bundle.main.path(forResource: textFileName, ofType: nil) {
        let lexer = Java8Lexer(ANTLRFileStream(textFilePath))
        print("lexer:", lexer)
        let tokens = CommonTokenStream(lexer)
        let parser = try Java8Parser(tokens)

        let tree = try parser.compilationUnit()
        print("tree:", tree)

        let walker = ParseTreeWalker()
        let java8walker = Java8Walker()
        try walker.walk(java8walker, tree)
    } else {
        print("error occur: can not open \(textFileName)")
    }

The pseudocode for smali would be


    let textFilePath = "/path/Test.smali"

    let lexer = NewSmaliLexer(ANTLRFileStream(textFilePath)) // this NewSmaliLexer exists in the generated code
    print("lexer:", lexer)
    let tokens = CommonTokenStream(lexer) // ?? there should be an equivalent method to do this
    let parser = try NewSmaliParser(tokens)

    let tree = try parser.compilationUnit() // placeholder: the entry rule differs per grammar; maybe ToStringTree?
    print("tree:", tree)

    let walker = ParseTreeWalker() // as the walker visits the tree, you can hook in to translate stuff
    let smaliWalker = SmaliWalker() // hypothetical listener you would write for smali
    try walker.walk(smaliWalker, tree)
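
The "hook in to translate" idea from the pseudocode can be illustrated without ANTLR. What follows is a minimal Go sketch under stated assumptions, not the ANTLR runtime or its generated code: `Listener`, `javaEmitter`, and `Walk` are hypothetical names, and the line-based walker is a hand-written stand-in for a generated parser. It only shows the pattern of a walker calling enter/exit hooks that emit Java-like text, and of reporting malformed input as an error rather than indexing past the end of a slice.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Listener mimics the shape of a generated listener: the walker calls these
// hooks as it enters and leaves constructs. All names here are hypothetical.
type Listener interface {
	EnterMethod(name string)
	ExitMethod()
	VisitInstruction(opcode string, operands []string)
}

// javaEmitter collects Java-like output lines as the walk proceeds.
type javaEmitter struct {
	out []string
}

func (e *javaEmitter) EnterMethod(name string) {
	e.out = append(e.out, "void "+name+"() {")
}

func (e *javaEmitter) ExitMethod() {
	e.out = append(e.out, "}")
}

func (e *javaEmitter) VisitInstruction(opcode string, operands []string) {
	// Instructions are not translated here; they are kept as comments.
	parts := append([]string{opcode}, operands...)
	e.out = append(e.out, "    // "+strings.Join(parts, " "))
}

// Walk feeds a smali snippet to the listener line by line. Malformed input
// produces an error; no field is read without a length check first.
func Walk(src string, l Listener) error {
	inMethod := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // blank line
		}
		switch {
		case fields[0] == ".method":
			if len(fields) < 2 {
				return fmt.Errorf("line %d: .method without a name", n)
			}
			if inMethod {
				return fmt.Errorf("line %d: nested .method", n)
			}
			// The last field holds name and descriptor, e.g. foo()V.
			sig := fields[len(fields)-1]
			name := sig
			if i := strings.IndexByte(sig, '('); i >= 0 {
				name = sig[:i]
			}
			l.EnterMethod(name)
			inMethod = true
		case fields[0] == ".end" && len(fields) == 2 && fields[1] == "method":
			if !inMethod {
				return fmt.Errorf("line %d: .end method without .method", n)
			}
			l.ExitMethod()
			inMethod = false
		case inMethod && !strings.HasPrefix(fields[0], "."):
			l.VisitInstruction(fields[0], fields[1:])
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	if inMethod {
		return fmt.Errorf("unterminated .method")
	}
	return nil
}

func main() {
	src := ".method public foo()V\n    return-void\n.end method\n"
	e := &javaEmitter{}
	if err := Walk(src, e); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(e.out, "\n"))
}
```

With a real ANTLR-generated parser, the tree walker and the listener interface would come from the ANTLR runtime and the generated code, so only the emitter part would need to be written by hand.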

Other people have built translators using ANTLR this way, for example https://github.com/8secz-johndpope/ObjcGrammar. You may need some help; when I have more time I will circle back.

AlexeySoshin commented 5 years ago

You're right, that approach would be much better, as currently I support only a very limited number of instructions. Will look into it.

8secz-johndpope commented 5 years ago

VS Code has smali syntax highlighting: https://github.com/ViRb3/vscode-smali/tree/master/smali. Could this help?

If you surface any work in a new feature branch, I'm happy to take a look.

AlexeySoshin commented 5 years ago

@8secz-johndpope Thanks for getting back to this issue :) I took a look at the VS Code grammar, but it's actually more confusing, since it's based on regexes. Planning to make another branch for ANTLR this week, per your suggestions.