DexPatcher / dexpatcher-tool

Android Dalvik bytecode patcher.
https://dexpatcher.github.io/
GNU General Public License v3.0

Add retargeting support for obfuscated classes/fields/methods #28

Closed Novex closed 4 years ago

Novex commented 5 years ago

This PR allows dexpatcher to transparently rename targeted class, method and field references in:

It builds a map of targets from the patch file, then rewrites the patch file with the new targets before applying it.

So effectively it will make this:

@DexEdit(target = "ro.numedecod.a.e.a.a", contentOnly = true)
public class a__ {

    @DexEdit(target = "ro.numedecod.a.e.a.a$c", contentOnly = true)
    public class c extends ro.numedecod.a.e.a_.b_ {

        @DexEdit(target = "b")
        private List<a_nalUnitInfo> b_sequenceParameterSetList = new ArrayList();

        @DexEdit(target = "c")
        private List<a_nalUnitInfo> c_pictureParameterSetList = new ArrayList();

        @DexReplace(target = "a")
        public final byte a_getVideoProfile() {
            return this.a_bytestreamBeginning[this.b_sequenceParameterSetList.get(0).a_nalUnitHeaderIndex + 1];
        }

        @DexReplace(target = "a")
        final void a_keepInformationFromNalUnitHeader(a_nalUnitInfo nalUnitInfo) {
            // Get the NAL unit type
            switch(this.a_bytestreamBeginning[nalUnitInfo.a_nalUnitHeaderIndex] & 15) {
                case 7:     // sequence parameter set
                    this.b_sequenceParameterSetList.add(nalUnitInfo);
                    break;
                case 8:     // picture parameter set
                    this.c_pictureParameterSetList.add(nalUnitInfo);
            }
        }
    }
}

As if you had written this (which may not have been syntactically correct at compile time):

@DexEdit(contentOnly = true)
public class a {

    @DexEdit(contentOnly = true)
    public class c extends ro.numedecod.a.e.a.b {

        @DexEdit
        private List<a> b = new ArrayList();

        @DexEdit
        private List<a> c = new ArrayList();

        @DexReplace
        public final byte a() {
            return this.a[this.b.get(0).a + 1];
        }

        @DexReplace
        final void a(a nalUnitInfo) {
            // Get the NAL unit type
            switch(this.a[nalUnitInfo.a] & 15) {
                case 7:     // sequence parameter set
                    this.b.add(nalUnitInfo);
                    break;
                case 8:     // picture parameter set
                    this.c.add(nalUnitInfo);
            }
        }
    }
}

The main benefit being that proguarded code can be annotated (which makes understanding what's going on much easier) and edited.
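The map-building step described above can be sketched in plain Java. This is only an illustration using reflection, not the PR's actual code (which operates on dex files): `Retarget`, `PatchSketch` and `RetargetMapDemo` are hypothetical names, and `Retarget` merely stands in for the `target` attribute of the patch annotations.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the target attribute of a patch annotation.
@Retention(RetentionPolicy.RUNTIME)
@interface Retarget {
    String value();
}

// Hypothetical patch class that uses readable member names.
class PatchSketch {
    @Retarget("b")
    List<String> b_sequenceParameterSetList = new ArrayList<>();

    @Retarget("a")
    byte a_getVideoProfile() { return 0; }

    int untouched; // no annotation, so it does not enter the map
}

class RetargetMapDemo {
    // Collects "kind readableName" -> obfuscated target for annotated members.
    // A real implementation would key on the full owner/name/descriptor triple.
    static Map<String, String> buildMap(Class<?> patch) {
        Map<String, String> map = new TreeMap<>();
        for (Field f : patch.getDeclaredFields()) {
            Retarget r = f.getAnnotation(Retarget.class);
            if (r != null) map.put("field " + f.getName(), r.value());
        }
        for (Method m : patch.getDeclaredMethods()) {
            Retarget r = m.getAnnotation(Retarget.class);
            if (r != null) map.put("method " + m.getName(), r.value());
        }
        return map;
    }
}
```

Calling `RetargetMapDemo.buildMap(PatchSketch.class)` yields one entry per annotated member; every reference to those members in the patch would then be rewritten through the map before the patch is applied.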

There are a few edge cases I've come across that I still need to narrow down:

Also I haven't been able to get the tests running (I'm developing on windows - not sure if that has anything to do with it), but would like to add some for the more obscure stuff.

Resolves #27

Lanchon commented 4 years ago

hi @Novex,

very thankful for this patch. unfortunately, this breaks the use case and contract of @DexEdit(target = "...", contentOnly = true) on classes. it also breaks @DexEdit(target = "...") on members, which is intended to reprototype them (eg, rename them).

and, as you have noticed when coding it, it belongs to an altogether different processing stage than dxp's code patching itself.

the motivating rationale for these changes is working around the inability of the Java language to produce certain legal bytecode names. this is a shortcoming of Java that other languages do not have. (Many others allow quoting names to support the full naming freedom of the JVM.)

handling obfuscated code correctly requires much more than working around this Java limitation. but if due to limited developing time we only want to work around this issue, i think abusing the dxp tags is not the right solution. also, this breaks backwards patch compatibility which makes it unacceptable.

but i think it's really cool that you got into the code and hacked this solution for yourself and shared it.

are you still using dxp? was it useful for you?

there are still a couple of things i want to add to this tool before v2. if v2 ever happens, it will be able to patch instructions within method bodies. but given the low rate of adoption, my lack of time, and lack of contributors, that might never happen. i also want JIT dexpatching in ART to retain signatures and increase adoption, now that Xposed is mostly dead.

anyway these are the features needed to complete v1. in no particular order:

  1. a basic way to handle obfuscation
  2. the DexEdit tag
  3. DexAppend to instance constructors

thanks!

Lanchon commented 4 years ago

Also I haven't been able to get the tests running (I'm developing on windows - not sure if that has anything to do with it), but would like to add some for the more obscure stuff.

as general advice: install WSL, microsoft's linux kernel that runs alongside the NT kernel. with it you will get all of linux userland from ubuntu running natively. (windows for developing is a joke. better yet, switch to linux.)

Lanchon commented 4 years ago

i'll tell you a little bit more about this renaming problem. although dexlib2 has the rewriters, it is not that trivial. you rename a class method, and all references to it. this is trivial using dexlib2. but... it breaks the code anyway.

say there is another class that extends the previous, possibly indirectly. that class also defines a method with a similar name and signature (or somewhat different signature, but the Java compiler generates a signature-compatible synthetic companion method anyway (ask me about this if you want)). this method used to override the previous one, but not anymore. the code is broken.
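The "synthetic companion method" mentioned above can be observed on a standard JVM with reflection. The following is a minimal sketch, assuming `javac` semantics for covariant return types; `Shape`, `Circle` and `BridgeDemo` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Shape {
    Object copy() { return new Shape(); }
}

class Circle extends Shape {
    // Covariant return type: this descriptor differs from Shape.copy(),
    // so the compiler also emits a synthetic "Object copy()" bridge.
    @Override
    Circle copy() { return new Circle(); }
}

class BridgeDemo {
    // Describes every method named "copy" that Circle itself declares.
    static List<String> copyMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Circle.class.getDeclaredMethods()) {
            if (m.getName().equals("copy")) {
                out.add(m.getReturnType().getSimpleName() + " bridge=" + m.isBridge());
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

`Circle` ends up declaring two `copy` methods, one of them a compiler-generated bridge. A renamer that only looks at the method written in the source would miss the second one.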

and the original method, is it overriding something? because if it is, the rename breaks the code right off the bat.
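A small Java sketch of this failure mode, with hypothetical class names. The second pair of classes models what the hierarchy looks like after a naive rename that updates the base method and its direct references but leaves the subclass untouched.

```java
// Original hierarchy: Derived.a() overrides Base.a().
class Base {
    String a() { return "Base.a"; }
    // Virtual call: an override in a subclass changes the result.
    String describe() { return a(); }
}

class Derived extends Base {
    @Override
    String a() { return "Derived.a"; }
}

// Same hierarchy after renaming Base.a() to getName() and fixing only
// the references that point at Base.a().
class RenamedBase {
    String getName() { return "Base.a"; }
    String describe() { return getName(); }
}

class StaleDerived extends RenamedBase {
    // Still named a(): it no longer overrides anything.
    String a() { return "Derived.a"; }
}

class OverrideBreakDemo {
    static String before() { return new Derived().describe(); }
    static String after() { return new StaleDerived().describe(); }

    public static void main(String[] args) {
        System.out.println(before()); // Derived.a
        System.out.println(after());  // Base.a
    }
}
```

Nothing fails to load or link here; the behavior changes silently because the override relationship is gone. Avoiding that requires renaming the whole override family together, which in turn requires looking at every class in the hierarchy.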

there are several annotations in java bytecode and dalvik bytecode that are defined by the VM standard and extend the meaning of the bytecode. for example, the compiler's full generic type information, even after the type-erasure necessary to create the bytecode happens, is still available at runtime, encoded in such annotations. which i agree are not vital, but correct renaming requires processing them, and dexlib has zero knowledge of these.
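As an illustration of the generic type information that survives erasure, here is a minimal reflection sketch for a standard JVM. `Holder` and `SignatureDemo` are hypothetical names. On the JVM this data comes from the class file's Signature attribute; on Dalvik the equivalent is stored as a system annotation.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class Holder {
    List<String> names = new ArrayList<>();
}

class SignatureDemo {
    private static Field namesField() {
        try {
            return Holder.class.getDeclaredField("names");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // The erased type is what the field descriptor in the bytecode says.
    static String erased() {
        return namesField().getType().getName();
    }

    // The generic type is recovered from the separately stored signature.
    static String generic() {
        return namesField().getGenericType().getTypeName();
    }
}
```

`erased()` returns `java.util.List` while `generic()` returns `java.util.List<java.lang.String>`. The second string embeds a class name, so a renamer that touches only descriptors and instructions would leave stale names behind in this metadata.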

so renaming requires global analysis and application of VM specs beyond what dexlib2 provides. it is a non-trivial problem, and a project in itself. it requires spec study, tests and probably some iterations to get right. i would definitely use it for dexpatcher-gradle if there was a solution already.

what i would like to have for the dexpatcher ecosystem is a dex query language coupled to a renamer for analysis and to an inverse renamer of the patch for patching. this way, as you analyze obfuscated code you determine useful names of items, and instead of hard-linking them to obfuscated names, you pair them with dex queries. queries such as "a class that possibly indirectly extends InputStream and has a method that takes an int and returns an int that in its implementation calls this method of this other class" (which could also be obfuscated and resolved with a query too). this way, different runs of the obfuscator (ie: different versions of the APK) would be automatically deobfuscated if the queries resolve.
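To make the idea of such a query concrete, here is a rough sketch that uses Java reflection as a stand-in for a dex model. No such query language exists in the tool; `ObfA`, `ObfB`, `ObfC` and `QueryDemo` are hypothetical, and the query covers only the structural part of the example above (extends `InputStream`, has a method from int to int).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical "obfuscated" classes to run the query against.
class ObfA extends ByteArrayInputStream {
    ObfA() { super(new byte[0]); }
    int a(int x) { return x + 1; }
}

class ObfB {
    int a(int x) { return x; }
}

class ObfC extends ByteArrayInputStream {
    ObfC() { super(new byte[0]); }
    void a(int x) { }
}

class QueryDemo {
    // "a class that, possibly indirectly, extends InputStream"
    static final Predicate<Class<?>> EXTENDS_INPUT_STREAM =
        c -> c != InputStream.class && InputStream.class.isAssignableFrom(c);

    // "and declares a method that takes an int and returns an int"
    static final Predicate<Class<?>> HAS_INT_TO_INT_METHOD =
        c -> Arrays.stream(c.getDeclaredMethods()).anyMatch(m ->
            m.getReturnType() == int.class
                && m.getParameterCount() == 1
                && m.getParameterTypes()[0] == int.class);

    // A query resolves only if exactly one candidate matches it.
    static Optional<Class<?>> resolve(List<Class<?>> candidates, Predicate<Class<?>> query) {
        List<Class<?>> hits = candidates.stream().filter(query).collect(Collectors.toList());
        if (hits.size() == 1) {
            return Optional.of(hits.get(0));
        }
        return Optional.empty();
    }
}
```

Combining both predicates with `and` resolves to `ObfA` among the three candidates, while `EXTENDS_INPUT_STREAM` alone is ambiguous and resolves to nothing. Treating an ambiguous match as a failure is a design choice in this sketch: a query that silently picked one of several matches would mis-deobfuscate a new APK version.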

such deobfuscated code would only feed symbols to the patch for compilation and would also be decompiled on-demand in the IDE for further iterative analysis. the patch would be constructed with those deobfuscated names. when it is time to apply the patch, queries would resolve against the original APK and the patch would be reverse renamed to obfuscate it matching the current APK obfuscation, and only then be applied.

the rename queries (probably expressed in a scripting language for which a runtime exists for java, such as javascript, groovy, lua, etc., and that provides suitable domain-specific language definition constructs) plus the deobfuscated patch would become the patch application unit, and would apply to any version of an obfuscated app.

this of course will never happen because, although i once planned it, i realized that i don't have the resources to implement it (time, motivation, team, etc).

but i still want to do something cheap and just good enough... i think i'll do that.

Lanchon commented 4 years ago

another solution would be a hand-made, hard-linked deobfuscation dictionary, plus a tool that could match similar items of different runs of the obfuscator. this tool could be used to evolve the dictionary, but possibly only in one direction (you end up with many dictionaries with different amounts of deobfuscation work done on each, a nightmare if you want to backport new changes to older APKs), and each distributed patch+dictionary would only apply to a single version of the APK.

Lanchon commented 4 years ago

here is early phase 1 obfuscation support: https://github.com/DexPatcher/dexpatcher-tool/releases/tag/v1.8.0-alpha1

this should solve all your issues :)

Lanchon commented 4 years ago

external mapping and automatic encoding have been added (not everything is documented yet) as other options for handling this kind of issue.

closing... and thank you for your help!!