DexPatcher / dexpatcher-tool

Android Dalvik bytecode patcher.
https://dexpatcher.github.io/
GNU General Public License v3.0
423 stars 82 forks source link

--map-source / --unmap-output inner classes #33

Open andrewleech opened 4 years ago

andrewleech commented 4 years ago

Hi there, I hope you're doing ok, weathering the storm as it were.

I've just started playing with --map-source, --unmap-output, --map mapping.txt from a master build to port my patches to a new version of the app with wildly different obfuscation.

One the of the main classes being patches has inner (not anon) classes which I also need to patch. Are these supported at all at the moment?

java -jar "dexpatcher-tool\tool\build\libs\dexpatcher-1.8.0-alpha4.jar" --deanon-source --main-plan "Anon[_Level]" --output Anon\classes.dex classes.dex --map-source --map ../mapping.txt

I've tried:

com.foo.iq4 -> com.foo.NotificationComponent:
     Object a() -> processNotificationAdd
     Object b() -> sendToDevice
     d() -> NotificationMonitor ; inner class attemp

and

com.foo.iq4 -> com.foo.NotificationComponent:
     Object a() -> processNotificationAdd
     Object b() -> sendToDevice

; second inner class attempt
com.foo.iq4$d -> com.foo.NotificationComponent$NotificationMonitor:
     String a -> tag
     String c -> packageName

neither of which seem to work.

In a decompiled copy of the outputted dex I do get iq4.java replace by NotificationComponent.java with all the inner classes now split out as iq4$a.java and so on. iq4$d.java persists with both the mappings as shown above, where I'd prefer it to be NotificationComponent$NotificationMonitor

Thanks in advance, in general this new mapping looks fantastic!

Lanchon commented 4 years ago

how are you!

lol so u've been poking around the repo :) it's been 3 months since i last committed. the thing was ready for a beta release (minus testing) but i couldn't get myself to document the release... sigh

this new mapping looks fantastic!

thanks! there's a trick or two under the sleeve that i'm sure will surprise you there :)

so dexptacher-tool... let me take a look at how things stand right now

Lanchon commented 4 years ago

ok. class name iq4$a is weird. it's arguable that it should automatically be renamed to NotificationComponent$a in your case, but i chose not to.

there are technical reasons. a mapping functioning akin to 'iq4$ -> NotificationComponent$` is not really a mapping, but a mapping template.

say the internal structures of a map would allow this level of expression. applying the map would require hierarchical searches of the name space for each class REFERENCE (happens almost everywhere in bytecode, not just class def headers) and could make the mapping perform horribly.

cache transformed names? well it adds to the working memory (now mapping PLUS cache) and i want the thing to be usable in phones (low memory, low performance). but also, mappings are applied with multithreading in certain cases (typical for debug builds of apps for fast turnarounds, the ones using --multi-dex-threaded). so having caches is problematic: requires synchronization which again is a performance killer. or you could cache per-thread: even more memory.

but what happens when i need to instantiate a mapping that is the inverse of an existing mapping? with mapping templates, this can't be done; i need real maps.

so nope, for many reasons, the right way would be adding a pre-process step where the read-in map template is transformed into a map. the pre-process step would also require access to the bytecode to be transformed. a hell of a lot of code for low returns, but it should work. right?

hold on. so a transform is built based on a transform template + bytecode. once that is done you can invert the transform. only there's a problem: the inverse will not going to be applied to the same (transformed) code! the code has been patched. so where it's different, the template would not really apply as expected!

ok, recalculate transforms for each bytecode to be processed. nope! because now inverses would not really invert the previous transform. so they wouldn't guarantee code restoration as they do now. the previous solution would be the best, with the ugly property of very obscure semantics for template expansion in weird cases.

crappy... but better than the nothing there is now?

not really. obfuscators i've seen don't preserve outer/inner class name relationships. why the hell your obfuscator produced iq4$a instead of simply iq5? in my mind there seems to be no valid reason to expose inner relationships to code reversers. and most obfuscators don't.

so i'm really surprised to see this. i never expected it.

sooooo........

that's to justify my lack of support for 'iq4$ -> NotificationComponent$`. it was not laziness, it was a well weighted decision.

that said, 'com.foo.iq4$d -> com.foo.NotificationComponent$NotificationMonitor' should totally work. maps don't bother with nested classes at all. $ is just a regular character. it should work, something fishy is going on.

in fact, the test 'suite' uses 3 simultaneous maps, and all of them operate on nested classes like that. tests pass because they all work. take a look:

https://github.com/DexPatcher/dexpatcher-tool/blob/master/test/mapping.txt https://github.com/DexPatcher/dexpatcher-tool/blob/master/test/compose-mapping.txt https://github.com/DexPatcher/dexpatcher-tool/blob/master/test/encode-mapping.txt

Lanchon commented 4 years ago

using com.foo.iq4$d -> com.foo.NotificationComponent$NotificationMonitor, please run your test transform with --debug and post the complete log.

andrewleech commented 4 years ago

Ah, brilliant. As much as it'd be convenient to automatically have all (just) the inner classes automatically translated that makes so much sense. And really, the mapping is already such that you explicitly list the things you want renamed, and inner classes being just different named classes that fits perfectly.

I updated my mapping file to list all the inner classes, deleted my temp files and re-ran with --debug and wouldn't you know it, it works perfectly now.

I think last night when testing I probably left the d() -> NotificationMonitor ; inner class attemp line in the outer class which probably killed the operation entirely and the rest of my script would have continued to decompile the previously created temp files.

So now it's working great, thanks!

I do think there might be a minor glitch though, so I'm comparing the decompiled code and without mapping the inner classes, the code references are all going to the obf$a references as expected, like such:

NotificationComponent.java

    public static final iq4$a n = new iq4$a((qg6) null);
    public final String a;
    public final iq4$c b = new iq4$c(this.h);
    public static final HashMap<String, iq4$b> l = new HashMap<>();
    public static final HashMap<String, iq4$i> m;

Then, with the rest of the mapping filled out for the inner classes they're now back to being recognised as inner classes and get directly referenced NotificationComponent.java

    public static final a n = new a((qg6) null);
    public final String a;
    public final c b = new c(this.h);```
    public static final HashMap<String, iq4.b> l = new HashMap<>();
    public static final HashMap<String, iq4.i> m;

... all except when the reference is inside <> then it's dot referencing the original obfuscated class name.

They also appear to be wrong when it's a reference in a method arg list

public final Object a(iq4.d dVar, iq4.b bVar, boolean z, xe6<? super cd6> xe6)
Lanchon commented 4 years ago

this is by design. your decompiler is flawed. what are you using?

java/dalvik bytecodes have 'signature' items all around. these are supposed to contain generic type info before type erasure happens in the compiler. the signatures are strings and are totally ignored by the VMs (except that reflection APIs can expose them).

so obfuscated code typically is stripped from signatures. again i didn't expect signatures in obfuscated code, or maybe illegal signatures to hinder decompilers.

in short, sigs cannot be trusted. decompilers should only use them if the match reality (they don't in this case) and usage should be toggleable. CFR decompiler makes an effort not to depend on 'optional' metadata, as it could be doctored to trick the decompiler. and if it does, it makes all sorts of sanity checks on the data.

a signature in obfuscated code could spell "yo mama!" and the code would still run fine. so i obviously make no effort to interpret it or map it. the usual use case is that code is dual transformed forth and back, so sigs if required will still be valid.

you'll have to make do with your decompilation, or switch decompiles, or configure your decompiler if possible. this is a wont fix, sorry. :(

Lanchon commented 4 years ago

btw, ALWAYS use set -eu in your bash scripts. use <cmd> || .... for stuff that is ok to fail. and consider pipefail too.

set: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
    Options:
      -e  Exit immediately if a command exits with a non-zero status.
      -o option-name
          Set the variable corresponding to option-name:
              pipefail     the return value of a pipeline is the status of
                           the last command to exit with a non-zero status,
                           or zero if no command exited with a non-zero status
      -u  Treat unset variables as an error when substituting.
andrewleech commented 4 years ago

Ah that's quite interesting. It does make sense that the <blah> parts are just metadata really. The inconsistent typing in the function arguments is a different problem though surely. I'm happy to fix them up in code (search-replace) I was just worried that they then wouldn't get re-mapped back the other way properly during compiling.

My go-to decompiler is jadx, in my use-cases with the latest versions of jadx it fails on functions/classes the least and makes some of the neatest code compared to the others.

To compare, I fed the same processed classes.dex file into bytecode-viewer (https://github.com/Konloch/bytecode-viewer)

Jadx


    /* JADX WARNING: Removed duplicated region for block: B:15:0x00f2  */
    /* JADX WARNING: Removed duplicated region for block: B:7:0x0021  */
    public final /* synthetic */ java.lang.Object b(iq4.d r17, iq4.b r18, boolean r19, xe6<? super cd6> r20) {

Fernflower

   // $FF: synthetic method
   public final Object b(NotificationMonitor var1, NotificationComponent.b var2, boolean var3, xe6 var4) {

CFR:

    /*
     * Unable to fully structure code
     */
    public final /* synthetic */ Object b(iq4.d var1_1, iq4.b var2_2, boolean var3_3, xe6<? super cd6> var4_4) {

Procyon

    public final /* synthetic */ Object b(final iq4$d iq4$d, final iq4$b l$2, final boolean z$0, final xe6<? super cd6> xe6) {

Krakatau:

    final public Object b(NotificationComponent$NotificationMonitor a0, NotificationComponent$b a1, boolean b0, xe6 a2) {

So Jadx, CFR and Procyon all get the function args wrong, but Fernflower and Krakatau get them right. Would the fact it's flagged as being synthetic be affecting the way it's handled?

ps. thanks for the tip about the set -u flag, I didn't know that one. In this case my script is all in python rather than shell (my same script that then parses and adds @Dex... flags in all the right places to the decompiled code so it's ready to pull in chunks verbatim)

Lanchon commented 4 years ago

Would the fact it's flagged as being synthetic be affecting the way it's handled?

no

maybe that CFR is old. but surely it has a command line arg to disable processing of signatures.

andrewleech commented 4 years ago

I've just re-tested with the latest version of CFR, it gives the same results

public final class NotificationComponent {
    public static final HashMap<String, iq4.b> l;
    public static final HashMap<String, iq4.i> m;
    public final /* synthetic */ Object b(iq4.d var1_1, xe6<? super cd6> var2_2) {

There's a very long list of command line options for cfr, but I can't see anything that looks to me like it'd be related to naming/signatures. The only one I thought might be, --usenametable didn't make any difference disabled (true by default)

Lanchon commented 4 years ago

@leibnitz27

hi Lee,

so some software of mine maps (renames) dalvik bytecode names. the output is dedexed (eg: dex2jar), then fed to CFR.

the mapping is to better handling obfuscated code. surprisingly, the obfuscator used on a piece of code does not remove signatures, and in fact it seems to map them "correctly" (meaning it obfuscates less than it could; ie, it could just strip the signatures).

now maybe in java bytecode sigs are required, per-spec at least. so maybe this is why the obfuscator in question did that. but not in dalvik: in dalvik, the string content of signatures is unspecified.

whatever may be the case, my mapping tool does not concern itself with signatures so they become stale. and CFR outputs incorrect generic source code that reflects the stale values of signatures.

check out a comparison between decompilers here: https://github.com/DexPatcher/dexpatcher-tool/issues/33#issuecomment-610121928

(in this comparison, maybe the decompilers that get things right are the ones that do not parse signatures at all.)

i was under the impression that CFR tried to use code and not metadata as much as possible to decompile. what would be the recommended way to make CFR behave better in this scenario? maybe disabling generics processing? is there an option with which CFR tries to regenerate erased types based on actual casts observed in code?

thanks for your help!

leibnitz27 commented 4 years ago

Hey -

Interesting - there's some places where I trust the signature if it passes basic verification checks - do you have a small case that reproduces this you can provide? Probably possible to spot a more aggressive verification check I can add.

While there are a billion and one flags, they're almost always set automatically, and I don't want to have to rely on getting people to set the right ones if I can infer it, otherwise the amount of effort to use shoots up.

On Tue, Apr 7, 2020 at 5:40 AM Lanchon notifications@github.com wrote:

@leibnitz27 https://github.com/leibnitz27

hi Lee,

so some software of mine maps (renames) dalvik bytecode names. the output is dedexed (eg: dex2jar), then fed to CFR.

the mapping is to better handling obfuscated code. surprisingly, the obfuscator used on a piece of code does not remove signatures, and in fact it seems to map them "correctly" (meaning it obfuscates less than it could; ie, it could just strip the signatures).

now maybe in java bytecode sigs are required, per-spec at least. so maybe this is why the obfuscator in question did that. but not in dalvik: in dalvik, the string content of signatures is unspecified.

whatever may be the case, my mapping tool does not concern itself with signatures so they become stale. and CFR outputs incorrect generic source code that reflects the stale values of signatures.

check out a comparison between decompilers here:

33 (comment)

https://github.com/DexPatcher/dexpatcher-tool/issues/33#issuecomment-610121928

(in this comparison, maybe the decompilers that get things right are the ones that do not parse signatures at all.)

i was under the impression that CFR tried to use code and not metadata as much as possible to decompile. what would be the recommended way to make CFR behave better in this scenario? maybe disabling generics processing? is there an option with which CFR tries to regenerate erased types based on actual casts observed in code?

thanks for your help!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DexPatcher/dexpatcher-tool/issues/33#issuecomment-610169079, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFXCEHXLG7IG7VPO5K32DLRLKVB7ANCNFSM4MCEZTEQ .

andrewleech commented 4 years ago

The single class referenced in the example/comparison above can be found here: https://mega.nz/file/485BDS4J#iDsvgTrWVuBX9CtmQsHWPa4m9vCaiTnIHoKiFufvrbM

It was originally named iq4 but was renamed to DianaNotificationComponent as above. One of its inner classes which was renamed from d to NotificationMonitor is the only other thing in the jar.

Lanchon commented 4 years ago

(Lee, please note that Andrew's code sample is very probably based on non-free software.)

andrewleech commented 4 years ago

Sorry yes, this one is not open software, hence sharing the minimum possible on an external host. I accept responsibility for this and already host it and related code on a public gitlab repo, so I would presume/expect any liability should stop with me.

Lanchon commented 4 years ago

oh dont worry Andrew. it's just that Lee tends to integrate these bits into his repo as test cases. we can easily create a clean snippet for him if he needs it.

leibnitz27 commented 4 years ago

@Lanchon - https://github.com/leibnitz27/cfr/commit/a986af3fadd1ce5f99feb06fb5d5fe698f8e3fb7

(--usesignatures false will turn them off altogether, however I strongly recommend against doing that, and just let the more aggressive verification do its thing ).

Having said that, I would strongly recommend, if you're rewriting descriptors anyway, to rewrite signatures if you possibly can.

I know it's possible to lie in them (so I'm very happy to add more paranoia to CFR), but there are a few places (other than generics) where signatures and descriptors carry significantly different info , and the signature is preferable.

Anonymous inner classes with captures is a very good example - the descriptors for those are a total mess.

Lanchon commented 4 years ago

thanks! --usesignatures will have to be then.

rewrite signatures if you possibly can

i won't, for various reasons:

andrewleech commented 4 years ago

Hi @Lanchon so this is working pretty well for me now, I've got a my script running dexpatcher manually to handle the anon and map and create a renamed jar which is good for disassembly and inspection.

I'm trying to use the mapped classes in Android Studio and not having as much luck there when trying to build again.

In build.gradle:

        extraArgs.addAll([
                "--main-plan", "Anon[_Level]",
                "--deanon-source", "--reanon-output",
                "--map-source", "--unmap-output",
                "--map", "mapping.txt"
        ])

If I understand correctly, when building through dexpatcher-gradle, it uses build\intermediates\dexpatcher\dedexed-classes in the classpath to fill everything needed to compile.

This jar hasn't been run through dexpatcher-tool though (I believe), so it doesn't know about any of the anon/map changes? This meant that everything in my patch java that's trying to reference mapped classes fails to find them.

To combat this I tried dexpatcherConfig { patchedApplication { importSymbols = false } } in build.gradle and then add a dependency api files('orig/remapped/classes.jar', 'orig/remapped/classes2.jar', 'orig/remapped/classes3.jar') to the anon&mapped class files I got from manually running dexpatcher-tool then dex2jar. This makes Android Studio happy with all its inspections and symbol finding etc.

However, when trying to compile I get

> Task :collectDexDebug

> Task :patchDexDebug FAILED
DexPatcher version 1.8.0-beta1 by Lanchon (https://dexpatcher.github.io/)
info: read 'C:\foo\build\intermediates\dexpatcher\decoded-app'
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes.dex': 10672 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes2.dex': 10099 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes3.dex': 525 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': dex version '035'
stats: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': 21296 types, 220 ms, 10 us/type
info: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug'
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes.dex': 24 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes2.dex': 5653 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes3.dex': 4297 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes4.dex': 11346 types
fatal: exception:
lanchon.multidexlib2.DuplicateTypeException: Lcom/portfolio/platform/uirenew/alarm/AlarmActivity;
    at lanchon.multidexlib2.MultiDexContainerBackedDexFile.<init>(MultiDexContainerBackedDexFile.java:42)
    at lanchon.multidexlib2.MultiDexIO.readDexFile(MultiDexIO.java:34)
    at lanchon.dexpatcher.Processor.readDex(Processor.java:299)
    at lanchon.dexpatcher.Processor.processFiles(Processor.java:134)
    at lanchon.dexpatcher.Processor.processFiles(Processor.java:80)
    at lanchon.dexpatcher.Main.runWithExceptions(Main.java:71)
    at lanchon.dexpatcher.Main.run(Main.java:49)
    at lanchon.dexpatcher.Main.run(Main.java:44)
    at lanchon.dexpatcher.Main.runWithUsage(Main.java:39)
    at lanchon.dexpatcher.Main.main(Main.java:30)

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':patchDexDebug'.
> Process 'command 'C:\Program Files\Android\Android Studio\jre\bin\java.exe'' finished with non-zero exit value 3

It's complaining about a duplicate type: Lcom/portfolio/platform/uirenew/alarm/AlarmActivity

This is one that's in my patches java source, but hasn't been mapped itself. This class is extending a mapped type though public final class AlarmActivity extends BaseActivity was originally public final class AlarmActivity extends wq4 It doesn't matter if I declare my patch class as @DexEdit or @DexIgnore I get the same error either way.

Does this make sense to you? Do you know of a quick-fix way of getting the dexpatcher mapped class patches to compile in android studio?

Lanchon commented 4 years ago

im really sorry you have to go though all this. of course dxp-gradle should be updated to do all this and more.

btw you are getting pretty good at this. :) the problem is you are instructing gradle to import the symbols (and code) into your patch. so the patch has all the code from the original apk, plus your patch.

in the log, the multidex read that starts here: info: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug' should end with a stats line. but doesnt because the multidex read fails. this is multidexlib2's exception, not really dxp, but thats my child too. because dexlib2 doesn't throw, you know each dex file is well formed. but multidexlib2 refuses to merge all the dex files because there is at least a class defined multiple times (in different dex files).

this is happening to all the classes you are patching of course: the originals are also in the multidex. but i/o errors and other low-level stuff out of dxp's control cause nstant aborts (while dxp tries to continue as much as possible in the face of errors, as you already know).

solution: import the symbols in gradle via a different scope (configuration). i think the right one is called compileOnly?

Lanchon commented 4 years ago

btw it's cool that you've discovered and early adopted this stuff before i published it. but yeah, dxp-gradle is requires an overhaul. i'm no longer supporting old android plugins and gradle versions though. the amount of hacks and dynamic code needed is outrageous. so the current dxp-g is the last to support AAPT1. it's a drag, but i can't work against google deprecations.

andrewleech commented 4 years ago

Hah, yeah I "should" have ample time and motivation to finish all my projects already too ;-) please, no apology necessary for open source/free work you haven't done yet! The wip I'm playing with is hugely beneficial even as is!

Yep I reckon you're right. I didn't know about compileOnly I thought that basically is what api was already, as opposed to the more common implementation - I thought those two were the only options there. I tried with compileOnly and it appears to build happily. I'll continue my testing/patching tomorrow :-)

And yeah, I certainly wouldn't put in any more than bare minimum trying to maintain support for deprecated features/versions. I have a similar battle with an Android app I maintain (with the help of DexPatcher). About 3/4 of my development effort goes into compatibility with the latest API and security changes Google device to enforce; none of this effort is seen or really respected by end users but you kind of just have to suck it up as a developer these days.

Lanchon commented 4 years ago

implementation and api both compile against and bundle the dependency in the build. the difference only shows when a 3rd piece of code uses your code as a dep: api provides your dep to the compile step of the 3rd code (as is part of your api), while implementation doesnt (no leaking of transitive deps). in both cases your dep is added to the build (as is used by your code).

example: your dep: a matrix math lib.

on the other hand, compileOnly is compiled against but never included in builds. what about its transitivity? for reasons i ignore, there is no api/imp distinction for compileOnly deps. i suppose they are never transitive, IDK.