CalebFenton / simplify

Android virtual machine and deobfuscator

Emulate Charset encoding and decoding #157

Open CalebFenton opened 3 years ago

CalebFenton commented 3 years ago

The JVM and Android implementations of java.nio.charset.Charset differ significantly. For example, calling java.nio.charset.Charset.forName("UTF-8") on the JVM returns an object of class sun.nio.cs.UTF_8, but on Android it returns an object of class java.nio.charset.CharsetICU. Since sun.nio.cs.UTF_8 doesn't exist on Android, the class manager throws an error when it tries to find the Smali file for that class in the reference framework.
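To see the JVM side of this mismatch, a minimal sketch like the one below prints the runtime class behind a charset name. The class `CharsetClassProbe` and its method are hypothetical names used only for illustration; they are not part of simplify.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of simplify.
class CharsetClassProbe {
    // Returns the runtime class name of the Charset implementation for the given name.
    static String implementationClassName(String charsetName) {
        return Charset.forName(charsetName).getClass().getName();
    }

    public static void main(String[] args) {
        // On a desktop JVM this is typically a class under sun.nio.cs,
        // which has no counterpart in the Android framework.
        System.out.println(implementationClassName("UTF-8"));
    }
}
```

Any code that inspects or serializes this class name on the host JVM will therefore see a type the Android reference framework cannot resolve.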

If this were the only difference, it would be easy enough to handle. All you'd need are the correct constructor arguments for CharsetICU. The most "difficult" of these is the list of aliases, but those are easily obtained from this table: https://icu4c-demos.unicode.org/icu-bin/convexp?s=WINDOWS&s=JAVA&s=IANA&s=MIME

Note: Windows, Java, IANA, and MIME seem to be the alias standards used on Android (reference: https://cs.android.com/android/platform/superproject/+/master:external/icu/android_icu4j/libcore_bridge/src/native/com_android_icu_charset_NativeConverter.cpp;drc=master;bpv=1;bpt=1;l=163?q=nativeconverter)
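As a rough cross-check against the ICU table, the host JVM can list the aliases it knows for a charset. This is only a sketch: the JVM's alias set is not guaranteed to match what Android's CharsetICU is constructed with, and `CharsetAliasDump` is a hypothetical name, not simplify code.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of simplify.
class CharsetAliasDump {
    // Returns the host JVM's aliases for a charset, sorted for stable output.
    // Assumption: these are only a starting point; Android's aliases come from ICU.
    static Set<String> jvmAliases(String charsetName) {
        return new TreeSet<>(Charset.forName(charsetName).aliases());
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("UTF-8");
        System.out.println("canonical: " + cs.name());
        System.out.println("aliases:   " + jvmAliases("UTF-8"));
    }
}
```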

Unfortunately, encoding and decoding are also handled differently on the JVM and Android, and that part is a lot more complex. It will take time to research and implement emulated functions. It may be possible to "shim" the emulation code so that it just uses JVM encoding / decoding in the background, but there's still a lot of detail work left: creating test cases, hooking all the right functions, etc.
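The "shim" idea could look roughly like the sketch below, which delegates to the host JVM's Charset for the actual byte conversion. `JvmCharsetShim` and its methods are hypothetical; how such a shim would be hooked into simplify's emulated methods is not shown here and would still need to be worked out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical shim for illustration only; not part of simplify.
class JvmCharsetShim {
    // Encodes text with the host JVM's implementation of the named charset.
    static byte[] encode(String charsetName, String text) {
        ByteBuffer buf = Charset.forName(charsetName).encode(text);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Decodes bytes with the host JVM's implementation of the named charset.
    static String decode(String charsetName, byte[] bytes) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("UTF-8", "h\u00e9llo");
        System.out.println(encoded.length + " bytes");
        System.out.println(decode("UTF-8", encoded));
    }
}
```

A shim like this only covers the simple convenience paths. Code that drives CharsetEncoder or CharsetDecoder directly, or depends on Android-specific handling of malformed input, would need its own test cases.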