JesusFreke / smali

smali/baksmali

Dex file size increases by ~50% without making changes (DexFileFactory.loadDexFile then DexFileFactory.writeDexFile) #872

Open marwan-bushara opened 1 year ago

marwan-bushara commented 1 year ago

I'm using "DexFileFactory" class to read and write dex files. A simple example:

DexFile dexFile = DexFileFactory.loadDexFile(inputPath, null);
DexFileFactory.writeDexFile(outputPath, dexFile);

Just by loading the dex file and rewriting it (without any modifications), the size of the dex file increases, sometimes by up to 50%. What causes this increase, and is there any way to reduce the file’s size when writing it?

CunningLogic commented 1 year ago

Post the before/after dex and I'll take a look.

marwan-bushara commented 1 year ago

I'm attaching one example here. The size increase is about 40% in this case. classes.zip

katzdan commented 1 year ago

Hi, I'm following this issue as well. Not sure what you mean by "modern optimizing compiler". Is there a way to reduce the changes in size between the initial dex file and the one written? Thanks

CunningLogic commented 1 year ago

@katzdan after I get the brisket going on the smoker, I will post a detailed explanation. I deleted my far too simple reply (I'm just waking up and was being a bit lazy). I do think there is something else going on here too, but I need to run the files through my disassembler to look closer.

But yes, there is a lot of duplicate data in a dex file, and a lot of that data is referenced through pointers/offsets.

For example, if two classes have identical debug sections, both can point to the same data for their debug section.
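To make the sharing idea concrete, here is a minimal sketch in Java. It is only an illustration of the technique, not dexlib2 code and not the real dex layout: the `SharedBlobPool` class and its methods are hypothetical names. With deduplication on, identical byte blobs (think of two identical debug sections) are written once and both references get the same offset; with it off, every blob is written again.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of smali/dexlib2, not the real dex format.
final class SharedBlobPool {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // Maps blob contents to the offset where those contents were first written.
    private final Map<ByteBuffer, Integer> offsets = new HashMap<>();
    private final boolean dedupe;

    SharedBlobPool(boolean dedupe) {
        this.dedupe = dedupe;
    }

    /** Stores the blob and returns the offset a referencing item would record. */
    int add(byte[] blob) {
        if (dedupe) {
            // ByteBuffer compares by content, so identical blobs hit the same key.
            Integer existing = offsets.get(ByteBuffer.wrap(blob));
            if (existing != null) {
                return existing; // reuse the earlier copy, write nothing new
            }
        }
        int offset = data.size();
        data.write(blob, 0, blob.length);
        if (dedupe) {
            // Key on a private copy so later changes to the caller's array cannot corrupt the map.
            offsets.put(ByteBuffer.wrap(blob.clone()), offset);
        }
        return offset;
    }

    /** Total number of bytes written to the data area so far. */
    int size() {
        return data.size();
    }
}
```

Adding the same 4-byte blob twice to `new SharedBlobPool(false)` writes 8 bytes and returns two different offsets, while `new SharedBlobPool(true)` writes 4 bytes and returns the same offset both times.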

CunningLogic commented 1 year ago

@katzdan @marwan-bushara

The after file is 3322156 bytes longer; the debug section accounts for 46276 of those bytes.

TLDR: at least a good chunk of the difference is because smali does not employ some of the space-saving tricks that some compilers do, such as not writing duplicate data and pointing multiple references at the same data.

katzdan commented 1 year ago

Is there any hope of reducing the size of the written dex in the future, perhaps as an optimization feature?

CunningLogic commented 1 year ago

@katzdan there are a number of ways to do that, and it would be fairly easy.

The easiest route would be to just write a script that parses the smali files and removes debug information and dead code.
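A rough sketch of the debug-stripping half of such a script, in Java. The class name `SmaliDebugStripper` is hypothetical, and the list of directives treated as debug information is an assumption about typical baksmali output, so check it against your own files. It works line by line, does not attempt dead code removal, and leaves `.param` alone because that directive can carry annotations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of smali/baksmali.
final class SmaliDebugStripper {
    // Assumed set of debug-only directives; adjust to match your disassembly.
    private static final String[] DEBUG_DIRECTIVES = {
        ".line", ".local", ".end local", ".restart local",
        ".prologue", ".epilogue", ".source"
    };

    /** True if the line is one of the debug directives listed above. */
    static boolean isDebugDirective(String line) {
        String trimmed = line.trim();
        for (String directive : DEBUG_DIRECTIVES) {
            // Require an exact match or a following space so that ".locals 2"
            // (the register count) is not mistaken for ".local".
            if (trimmed.equals(directive) || trimmed.startsWith(directive + " ")) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of the input with debug directive lines removed. */
    static List<String> strip(List<String> smaliLines) {
        List<String> kept = new ArrayList<>();
        for (String line : smaliLines) {
            if (!isDebugDirective(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Feeding each disassembled `.smali` file through `strip` and reassembling the result should drop the debug section contribution, though how much that saves depends on the input.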