JesusFreke / smali

smali/baksmali

Improve support for shared debug info items in DEX files #843

Open zerny opened 2 years ago

zerny commented 2 years ago

The DEX format allows several methods / code_items to share the same debug_info_item. D8 uses this to canonicalize debug info and reduce the size of DEX files, and has done so since it was introduced. R8 and other tools make further use of this by sharing a small set of very large debug_info_items across almost all methods in the program.

Sharing a debug_info_item is within the existing specification of the format and has been tested to work on all VMs back to 4.0.4. There is one requirement to avoid issues on legacy VMs: the number of parameters in the debug_info_item must match the parameter count of every method referencing it.

A suggestion to support these cases would be to:

  1. Don't count the size of the debug_info_item towards the size of a method.
  2. Don't represent debug events for addresses past the instruction offset of the last instruction in a method referencing it.
  3. Don't represent debug line entries with no instruction offset change.
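Suggestions 2 and 3 can be sketched in Java as a filter over the position entries decoded from a shared item. This is a minimal illustration, not smali's or dexdump's code: `SharedDebugInfoPruning`, `PositionEntry` and `pruneForMethod` are hypothetical names, the input is assumed to be already decoded and sorted by pc, and when two entries share a pc the later one is assumed to supersede the earlier one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of suggestions 2 and 3 over an already-decoded position table. */
final class SharedDebugInfoPruning {

  /** One decoded line entry: instruction offset (in code units) and source line. */
  static final class PositionEntry {
    final int pc;
    final int line;

    PositionEntry(int pc, int line) {
      this.pc = pc;
      this.line = line;
    }
  }

  /** Keeps only the entries meaningful for one method; input is assumed sorted by pc. */
  static List<PositionEntry> pruneForMethod(List<PositionEntry> entries, int lastInstructionPc) {
    List<PositionEntry> kept = new ArrayList<>();
    for (PositionEntry entry : entries) {
      // Suggestion 2: stop at events past the method's last instruction offset.
      if (entry.pc > lastInstructionPc) {
        break;
      }
      // Suggestion 3: an earlier entry at the same pc covers no instructions, so drop it.
      if (!kept.isEmpty() && kept.get(kept.size() - 1).pc == entry.pc) {
        kept.remove(kept.size() - 1);
      }
      kept.add(entry);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<PositionEntry> shared = new ArrayList<>();
    for (int pc = 0; pc <= 0x13; pc++) {
      shared.add(new PositionEntry(pc, pc)); // sample data: one entry per pc from 0x00 to 0x13
    }
    // A method whose last instruction is at 0x07 keeps only the entries for pcs 0x00..0x07.
    System.out.println(pruneForMethod(shared, 0x07).size()); // 8
  }
}
```

With this filter, each referencing method is attributed only the prefix of the shared stream that actually covers its own instructions.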

If 1. is not an acceptable solution, the fixes for 2. and 3. will at least mitigate the double accounting by attributing only the part of the shared item that is relevant to the given method.

Item 3. is in line with the debugging behavior on runtimes, where it is not possible to break on a line with no pc change. The D8 compiler explicitly ensures that a nop is inserted whenever two lines would otherwise have no intermediate instructions.

A final possible feature would be to support canonicalizing the debug information again when writing to DEX with smali. I think that for pipelines needing this it is likely best to do a subsequent run of D8, as that tool also supports translating mapping files, so that the output can again use the highly compressed representation of debug info.

JesusFreke commented 2 years ago

I'm not sure I really understand how this new shared debug info works. Do you have an example dex file that can be shared, or more information about how that works?

If I recall correctly, other items that are shared between multiple entities have their size split up evenly among the entities. So we could split up the size of the shared debug info item between all of the methods that reference it.
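An even split as suggested could look like the following sketch. `SharedSizeSplit` and `splitEvenly` are hypothetical names, and handing out the remainder one byte at a time to the first referencing methods is an assumption chosen so that the attributed sizes add back up to the item's size. This is not smali's code; its existing handling of shared items may round differently.

```java
import java.util.Arrays;

/** Hypothetical sketch: attribute a shared item's size evenly to the methods referencing it. */
final class SharedSizeSplit {

  /** Splits totalBytes into referenceCount integer shares that sum back to totalBytes. */
  static int[] splitEvenly(int totalBytes, int referenceCount) {
    if (totalBytes < 0 || referenceCount <= 0) {
      throw new IllegalArgumentException("need totalBytes >= 0 and referenceCount > 0");
    }
    int[] shares = new int[referenceCount];
    int base = totalBytes / referenceCount;
    int remainder = totalBytes % referenceCount;
    for (int i = 0; i < referenceCount; i++) {
      // The first `remainder` methods absorb one extra byte each.
      shares[i] = base + (i < remainder ? 1 : 0);
    }
    return shares;
  }

  public static void main(String[] args) {
    // Illustrative numbers only: a 25-byte item shared by three methods.
    System.out.println(Arrays.toString(splitEvenly(25, 3))); // [9, 8, 8]
  }
}
```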

zerny commented 2 years ago

D8 has shared debug info since it was introduced. It happens mostly for release builds, as the info is more likely to be equal when only line entries for throwing instructions remain and there are no locals. However, it can be shared whenever it is equal in any D8/R8 build. Likely candidates are small methods such as default constructors: if they happen to be on the same line in their respective files, their debug info coincides and is shared. This gives a non-trivial size saving on larger apps.
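Sharing equal items amounts to canonicalizing the encoded debug_info_items by content when writing. The following is a hypothetical sketch, not D8's implementation: `DebugInfoCanonicalizer`, `intern` and `bytesWritten` are illustrative names, and offsets are modeled as a running byte count rather than real positions in a DEX data section. The sample bytes follow the debug_info_item layout (line_start, parameters_size, event bytecode ending in DBG_END_SEQUENCE) but are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: write each distinct encoded debug_info_item once and reuse its offset. */
final class DebugInfoCanonicalizer {
  // ByteBuffer.equals and hashCode compare the remaining bytes, so it works as a content key.
  private final Map<ByteBuffer, Integer> offsetByContent = new HashMap<>();
  private int nextOffset = 0;

  /** Returns the offset of an identical, previously seen item, or assigns a new one. */
  int intern(byte[] encodedItem) {
    // Copy so later changes to the caller's array cannot corrupt the map key.
    ByteBuffer key = ByteBuffer.wrap(encodedItem.clone());
    Integer existing = offsetByContent.get(key);
    if (existing != null) {
      return existing;
    }
    int offset = nextOffset;
    offsetByContent.put(key, offset);
    nextOffset += encodedItem.length;
    return offset;
  }

  /** Total bytes that would be written for the distinct items seen so far. */
  int bytesWritten() {
    return nextOffset;
  }

  public static void main(String[] args) {
    DebugInfoCanonicalizer canonicalizer = new DebugInfoCanonicalizer();
    // Illustrative items: line_start, parameters_size=0, one special opcode, DBG_END_SEQUENCE.
    byte[] ctorA = {0x01, 0x00, 0x0e, 0x00};
    byte[] ctorB = {0x01, 0x00, 0x0e, 0x00};
    byte[] other = {0x05, 0x00, 0x0e, 0x00};
    System.out.println(canonicalizer.intern(ctorA)); // 0
    System.out.println(canonicalizer.intern(ctorB)); // 0, shared with ctorA
    System.out.println(canonicalizer.intern(other)); // 4
    System.out.println(canonicalizer.bytesWritten()); // 8
  }
}
```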

The new feature in R8 is an identity encoding, which maps each pc to a line of the same value. With it, all methods can share the same debug info item as long as their max instruction pc is within the range encoded in the item's event stream (subject to the caveat about matching parameter counts). The pc-based encoding works for the R8 compiler because R8 also produces a mapping file that can be used to retrace to the original lines/stack traces.
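The identity encoding and its two sharing conditions can be modeled as follows. `PcBasedDebugInfo` here is a hypothetical class that only mirrors the label printed by R8's disassembler; it does not encode real bytes and ignores the mapping file R8 uses for retracing. The example offsets are the last instruction offsets of m1, m2 and main in the R8 output further down this thread.

```java
/** Hypothetical model of a pc-based ("identity") debug_info_item; no real encoding. */
final class PcBasedDebugInfo {
  final int parameterCount; // parameters_size recorded in the item
  final int maxPc;          // highest pc covered by the item's event stream

  PcBasedDebugInfo(int parameterCount, int maxPc) {
    this.parameterCount = parameterCount;
    this.maxPc = maxPc;
  }

  /** Identity encoding: the line reported for a pc is the pc itself. */
  int lineFor(int pc) {
    if (pc < 0 || pc > maxPc) {
      throw new IllegalArgumentException("pc outside encoded range: " + pc);
    }
    return pc;
  }

  /** A method may reference this item only if both sharing conditions hold. */
  boolean canBeSharedBy(int methodParameterCount, int methodLastPc) {
    return methodParameterCount == parameterCount && methodLastPc <= maxPc;
  }

  public static void main(String[] args) {
    PcBasedDebugInfo zeroParams = new PcBasedDebugInfo(0, 0x13);
    System.out.println(zeroParams.canBeSharedBy(0, 0x0e)); // m1: true
    System.out.println(zeroParams.canBeSharedBy(0, 0x13)); // m2: true
    System.out.println(zeroParams.canBeSharedBy(1, 0x09)); // main: false (one parameter)
    System.out.println(zeroParams.lineFor(0x07));          // 7
  }
}
```

Retracing the reported "line" back to a real source line is then the mapping file's job, which is why this encoding only makes sense together with R8's mapping output.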

Attached is an example zip produced by R8 using this encoding. The source of the code is:

package com.android.tools.r8;

import java.io.PrintStream;

class SharedPcEncodedDebugInfo {

  public static void m1() {
    System.out.print("m");
    System.out.println("1");
  }

  public static void m2() {
    System.out.print("m");
    System.out.print("2");
    System.out.println();
  }

  public static void m3() {
    PrintStream out = System.out;
    out.println("m3");
  }

  public static void main(String[] args) {
    m1();
    m2();
    m3();
  }
}

Below is the disassembled output from R8. It shows that only two debug info items are created: one with 0 parameters, shared by m1, m2 and m3, which encodes +1 pc / +1 line delta events up to pc 0x13; and one with 1 parameter, used by the main method.
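The +1 pc / +1 line delta events mentioned above correspond to DEX special opcodes, whose arithmetic the DEX format specification defines using DBG_FIRST_SPECIAL (0x0a), DBG_LINE_BASE (-4) and DBG_LINE_RANGE (15). The sketch below encodes and replays such a stream. The class and method names are hypothetical, and starting with a zero-delta event at line_start 0 is an assumption made only so that every pc from 0x00 to 0x13 gets an entry with the same line number; it is not a claim about the exact bytes R8 emits.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of DEX debug_info_item special-opcode arithmetic (names are hypothetical). */
final class SpecialOpcodes {
  // Constants defined by the DEX format specification for the debug state machine.
  static final int DBG_FIRST_SPECIAL = 0x0a;
  static final int DBG_LINE_BASE = -4;
  static final int DBG_LINE_RANGE = 15;

  /** Encodes an (address delta, line delta) pair as one special opcode value. */
  static int specialOpcode(int addressDelta, int lineDelta) {
    if (addressDelta < 0
        || lineDelta < DBG_LINE_BASE
        || lineDelta >= DBG_LINE_BASE + DBG_LINE_RANGE) {
      throw new IllegalArgumentException("deltas not encodable as a special opcode");
    }
    int opcode = DBG_FIRST_SPECIAL + (lineDelta - DBG_LINE_BASE) + addressDelta * DBG_LINE_RANGE;
    if (opcode > 0xff) {
      throw new IllegalArgumentException("address delta too large for a special opcode");
    }
    return opcode;
  }

  /** Replays special opcodes and returns the {address, line} entry each one emits. */
  static List<int[]> replay(int[] opcodes, int lineStart) {
    List<int[]> entries = new ArrayList<>();
    int address = 0;
    int line = lineStart;
    for (int opcode : opcodes) {
      if (opcode < DBG_FIRST_SPECIAL || opcode > 0xff) {
        throw new IllegalArgumentException("not a special opcode: " + opcode);
      }
      int adjusted = opcode - DBG_FIRST_SPECIAL;
      line += DBG_LINE_BASE + (adjusted % DBG_LINE_RANGE);
      address += adjusted / DBG_LINE_RANGE;
      entries.add(new int[] {address, line});
    }
    return entries;
  }

  /** Assumed identity stream: one zero-delta event, then maxPc events of +1 address / +1 line. */
  static int[] identityStream(int maxPc) {
    int[] opcodes = new int[maxPc + 1];
    opcodes[0] = specialOpcode(0, 0);
    for (int i = 1; i <= maxPc; i++) {
      opcodes[i] = specialOpcode(1, 1);
    }
    return opcodes;
  }

  public static void main(String[] args) {
    System.out.printf("0x%02x%n", specialOpcode(1, 1)); // 0x1e
    List<int[]> entries = replay(identityStream(0x13), 0);
    int[] last = entries.get(entries.size() - 1);
    // prints: 20 entries, last pc=19 line=19
    System.out.println(entries.size() + " entries, last pc=" + last[0] + " line=" + last[1]);
  }
}
```

Each +1/+1 event costs a single byte, so one long shared stream of this kind stays compact while covering every method whose last pc fits inside it.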

$ ./tools/disasm.py shared-pc-debug-info.zip 
<snip>
Number of markers: 1
~~R8{"backend":"dex","compilation-mode":"release","has-checksums":false,"min-api":1,"pg-map-id":"c73f034","r8-mode":"full","sha-1":"engineering","version":"main"}
# Bytecode for
# Class: 'com.android.tools.r8.SharedPcEncodedDebugInfo'

#
# Method: '<init>':
# 
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.<init>()
registers: 1, inputs: 1, outputs: 1
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: InvokeDirect        { v0 } Ljava/lang/Object;-><init>()V
    1:   0x03: ReturnVoid          

#
# Method: 'm1':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m1()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    3:   0x07: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    4:   0x09: ConstString         v1, "1"
    5:   0x0b: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    6:   0x0e: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'm2':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m2()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    3:   0x07: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    4:   0x09: ConstString         v1, "2"
    5:   0x0b: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
    6:   0x0e: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    7:   0x10: InvokeVirtual       { v0 } Ljava/io/PrintStream;->println()V
    8:   0x13: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'm3':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.m3()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: SgetObject          v0, Field java.io.PrintStream java.lang.System.out
    1:   0x02: ConstString         v1, "m3"
    2:   0x04: InvokeVirtual       { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    3:   0x07: ReturnVoid          
PcBasedDebugInfo (params: 0, max-pc: 0x13)

#
# Method: 'main':
# public static
#

void com.android.tools.r8.SharedPcEncodedDebugInfo.main(java.lang.String[])
registers: 1, inputs: 1, outputs: 0
------------------------------------------------------------
inst#  offset  instruction         arguments
------------------------------------------------------------
    0:   0x00: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m1()V
    1:   0x03: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m2()V
    2:   0x06: InvokeStatic        {  } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m3()V
    3:   0x09: ReturnVoid          
PcBasedDebugInfo (params: 1, max-pc: 0x09)

shared-pc-debug-info.zip

zerny commented 2 years ago

Regarding fix suggestion 2. from the first comment, the dexdump code that prunes the event stream for the current method can be found here: https://android-review.googlesource.com/c/platform/art/+/1967643

zerny commented 2 years ago

Regarding splitting the shared size, do you have a pointer to that being done for a similar component in the code base?

zerny commented 2 years ago

Let me know if the proposed changes in pull request #844 are acceptable, or if you have any comments on them.

Thanks, Ian

zerny commented 2 years ago

Just a friendly ping

benjaminRomano commented 2 years ago

Bump