Open zerny opened 2 years ago
I'm not sure I really understand how this new shared debug info works. Do you have an example dex file that can be shared, or more information about how that works?
If I recall correctly, other items that are shared between multiple entities have their size split up evenly among the entities. So we could split up the size of the shared debug info item between all of the methods that reference it.
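To make the even-split idea concrete, here is a minimal sketch. The class name, method names, offsets and byte sizes are all made up for illustration; this is not how any existing tool is known to implement it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: attributes the size of a shared debug_info_item
// evenly to every method that references it.
final class SharedDebugInfoSize {

    // methodToItem: method name -> offset of the debug_info_item it references.
    // itemSize:     debug_info_item offset -> size of that item in bytes.
    // Returns:      method name -> share of the item size attributed to it.
    static Map<String, Double> attribute(
            Map<String, Integer> methodToItem, Map<Integer, Integer> itemSize) {
        // Count how many methods reference each item.
        Map<Integer, Integer> refCount = new HashMap<>();
        for (int offset : methodToItem.values()) {
            refCount.merge(offset, 1, Integer::sum);
        }
        // Each method gets size / (number of referencing methods).
        Map<String, Double> attributed = new HashMap<>();
        for (Map.Entry<String, Integer> e : methodToItem.entrySet()) {
            int offset = e.getValue();
            attributed.put(e.getKey(), itemSize.get(offset) / (double) refCount.get(offset));
        }
        return attributed;
    }
}
```

With three methods referencing a 30-byte item, each would be charged 10 bytes, and the per-method sizes still sum to the real item size.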
D8 has been sharing the info since it was introduced. It happens mostly for release builds as there is a higher chance of the info being equal when only lines on throwing instructions remain and no locals. However it can be shared if equal in any D8/R8 build. Likely candidates are small methods such as default constructors. If they happen to be on the same line in the files then the debug info will coincide and be shared. This has a non-trivial size saving on larger apps.
The new feature in R8 is to use an identity encoding which maps each pc to a line of the same value. With that all methods can share the same debug info item so long as their max instruction pc is within the encoded range in the debug info item event stream (and the caveat about matching parameter count). The pc-based encoding works for the R8 compiler because it will also produce a mapping file which can be used to retrace to original lines/stacktrace.
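As a rough model of the sharing rule described above (the class and method names are hypothetical and not R8's internal API), an item can be thought of as a parameter count plus the largest pc it encodes:

```java
// Hypothetical model of a pc-based debug info item; not an R8 class.
final class PcBasedDebugInfoModel {
    final int paramCount; // parameter count recorded in the item
    final int maxPc;      // largest pc covered by the event stream

    PcBasedDebugInfoModel(int paramCount, int maxPc) {
        this.paramCount = paramCount;
        this.maxPc = maxPc;
    }

    // Identity encoding: the "line" reported for a pc is the pc itself.
    int lineForPc(int pc) {
        if (pc < 0 || pc > maxPc) {
            throw new IllegalArgumentException("pc outside encoded range: " + pc);
        }
        return pc;
    }

    // A method may reference this item if its parameter count matches and
    // the pc of its last instruction is within the encoded range.
    boolean canBeSharedBy(int methodParamCount, int methodMaxPc) {
        return methodParamCount == paramCount && methodMaxPc <= maxPc;
    }
}
```

Under this model, a zero-parameter item with max pc 0x13 can serve any zero-parameter method whose last instruction is at or before 0x13, but not a one-parameter method.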
Attached is an example zip using this encoding by R8. The source of the code is:
import java.io.PrintStream;

class SharedPcEncodedDebugInfo {

  public static void m1() {
    System.out.print("m");
    System.out.println("1");
  }

  public static void m2() {
    System.out.print("m");
    System.out.print("2");
    System.out.println();
  }

  public static void m3() {
    PrintStream out = System.out;
    out.println("m3");
  }

  public static void main(String[] args) {
    m1();
    m2();
    m3();
  }
}
Below is the disassembly output from R8. In it you can see that only two debug info items are created: one for 0 params, which is shared by m1, m2 and m3 and encodes +1+1 pc-line delta events up to pc 0x13, and one for 1 param, which is used by the main method.
$ ./tools/disasm.py shared-pc-debug-info.zip
<snip>
Number of markers: 1
~~R8{"backend":"dex","compilation-mode":"release","has-checksums":false,"min-api":1,"pg-map-id":"c73f034","r8-mode":"full","sha-1":"engineering","version":"main"}
# Bytecode for
# Class: 'com.android.tools.r8.SharedPcEncodedDebugInfo'
#
# Method: '<init>':
#
#
void com.android.tools.r8.SharedPcEncodedDebugInfo.<init>()
registers: 1, inputs: 1, outputs: 1
------------------------------------------------------------
inst# offset instruction arguments
------------------------------------------------------------
0: 0x00: InvokeDirect { v0 } Ljava/lang/Object;-><init>()V
1: 0x03: ReturnVoid
#
# Method: 'm1':
# public static
#
void com.android.tools.r8.SharedPcEncodedDebugInfo.m1()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst# offset instruction arguments
------------------------------------------------------------
0: 0x00: SgetObject v0, Field java.io.PrintStream java.lang.System.out
1: 0x02: ConstString v1, "m"
2: 0x04: InvokeVirtual { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
3: 0x07: SgetObject v0, Field java.io.PrintStream java.lang.System.out
4: 0x09: ConstString v1, "1"
5: 0x0b: InvokeVirtual { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
6: 0x0e: ReturnVoid
PcBasedDebugInfo (params: 0, max-pc: 0x13)
#
# Method: 'm2':
# public static
#
void com.android.tools.r8.SharedPcEncodedDebugInfo.m2()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst# offset instruction arguments
------------------------------------------------------------
0: 0x00: SgetObject v0, Field java.io.PrintStream java.lang.System.out
1: 0x02: ConstString v1, "m"
2: 0x04: InvokeVirtual { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
3: 0x07: SgetObject v0, Field java.io.PrintStream java.lang.System.out
4: 0x09: ConstString v1, "2"
5: 0x0b: InvokeVirtual { v0 v1 } Ljava/io/PrintStream;->print(Ljava/lang/String;)V
6: 0x0e: SgetObject v0, Field java.io.PrintStream java.lang.System.out
7: 0x10: InvokeVirtual { v0 } Ljava/io/PrintStream;->println()V
8: 0x13: ReturnVoid
PcBasedDebugInfo (params: 0, max-pc: 0x13)
#
# Method: 'm3':
# public static
#
void com.android.tools.r8.SharedPcEncodedDebugInfo.m3()
registers: 2, inputs: 0, outputs: 2
------------------------------------------------------------
inst# offset instruction arguments
------------------------------------------------------------
0: 0x00: SgetObject v0, Field java.io.PrintStream java.lang.System.out
1: 0x02: ConstString v1, "m3"
2: 0x04: InvokeVirtual { v0 v1 } Ljava/io/PrintStream;->println(Ljava/lang/String;)V
3: 0x07: ReturnVoid
PcBasedDebugInfo (params: 0, max-pc: 0x13)
#
# Method: 'main':
# public static
#
void com.android.tools.r8.SharedPcEncodedDebugInfo.main(java.lang.String[])
registers: 1, inputs: 1, outputs: 0
------------------------------------------------------------
inst# offset instruction arguments
------------------------------------------------------------
0: 0x00: InvokeStatic { } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m1()V
1: 0x03: InvokeStatic { } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m2()V
2: 0x06: InvokeStatic { } Lcom/android/tools/r8/SharedPcEncodedDebugInfo;->m3()V
3: 0x09: ReturnVoid
PcBasedDebugInfo (params: 1, max-pc: 0x09)
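For reference, a +1 pc / +1 line step fits in a single special opcode of the DEX debug state machine. The sketch below uses the constants from the DEX format specification (DBG_FIRST_SPECIAL, DBG_LINE_BASE, DBG_LINE_RANGE); the class name is hypothetical, and that R8 emits exactly this opcode for every event is an assumption on my part.

```java
// Hypothetical helper around the special-opcode arithmetic of the
// DEX debug_info_item state machine.
final class DebugSpecialOpcode {
    static final int DBG_FIRST_SPECIAL = 0x0a; // smallest special opcode
    static final int DBG_LINE_BASE = -4;       // smallest line increment
    static final int DBG_LINE_RANGE = 15;      // number of line increments

    // Encodes one (pcDelta, lineDelta) step as a special opcode.
    static int encode(int pcDelta, int lineDelta) {
        if (pcDelta < 0
                || lineDelta < DBG_LINE_BASE
                || lineDelta >= DBG_LINE_BASE + DBG_LINE_RANGE) {
            throw new IllegalArgumentException("delta not encodable");
        }
        int opcode = DBG_FIRST_SPECIAL + (lineDelta - DBG_LINE_BASE) + DBG_LINE_RANGE * pcDelta;
        if (opcode > 0xff) {
            throw new IllegalArgumentException("pc delta too large for one opcode");
        }
        return opcode;
    }

    // Decodes the address advance of a special opcode.
    static int pcDelta(int opcode) {
        return (opcode - DBG_FIRST_SPECIAL) / DBG_LINE_RANGE;
    }

    // Decodes the line advance of a special opcode.
    static int lineDelta(int opcode) {
        return DBG_LINE_BASE + (opcode - DBG_FIRST_SPECIAL) % DBG_LINE_RANGE;
    }
}
```

Here `encode(1, 1)` evaluates to 0x1e, so under that assumption an item covering pcs up to 0x13 is essentially a short run of the same byte.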
Regarding the "fix suggestion 2." in comment 1, the code to prune the event stream for the current method can be found for dexdump here: https://android-review.googlesource.com/c/platform/art/+/1967643
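The idea behind the pruning can be sketched as follows. This is not the code from the linked change; the class and method names are hypothetical, and it assumes the shared event stream has already been decoded into (pc, line) pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep only the position entries that fall inside
// the instruction range of the method currently being processed.
final class PrunePositions {

    // Position table of an identity-encoded item: one {pc, line} entry
    // with line == pc for every pc in [0, maxPc].
    static List<int[]> identityTable(int maxPc) {
        List<int[]> table = new ArrayList<>();
        for (int pc = 0; pc <= maxPc; pc++) {
            table.add(new int[] {pc, pc});
        }
        return table;
    }

    // insnsSize is the size of the method's instruction array in 16-bit
    // code units; entries at or beyond it belong to other (larger) methods.
    static List<int[]> pruneForMethod(List<int[]> positions, int insnsSize) {
        List<int[]> kept = new ArrayList<>();
        for (int[] entry : positions) {
            if (entry[0] < insnsSize) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

For m3 in the example above (last instruction at 0x07, so 8 code units), pruning the shared table that runs to 0x13 would leave the 8 entries for pcs 0 through 7.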
Regarding splitting the shared size, do you have a pointer to that being done for a similar component in the code base?
Let me know if the proposed changes in the pull request #844 are acceptable or if you have any comments regarding them.
Thanks, Ian
Just a friendly ping
Bump
The DEX format allows the same debug_info_item to be shared by several methods / code_items. This is used by D8 to canonicalize debug info and reduce the size of the DEX files, and has been in place since the introduction of D8. R8 and other tools make further use of this to share a small set of very large debug_info_items for almost all methods in the program.
The shared use of debug_info_item is within the existing specification of the format and has been tested to work on all VMs back to 4.0.4. There is one requirement to avoid some issues on legacy VMs, namely that the number of parameters in the debug_info_item matches the parameter count of all methods referencing it.
A suggestion to support these cases would be to:
If 1. is not an acceptable solution, the fix for 2. and 3. will mitigate the double accounting by at least only attributing the (shared) contribution from the given method.
Item 3. is in line with the debugging behavior on runtimes where it is not possible to break on lines with no pc change. The D8 compiler will explicitly ensure that a nop is always inserted if ever there are two lines with no intermediate instructions.
A final possible feature would be to support canonicalizing the debug information again when writing to DEX with smali. I think for pipelines needing this it is likely best to do a subsequent run of D8, as that tool will also support translating mapping files such that the output can again use the highly compressed representation of debug info.