dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

consider dropping most of symbol information (uris and names) when using dwarf stacks #38759

Open mraleph opened 5 years ago

mraleph commented 5 years ago

Usually we recommend using obfuscation as a way to reduce code size because it shrinks identifiers (method names and library uri-s). The size drop is rather noticeable on large applications.

We could consider taking this one step further: most of the symbol information is completely unnecessary when running in AOT mode with DWARF stack traces, so we could consider dropping library uri-s completely and replacing string-based symbol names with a global numbering (represented as Smi-s).

This might allow us to remove a significant number of strings from the AOT snapshot and thus shrink it.
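
A minimal sketch of the global-numbering idea, assuming a simplified program model in which each function is identified by a (library URI, class name, function name) triple. `SymbolTable`, `encode_program` and `symbolize` are hypothetical names, not Dart VM APIs, and Python stands in for the VM's C++. Library URIs are dropped outright, and every distinct class or function name is mapped to a small integer (playing the role of a Smi). The id-to-name table is assumed to be kept off-device next to the DWARF debug info rather than shipped in the snapshot:

```python
# Hypothetical sketch of "global numbering" for symbol names.
# SymbolTable, encode_program and symbolize are illustrative, not Dart VM APIs.
from dataclasses import dataclass, field


@dataclass
class SymbolTable:
    """Maps each distinct name to a small integer (standing in for a Smi)."""
    ids: dict[str, int] = field(default_factory=dict)
    names: list[str] = field(default_factory=list)

    def intern(self, name: str) -> int:
        # Reuse the id if the name was already seen; otherwise allocate the next one.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def encode_program(functions):
    """Encode (library_uri, class_name, function_name) triples.

    Returns (shipped, side_table):
      shipped    -- per-function (class_id, function_id) pairs; library URIs
                    are dropped entirely and no name strings remain.
      side_table -- id -> name list, assumed to live off-device next to the
                    DWARF debug info and used only for offline symbolization.
    """
    table = SymbolTable()
    shipped = [(table.intern(cls), table.intern(fn)) for _uri, cls, fn in functions]
    return shipped, table.names


def symbolize(shipped, side_table):
    """Offline step: turn integer ids back into readable names."""
    return [f"{side_table[c]}.{side_table[f]}" for c, f in shipped]


shipped, side_table = encode_program([
    ("package:app/main.dart", "HomePage", "build"),
    ("package:app/util.dart", "Cache", "build"),
])
print(shipped)                          # [(0, 1), (2, 1)]
print(symbolize(shipped, side_table))   # ['HomePage.build', 'Cache.build']
```

Because names are interned, a name used by many members (such as `build` here) costs one integer per reference in the snapshot and is stored only once, in the off-device table.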

/cc @mkustermann

devoncarew commented 3 years ago

@mraleph, @mkustermann, @a-siva: do you have a sense of how much this might help with memory usage of a typical large Flutter app?

devoncarew commented 3 years ago

(especially given that the Dart heap isn't generally the majority of a Flutter app's RSS)

dnfield commented 2 years ago

For customer: money, this could reduce the snapshot size by as much as 9 megabytes uncompressed.

mraleph commented 2 years ago

@dnfield are you running with DWARF stack traces and looking at the stripped .so file? I am surprised that so many strings are left. An unstripped .so would contain all kinds of nonsense, including debug info.

dnfield commented 2 years ago

You are right. I had dwarf stack traces but forgot to strip.

With a stripped app.so built with DWARF stack traces, strings go from ~2.8 MB to ~1.75 MB when using obfuscation. It's not clear to me how much more would be saved by using numeric identifiers beyond what obfuscation can do here.
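
For reference, the obfuscation saving quoted above works out as follows (a trivial calculation on the reported figures; the helper name `savings` is illustrative):

```python
def savings(before_mb: float, after_mb: float) -> tuple[float, float]:
    """Return (absolute saving in MB, saving as a fraction of before_mb)."""
    saved = before_mb - after_mb
    return saved, saved / before_mb


# Reported string sizes in a stripped app.so with DWARF stack traces.
saved, frac = savings(2.8, 1.75)
print(f"~{saved:.2f} MB saved ({frac:.1%} of string bytes)")  # ~1.05 MB saved (37.5% of string bytes)
```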

mraleph commented 2 years ago

According to a heap snapshot of a large application, there are actually ~5 MB of strings, with the following breakdown by owner:

total: 5083628 bytes
Owner                          Bytes   Share
Instance.@28                      60   0.00%
Instance.@32                     272   0.01%
Instance.@64                     365   0.01%
Instance.@44                     383   0.01%
Instance.@48                     383   0.01%
Instance.@52                     400   0.01%
Instance.@24                     462   0.01%
UnlinkedCall.target_name_        529   0.01%
Instance.@20                    1578   0.03%
Library.name_                   1748   0.03%
Instance.@92                    5499   0.11%
Instance.@12                    6053   0.12%
Script.url_                    16903   0.33%
Instance.@8                    17551   0.35%
Field.name_                    95561   1.88%
Library.private_key_           96778   1.90%
Instance.@16                  131513   2.59%
Function.name_                230479   4.53%
Class.name_                   391934   7.71%
%shared                      1937408  38.11%
Array                        2147769  42.25%

So maybe ~15% of the strings originate from program structure (possibly slightly more, because some of these strings are also in the %shared section).
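
The estimate can be roughly reproduced by summing the rows that hold program-structure names and URIs. Which owners count as "program structure" is an assumption here: class, function, field and library names, library private keys, and script URLs are included, while `UnlinkedCall.target_name_`, the `Instance.@…` rows, `%shared` and `Array` are left out, so strings reached only through `%shared` are not attributed to any owner:

```python
# Byte counts copied from the heap-snapshot breakdown above.
TOTAL = 5083628

# Assumed set of "program structure" owners (names and URIs of program elements).
STRUCTURE_OWNERS = {
    "Class.name_": 391934,
    "Function.name_": 230479,
    "Library.private_key_": 96778,
    "Field.name_": 95561,
    "Script.url_": 16903,
    "Library.name_": 1748,
}

structure_bytes = sum(STRUCTURE_OWNERS.values())
share = structure_bytes / TOTAL
print(f"{structure_bytes} bytes = {share:.1%} of all string bytes")
# 833403 bytes = 16.4% of all string bytes
```

This lands a little above 15%, in line with the rough estimate, and is a lower bound for the same reason given above: any structure strings counted under `%shared` are not included.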