dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.24k stars 1.57k forks source link

reduce end-to-end AOT compilation time for large applications #43299

Open mraleph opened 4 years ago

mraleph commented 4 years ago

This is an umbrella issue to track efforts around reduction of critical path when building large Flutter applications in AOT mode

alexmarkov commented 4 years ago

This looks duplicate to https://github.com/dart-lang/sdk/issues/42442.

mraleph commented 4 years ago

@alexmarkov this is more of an umbrella issue, not limited to just TFA but to general end-to-end performance.

dnfield commented 3 years ago

One way to do this would be to support breaking things up into smaller increments of compilation.

This would allow for fanning out compilation steps to many different worker machines, and would also potentially help with incrementally compiling smaller changes between builds.

I'm more interested in faster incremental builds for the sake of performance tuning/debugging - right now, changing a single line of code requires repeating the entire process of kernel compiling/tfa/snapshotting, which can be very expensive on large codebases.

mraleph commented 3 years ago

We could parallelize gen_snapshot step in a relatively straightforward fashion though it would potentially require us to forgo some of the late stage tree shaking (might not be that relevant for code quality after all).

Parallelizing TFA is an open problem - it is intrisically non-modular global step. You should compare it to similar global optimizations steps like LTO in LLVM or ProGuard et al

xster commented 3 years ago

Google concerns: b/203690870

dnfield commented 3 years ago

Perhaps something like what LLVM did with ThinLTO would be applicable for TFA? http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

quyenvsp commented 2 years ago

Hi,

Please guide me anyway to debug AOT progress? Our bank application build AOT too long.

stuck at step frontend_server.dart.snapshot

debug without AOT: 51 seconds (with 16 Gb RAM)

release with AOT: 51 minutes (with 16 Gb RAM) 31 minutes (with 32 Gb RAM)

But problem is frontend_server.dart.snapshot --verbose dont print anything helpful so we can not know where is problem. Is it possible to print what dart file is processing? So we can see where it stuck.

Thank you.

mraleph commented 2 years ago

@quyenvsp which Flutter version are you on? (could you post flutter doctor -v?).

It is probably hitting some pathological case in the TFA.

Currently the best way to diagnose that requires a bit of manual work - it's described in https://github.com/dart-lang/sdk/issues/47546#issuecomment-955078084

  1. get Dart SDK sources (as explained at dart-lang/sdk/wiki/Building#getting-the-source)
  2. Checkout the version of Dart matching the version of Flutter you are using. I would be able to tell you the version if you post flutter doctor -v output.
  3. Get exact command line arguments for frontend_server.dart.snapshot (e.g. by doing flutter build apk --release -v) and then run it from sources in the following way ($DART is the path to Dart SDK sources):

    # $DART is the path to Dart SDK sources
    # $ARGS are the arguments that are normally passed to frontend_server.dart.snapshot
    $ dart -Dglobal.type.flow.print.stats=true --disable-dart-dev --packages=$DART/.packages $DART/pkg/frontend_server/bin/frontend_server_starter.dart $ARGS

/cc @alexmarkov (we should consider allowing users to set global.type.flow.print.stats=true through OS environment variables, rather than through Dart defines).

quyenvsp commented 2 years ago

@mraleph

Thank so much for guide.

We have 2 bank application project almost similar (one just little smaller) but TFA time is more different 377947ms >< 1997843ms (~5 time slower).

Both do same Ubuntu machine (32Gb of ram), using Flutter stable 2.10.5 Dart 2.16.2 (Dart SDK sources), free memory before build each. one project build just waste 8Gb ram, the slow project take 17Gb ram

While we continue find where is problem, hope global.type.flow.print.stats=true result below of 2 project helpful. (Also have perf report but size is 200Mb and 600Mb, will upload if you need)

fine_project_result.txt slow_project_result.txt

We see a huge different 258907308 >< 738198250 invocations queried in cache, but it not detailed yet. Please is these anyway counter per packages/classes? Something like package/class A 1000 invocations, package/class B 2000 invocations,...By this way we easy know where is problem then find more on it.

alexmarkov commented 2 years ago

@quyenvsp Thank you for the report. These huge compilation times are usually caused by a large amount of auto-generated Dart source code. Do you use Dart source auto-generation in your app?

By looking at the logs, it seems like the second app is ~3x times larger (10676547 vs. 28089433 summaries analyzed). That may explain the difference in the compilation times. So far I do not see anything out of the ordinary from those logs, except maybe that in both apps analysis hits the hard limit of 5000 maximum invocations cached per selector. You can try to vary (decrease) constants in the _SelectorApproximation class to see if it helps:

https://github.com/dart-lang/sdk/blob/17bd00df50ad3ddfee4dcea69d6de5bae41aba58/pkg/vm/lib/transformations/type_flow/analysis.dart#L762-L775

Also, you can try to run frontend server with -Dglobal.type.flow.print.timings=true option (as @mraleph explained above) to get a list of methods where analysis spent the most time. This option should print 3 tables of methods - Top summaries by number of times analyzed, Top summaries by dirty analysis time (including callees), in microseconds and Top summaries by pure analysis time (excluding callees), in microseconds. The most interesting are the last and the first tables.

yaminet1024 commented 1 year ago

hi, I'm facing the same issue, and most of my time is spent on the gen_snapshot, which takes nearly 8 minutes. 2023-10-24 16:31:32:218 : [ +1 ms] executing: gen_snapshot_arm64 --deterministic --save-obfuscation-map=build/ios/framework/Release/app_iOS_symbols.json --snapshot_kind=app-aot-assembly --assembly=/Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S --dwarf-stack-traces --resolve-dwarf-paths --save-debugging-info=build/ios/framework/Release/app.ios-arm64.symbols --obfuscate /Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/app.dill 2023-10-24 16:33:47:397 : ##[error][+135179 ms] Warning: The generated assembly code contains unobfuscated DWARF debugging information. 2023-10-24 16:33:47:398 : ##[error][ ] To avoid this, use --strip to remove it. 2023-10-24 16:33:47:646 : [ +247 ms] executing: xcrun cc -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -c /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o 2023-10-24 16:38:04:903 : [+257257 ms] executing: xcrun clang -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o 2023-10-24 16:38:05:763 : [ +859 ms] executing: xcrun dsymutil -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework.dSYM /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App 2023-10-24 16:38:08:158 : [+2395 ms] executing: xcrun strip -x /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App 2023-10-24 16:38:09:312 : [+1153 ms] aot_assembly_release: Complete Are there any good way for outputting trace information, so I can better identify where the longest delays occur? Also, does gen_snapshot currently support incremental compilation? If not, are there any other methods to improve the compilation speed? Currently, our project takes close to 30 minutes to compile. Thank you.

mraleph commented 1 year ago

@yaminet1024 how big is your code? You can try to run gen_snapshot_arm64 with --print_precompiler_timings and paste the output. You can also run gen_snapshot_arm64 with --trace-compiler, redirect it to 2>/tmp/compilation_trace.txt and then process the log using the following command

$ cat /tmp/compilation_trace.txt | perl -n -e 'm/--> \'([^\']+)\'.*time: (\d+)/ && do{$n=$1;$t=$2; $n =~ s/ /_/g; print ($n, " ", $t, "\n")}' | sort -n -k 2 > /tmp/sorted.txt

You can look at the methods that take longest to compile. If there are some outliers - we would like to know if there is something special about them (e.g. huge autogenerated code is one common suspect).

It would be nice to know histogram of compilation timings, how many methods are compiled in total (wc -l /tmp/sorted.txt should give that).

michael-golfi commented 4 months ago

After some searching I believe this is the right issue to comment on... I am seeing large compile times for a Flutter app using quite a bit of generated serializer code. I'm not sure what is considered big in terms of an app.

I am seeing 25 minute release builds, here is output from verbose logs:

[  +86 ms] gen_dart_plugin_registrant: Complete
[   +7 ms] kernel_snapshot: Starting due to {}
[  +10 ms] Embedding native assets mapping /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml in kernel.
[   +6 ms] /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/dartaotruntime --disable-dart-dev /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/snapshots/frontend_server_aot.dart.snapshot --sdk-root /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/common/flutter_patched_sdk_product/ --target=flutter --no-print-incremental-dependencies -Ddart.vm.profile=false -Ddart.vm.product=true --delete-tostring-package-uri=dart:ui --delete-tostring-package-uri=package:flutter --aot --tfa --target-os ios --packages /Users/michaelgolfi/work/mobile/.dart_tool/package_config.json --output-dill /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill --depfile /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/kernel_snapshot.d --source file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --source package:flutter/src/dart_plugin_registrant.dart -Dflutter.dart_plugin_registrant=file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --native-assets /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml --verbosity=error package:phyla/main.dart
[+1096958 ms] kernel_snapshot: Complete
[+1782 ms] aot_assembly_release: Starting due to {InvalidatedReasonKind.inputChanged: The following inputs have updated contents: /Users/michaelgolfi/development/flutter/packages/flutter_tools/lib/src/build_system/targets/ios.dart,/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill,/Users/michaelgolfi/development/flutter/bin/internal/engine.version,/Users/michaelgolfi/development/flutter/bin/internal/engine.version}
[   +1 ms] targetingApplePlatform = true
[        ] extractAppleDebugSymbols = true
[        ] Will strip AOT snapshot manually after build and dSYM generation.
[        ] executing: /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 --deterministic --snapshot_kind=app-aot-assembly --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill
[+59343 ms] executing: sysctl hw.optional.arm64
[   +3 ms] Exit code 0 from: sysctl hw.optional.arm64
[        ] hw.optional.arm64: 1
[        ] executing: /usr/bin/arch -arm64e xcrun cc -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -c /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[+105579 ms] executing: /usr/bin/arch -arm64e xcrun clang -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[ +100 ms] executing: /usr/bin/arch -arm64e xcrun dsymutil -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework.dSYM /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +751 ms] executing: /usr/bin/arch -arm64e xcrun strip -x /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +124 ms] aot_assembly_release: Complete
[        ] release_ios_bundle_flutter_assets: Starting due to {}

@mraleph I will try your suggestions above. So far I have tried to move most of the serializers into their own path dependency in the past, in the hopes it would cache the builds since they change less often than the main app.

michael-golfi commented 4 months ago

I found that much time was spent on serializers generated from ferry_generator. I managed to reduce some of the generated code with some config and shaved 500s from release build time, an improvement of about 40%.

The shell script above has a missing quote somewhere. I gave up on it and wrote some Python to sort the files by time instead.

"""
  /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 \
  --print_precompiler_timings \
  --trace-compiler \
  --deterministic \
  --snapshot_kind=app-aot-assembly \
  --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S \
    /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill 2>compilation_trace.txt
"""

def process_file(input_file, output_file):
    data = []

    with open(input_file, "r") as file:
        for line in file:
            if "-->" in line:
                parts = line.strip().split("time:")
                if len(parts) > 1:
                    time_part = (
                        parts[1].strip().split(" ")[0]
                    )  # Extract the time before 'us'
                    name_part = parts[0].split("'")[1]  # Extract the function name
                    name_part = name_part.replace(
                        " ", "_"
                    )  # Replace spaces with underscores
                    data.append((name_part, int(time_part)))

    # Sort data by the time, which is the second item in the tuple
    data.sort(key=lambda x: x[1])

    with open(output_file, "w") as file:
        for name, time in data:
            file.write(f"{name} {time}\n")

# Usage
process_file("compilation_trace.txt", "sorted.txt")
mraleph commented 4 months ago

The ferry_generator generated code does seem to hit some sort of non-linearity in TFA.

I extracted a benchmark core from another issue I was looking at before (related to build_runner performance): https://github.com/mraleph/flutter_ferry_aot_stress_test. I can clearly see that as input grows TFA speed (input size divided by time taken) decreases: 0.49 kb/ms, then 0.25 kb/ms, then 0.11 kb/ms

@alexmarkov could you take this for a spin?

michael-golfi commented 4 months ago

If helpful, I posted a repro on a build_runner issue a couple years ago.