Open mraleph opened 4 years ago
This looks duplicate to https://github.com/dart-lang/sdk/issues/42442.
@alexmarkov this is more of an umbrella issue, not limited to just TFA but to general end-to-end performance.
One way to do this would be to support breaking things up into smaller increments of compilation.
This would allow for fanning out compilation steps to many different worker machines, and would also potentially help with incrementally compiling smaller changes between builds.
I'm more interested in faster incremental builds for the sake of performance tuning/debugging - right now, changing a single line of code requires repeating the entire process of kernel compiling/tfa/snapshotting, which can be very expensive on large codebases.
We could parallelize gen_snapshot
step in a relatively straightforward fashion though it would potentially require us to forgo some of the late stage tree shaking (might not be that relevant for code quality after all).
Parallelizing TFA is an open problem - it is intrisically non-modular global step. You should compare it to similar global optimizations steps like LTO in LLVM or ProGuard et al
Google concerns: b/203690870
Perhaps something like what LLVM did with ThinLTO would be applicable for TFA? http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html
Hi,
Please guide me anyway to debug AOT progress? Our bank application build AOT too long.
stuck at step frontend_server.dart.snapshot
debug without AOT: 51 seconds (with 16 Gb RAM)
release with AOT: 51 minutes (with 16 Gb RAM) 31 minutes (with 32 Gb RAM)
But problem is frontend_server.dart.snapshot --verbose dont print anything helpful so we can not know where is problem. Is it possible to print what dart file is processing? So we can see where it stuck.
Thank you.
@quyenvsp which Flutter version are you on? (could you post flutter doctor -v
?).
It is probably hitting some pathological case in the TFA.
Currently the best way to diagnose that requires a bit of manual work - it's described in https://github.com/dart-lang/sdk/issues/47546#issuecomment-955078084
flutter doctor -v
output.Get exact command line arguments for frontend_server.dart.snapshot
(e.g. by doing flutter build apk --release -v
) and then run it from sources in the following way ($DART
is the path to Dart SDK sources):
# $DART is the path to Dart SDK sources
# $ARGS are the arguments that are normally passed to frontend_server.dart.snapshot
$ dart -Dglobal.type.flow.print.stats=true --disable-dart-dev --packages=$DART/.packages $DART/pkg/frontend_server/bin/frontend_server_starter.dart $ARGS
/cc @alexmarkov (we should consider allowing users to set global.type.flow.print.stats=true
through OS environment variables, rather than through Dart defines).
@mraleph
Thank so much for guide.
We have 2 bank application project almost similar (one just little smaller) but TFA time is more different 377947ms >< 1997843ms (~5 time slower).
Both do same Ubuntu machine (32Gb of ram), using Flutter stable 2.10.5 Dart 2.16.2 (Dart SDK sources), free memory before build each. one project build just waste 8Gb ram, the slow project take 17Gb ram
While we continue find where is problem, hope global.type.flow.print.stats=true
result below of 2 project helpful.
(Also have perf report but size is 200Mb and 600Mb, will upload if you need)
fine_project_result.txt slow_project_result.txt
We see a huge different 258907308 >< 738198250 invocations queried in cache, but it not detailed yet. Please is these anyway counter per packages/classes? Something like package/class A 1000 invocations, package/class B 2000 invocations,...By this way we easy know where is problem then find more on it.
@quyenvsp Thank you for the report. These huge compilation times are usually caused by a large amount of auto-generated Dart source code. Do you use Dart source auto-generation in your app?
By looking at the logs, it seems like the second app is ~3x times larger (10676547 vs. 28089433 summaries analyzed). That may explain the difference in the compilation times. So far I do not see anything out of the ordinary from those logs, except maybe that in both apps analysis hits the hard limit of 5000 maximum invocations cached per selector. You can try to vary (decrease) constants in the _SelectorApproximation
class to see if it helps:
Also, you can try to run frontend server with -Dglobal.type.flow.print.timings=true
option (as @mraleph explained above) to get a list of methods where analysis spent the most time. This option should print 3 tables of methods - Top summaries by number of times analyzed
, Top summaries by dirty analysis time (including callees), in microseconds
and Top summaries by pure analysis time (excluding callees), in microseconds
. The most interesting are the last and the first tables.
hi, I'm facing the same issue, and most of my time is spent on the gen_snapshot, which takes nearly 8 minutes.
2023-10-24 16:31:32:218 : [ +1 ms] executing: gen_snapshot_arm64 --deterministic --save-obfuscation-map=build/ios/framework/Release/app_iOS_symbols.json --snapshot_kind=app-aot-assembly --assembly=/Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S --dwarf-stack-traces --resolve-dwarf-paths --save-debugging-info=build/ios/framework/Release/app.ios-arm64.symbols --obfuscate /Volumes/data/workspace/.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/app.dill 2023-10-24 16:33:47:397 : ##[error][+135179 ms] Warning: The generated assembly code contains unobfuscated DWARF debugging information. 2023-10-24 16:33:47:398 : ##[error][ ] To avoid this, use --strip to remove it. 2023-10-24 16:33:47:646 : [ +247 ms] executing: xcrun cc -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -c /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.S -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o 2023-10-24 16:38:04:903 : [+257257 ms] executing: xcrun clang -arch arm64 -miphoneos-version-min=11.0 -isysroot /Applications/Xcode_14.3.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS16.4.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/snapshot_assembly.o 2023-10-24 16:38:05:763 : [ +859 ms] executing: xcrun dsymutil -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework.dSYM /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App 2023-10-24 16:38:08:158 : [+2395 ms] executing: xcrun strip -x /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App -o /Volumes/data/workspace/ /.dart_tool/flutter_build/206753438bffcc2db0ccae02ede2c598/arm64/App.framework/App 2023-10-24 16:38:09:312 : [+1153 ms] aot_assembly_release: Complete
Are there any good way for outputting trace information, so I can better identify where the longest delays occur? Also, does gen_snapshot currently support incremental compilation? If not, are there any other methods to improve the compilation speed? Currently, our project takes close to 30 minutes to compile. Thank you.
@yaminet1024 how big is your code? You can try to run gen_snapshot_arm64
with --print_precompiler_timings
and paste the output. You can also run gen_snapshot_arm64
with --trace-compiler
, redirect it to 2>/tmp/compilation_trace.txt
and then process the log using the following command
$ cat /tmp/compilation_trace.txt | perl -n -e 'm/--> \'([^\']+)\'.*time: (\d+)/ && do{$n=$1;$t=$2; $n =~ s/ /_/g; print ($n, " ", $t, "\n")}' | sort -n -k 2 > /tmp/sorted.txt
You can look at the methods that take longest to compile. If there are some outliers - we would like to know if there is something special about them (e.g. huge autogenerated code is one common suspect).
It would be nice to know histogram of compilation timings, how many methods are compiled in total (wc -l /tmp/sorted.txt
should give that).
After some searching I believe this is the right issue to comment on... I am seeing large compile times for a Flutter app using quite a bit of generated serializer code. I'm not sure what is considered big in terms of an app.
I am seeing 25 minute release builds, here is output from verbose logs:
[ +86 ms] gen_dart_plugin_registrant: Complete
[ +7 ms] kernel_snapshot: Starting due to {}
[ +10 ms] Embedding native assets mapping /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml in kernel.
[ +6 ms] /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/dartaotruntime --disable-dart-dev /Users/michaelgolfi/development/flutter/bin/cache/dart-sdk/bin/snapshots/frontend_server_aot.dart.snapshot --sdk-root /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/common/flutter_patched_sdk_product/ --target=flutter --no-print-incremental-dependencies -Ddart.vm.profile=false -Ddart.vm.product=true --delete-tostring-package-uri=dart:ui --delete-tostring-package-uri=package:flutter --aot --tfa --target-os ios --packages /Users/michaelgolfi/work/mobile/.dart_tool/package_config.json --output-dill /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill --depfile /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/kernel_snapshot.d --source file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --source package:flutter/src/dart_plugin_registrant.dart -Dflutter.dart_plugin_registrant=file:///Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/dart_plugin_registrant.dart --native-assets /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/native_assets.yaml --verbosity=error package:phyla/main.dart
[+1096958 ms] kernel_snapshot: Complete
[+1782 ms] aot_assembly_release: Starting due to {InvalidatedReasonKind.inputChanged: The following inputs have updated contents: /Users/michaelgolfi/development/flutter/packages/flutter_tools/lib/src/build_system/targets/ios.dart,/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill,/Users/michaelgolfi/development/flutter/bin/internal/engine.version,/Users/michaelgolfi/development/flutter/bin/internal/engine.version}
[ +1 ms] targetingApplePlatform = true
[ ] extractAppleDebugSymbols = true
[ ] Will strip AOT snapshot manually after build and dSYM generation.
[ ] executing: /Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 --deterministic --snapshot_kind=app-aot-assembly --assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill
[+59343 ms] executing: sysctl hw.optional.arm64
[ +3 ms] Exit code 0 from: sysctl hw.optional.arm64
[ ] hw.optional.arm64: 1
[ ] executing: /usr/bin/arch -arm64e xcrun cc -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -c /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[+105579 ms] executing: /usr/bin/arch -arm64e xcrun clang -arch arm64 -miphoneos-version-min=12.0 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -dynamiclib -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -fapplication-extension -install_name @rpath/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.o
[ +100 ms] executing: /usr/bin/arch -arm64e xcrun dsymutil -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework.dSYM /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +751 ms] executing: /usr/bin/arch -arm64e xcrun strip -x /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App -o /Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/App.framework/App
[ +124 ms] aot_assembly_release: Complete
[ ] release_ios_bundle_flutter_assets: Starting due to {}
@mraleph I will try your suggestions above. So far I have tried to move most of the serializers into their own path dependency in the past, in the hopes it would cache the builds since they change less often than the main app.
I found that much time was spent on serializers generated from ferry_generator. I managed to reduce some of the generated code with some config and shaved 500s from release build time, an improvement of about 40%.
The shell script above has a missing quote somewhere. I gave up on it and wrote some Python to sort the files by time instead.
"""
/Users/michaelgolfi/development/flutter/bin/cache/artifacts/engine/ios-release/gen_snapshot_arm64 \
--print_precompiler_timings \
--trace-compiler \
--deterministic \
--snapshot_kind=app-aot-assembly \
--assembly=/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/arm64/snapshot_assembly.S \
/Users/michaelgolfi/work/mobile/.dart_tool/flutter_build/9dcd2357a6d80e978997d358d9163d28/app.dill 2>compilation_trace.txt
"""
def process_file(input_file, output_file):
data = []
with open(input_file, "r") as file:
for line in file:
if "-->" in line:
parts = line.strip().split("time:")
if len(parts) > 1:
time_part = (
parts[1].strip().split(" ")[0]
) # Extract the time before 'us'
name_part = parts[0].split("'")[1] # Extract the function name
name_part = name_part.replace(
" ", "_"
) # Replace spaces with underscores
data.append((name_part, int(time_part)))
# Sort data by the time, which is the second item in the tuple
data.sort(key=lambda x: x[1])
with open(output_file, "w") as file:
for name, time in data:
file.write(f"{name} {time}\n")
# Usage
process_file("compilation_trace.txt", "sorted.txt")
The ferry_generator
generated code does seem to hit some sort of non-linearity in TFA.
I extracted a benchmark core from another issue I was looking at before (related to build_runner
performance): https://github.com/mraleph/flutter_ferry_aot_stress_test. I can clearly see that as input grows TFA speed (input size divided by time taken) decreases: 0.49 kb/ms, then 0.25 kb/ms, then 0.11 kb/ms
@alexmarkov could you take this for a spin?
If helpful, I posted a repro on a build_runner issue a couple years ago.
This is an umbrella issue to track efforts around reduction of critical path when building large Flutter applications in AOT mode