dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

[2.18.6] - arm64 low performance #51107

Closed jwinarske closed 1 year ago

jwinarske commented 1 year ago

(Tried the gitter channel; nothing but crickets)

I'm seeing very low performance when running dart on a Raspberry Pi 4 (arm64). Both dart run and dart test take a really long time. I'm wondering if this has to do with how I'm building it.

My Yocto recipe: https://github.com/meta-flutter/meta-flutter/blob/kirkstone/recipes-devtools/dart/dart-sdk_2.18.6.bb

Boils down to:

export DART_USE_SYSROOT="${TARGET_SYSROOT}"
export DART_USE_TOOLCHAIN="${STAGING_DIR_NATIVE}/usr/bin"

python3 ./tools/gn.py --platform-sdk --verify-sdk-hash --use-mallinfo2 --no-goma --clang --mode product --arch arm64
BUILD_DIR="${OUT_DIR}/$(ls ${OUT_DIR})"
autoninja -C "${BUILD_DIR}" create_sdk

External toolchain is clang 14.

Am I installing tools that are running some kind of emulation?

julemand101 commented 1 year ago

Can you reproduce the issue when using the official builds for ARM64? You can download them from: https://dart.dev/get-dart/archive

jwinarske commented 1 year ago

Similar, so it's not my build. With my build, the same sequence as below runs a few seconds faster, I imagine due to compiler tuning; it's not a generic arm64 build.

Using pre-builts:

raspberrypi4-64:~/hello$ dart --version
Dart SDK version: 2.18.6 (stable) (Tue Dec 13 21:15:14 2022 +0000) on "linux_arm64"
raspberrypi4-64:~$ time dart create hello
Creating hello using template console...

  .gitignore
  analysis_options.yaml
  CHANGELOG.md
  pubspec.yaml
  README.md
  bin/hello.dart
  lib/hello.dart
  test/hello_test.dart

Running pub get...                     29.1s
  Resolving dependencies...
  Downloading lints 2.0.1...
  Downloading test 1.22.2...
  Downloading test_core 0.4.22...
  Downloading test_api 0.4.18...
  Downloading typed_data 1.3.1...
  Downloading stream_channel 2.1.1...
  Downloading stack_trace 1.11.0...
  Downloading shelf_packages_handler 3.0.1...
  Downloading pool 1.5.1...
  Downloading node_preamble 2.0.1...
  Downloading boolean_selector 2.1.1...
  Downloading source_maps 0.10.11...
  Downloading source_map_stack_trace 2.1.1...
  Downloading term_glyph 1.2.1...
  Downloading yaml 3.1.1...
  Downloading shelf_static 1.1.1...
  Downloading package_config 2.1.0...
  Downloading js 0.6.5...
  Downloading meta 1.8.0...
  Downloading collection 1.17.0...
  Downloading string_scanner 1.2.0...
  Downloading http_parser 4.0.2...
  Downloading webkit_inspection_protocol 1.2.0...
  Downloading web_socket_channel 2.3.0...
  Downloading crypto 3.0.2...
  Downloading shelf_web_socket 1.0.3...
  Downloading path 1.8.3...
  Downloading io 1.0.4...
  Downloading http_multi_server 3.2.1...
  Downloading matcher 0.12.14...
  Downloading logging 1.1.0...
  Downloading source_span 1.9.1...
  Downloading glob 2.1.1...
  Downloading file 6.1.4...
  Downloading mime 1.0.4...
  Downloading convert 3.1.1...
  Downloading frontend_server_client 3.2.0...
  Downloading args 2.3.2...
  Downloading shelf 1.4.0...
  Downloading async 2.10.0...
  Downloading coverage 1.6.2...
  Downloading vm_service 10.1.0...
  Downloading analyzer 5.4.0...
  Downloading _fe_analyzer_shared 52.0.0...
  Downloading watcher 1.0.2...
  Downloading pub_semver 2.1.3...
  Changed 46 dependencies!

Created project hello in hello! In order to get started, run the following commands:

  cd hello
  dart run

real    0m31.650s
user    0m31.687s
sys 0m8.424s
raspberrypi4-64:~$ cd hello
raspberrypi4-64:~/hello$ time dart run
Building package executable... (8.1s)
Built hello:hello.
Hello world: 42!

real    0m12.264s
user    0m14.666s
sys 0m2.119s
raspberrypi4-64:~/hello$ time dart test
Building package executable... (1:50.5s)
Built test:test.
00:36 +1: All tests passed!                                                                                                                                                            

real    2m34.357s
user    3m36.843s
sys 0m20.225s
raspberrypi4-64:~/hello$

a-siva commented 1 year ago

@jwinarske could you give us some more details

jwinarske commented 1 year ago

@a-siva

real    0m9.046s
user    0m9.308s
sys     0m1.395s
raspberrypi4-64:~/hello$ time dart bin/hello.dill
Hello world: 42!

real    0m0.291s
user    0m0.242s
sys     0m0.065s

I'm not clear on all the steps yet, but it looks like the build executable step takes a while. One example is running the Dart gRPC [quickstart](https://grpc.io/docs/languages/dart/quickstart/).

`Building package executables...` takes the most time:

raspberrypi4-64:~/hello$ time dart pub global activate protoc_plugin

Activated protoc_plugin 20.0.1.

real    2m22.212s
user    6m8.101s
sys     0m32.912s
raspberrypi4-64:~/hello$

Client

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile kernel bin/client.dart --verbose
Compiling bin/client.dart to kernel file bin/client.dill.
Info: Compiling with sound null safety

real    0m53.980s
user    1m15.864s
sys     0m7.802s
raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart bin/client.dill
Greeter client received: Hello, world!

real    0m1.835s
user    0m1.581s
sys     0m0.284s



The primary use case for this is implementing test cases for cross-domain components used in Flutter-based systems.

The network latency is expected.  The building package executable step seems like the primary offender when running tests.

A scheme where the Dart test cases are pre-built on a build server and the resulting dill is then run on target could be an acceptable approach, if it's not too complex. That way all the heavy lifting is done off target.
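A minimal sketch of such a scheme, written as a small Python helper that only assembles the commands. `dart compile kernel` and running a `.dill` with `dart` both appear earlier in this thread; the `scp`/`ssh` transport, the host name `pi@raspberrypi4-64`, and the remote directory are hypothetical. It also assumes the kernel file is run by the same SDK version that produced it.

```python
import posixpath
import shlex


def prebuild_commands(source, dill, target, remote_dir):
    """Return (compile, copy, run) commands as argv lists.

    compile: runs on the build server
    copy:    ships the .dill to the target (scp is an assumption)
    run:     executes the .dill on the target over ssh (also an assumption)
    """
    remote_dill = posixpath.join(remote_dir, posixpath.basename(dill))
    compile_cmd = ["dart", "compile", "kernel", source, "-o", dill]
    copy_cmd = ["scp", dill, f"{target}:{remote_dill}"]
    run_cmd = ["ssh", target, "dart " + shlex.quote(remote_dill)]
    return compile_cmd, copy_cmd, run_cmd


# Hypothetical host name and paths, for illustration only.
commands = prebuild_commands(
    "bin/client.dart", "bin/client.dill", "pi@raspberrypi4-64", "/tmp"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on the build server; nothing is executed by the sketch itself.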

Looking for any creative solutions that make sense and scale over time.

mraleph commented 1 year ago

I think we have seen this before - though I can't easily find the issue to duplicate against. IIRC we don't ship trained app-jit snapshots for kernel-service with ARM sdks so running anything that involves compilation to Kernel is rather slow. We don't include these snapshots because building them would be too slow (ARM sdks are cross-compiled meaning that producing app-jit snapshot would require training on simulated build - and that's slow).

I think we should restart our efforts to move away from JIT compiled kernel service to AOT compiled kernel service.

jwinarske commented 1 year ago

@mraleph How do I generate a trained app-jit snapshot for kernel-service with AArch64/ARM? I'm fine with a slow build if the trade-off is a faster runtime experience.

mraleph commented 1 year ago

@jwinarske You can try to edit this line and set it to true and see what happens.

jwinarske commented 1 year ago

@mraleph

Setting this to true I see kernel-service.dart.snapshot in dart-sdk/bin/snapshots.

without kernel-service.dart.snapshot (default product build arm64 + compiler tuning)

$ time dart run
Building package executable... (8.1s)
real    0m12.264s
user    0m14.666s
sys 0m2.119s

raspberrypi4-64:~/hello$ time dart test
Building package executable... (1:50.5s)
real    2m34.357s
user    3m36.843s
sys 0m20.225s

raspberrypi4-64:~/hello$ time dart pub global activate protoc_plugin
Building package executables... (2:06.9s)
real    2m22.212s
user    6m8.101s
sys 0m32.912s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile kernel bin/client.dart --verbose
real    0m53.980s
user    1m15.864s
sys 0m7.802s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart bin/client.dill
Greeter client received: Hello, world!

real    0m1.835s
user    0m1.581s
sys 0m0.284s

with kernel-service.dart.snapshot (compiler tuning)

raspberrypi4-64:~/hello$ time dart run
Building package executable... (6.2s)
real    0m11.160s
user    0m12.320s
sys 0m1.997s

raspberrypi4-64:~/hello$ time dart test
Building package executable... (1:46.3s)
real    2m27.948s
user    3m26.734s
sys 0m19.301s

$  time dart pub global activate protoc_plugin
Building package executables... (2:07.5s)
real    2m17.364s
user    6m4.776s
sys 0m30.495s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart run bin/client.dart
real    0m56.419s
user    1m18.232s
sys 0m7.831s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile exe bin/client.dart
real    1m31.714s
user    2m2.032s
sys 0m11.927s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time bin/client.exe
real    0m0.098s
user    0m0.026s
sys 0m0.054s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile aot-snapshot bin/client.dart
real    1m30.902s
user    1m59.780s
sys 0m10.797s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dartaotruntime bin/client.aot
real    0m0.094s
user    0m0.040s
sys 0m0.037s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile jit-snapshot bin/client.dart
real    0m59.273s
user    1m22.598s
sys 0m8.813s

raspberrypi4-64:~/grpc-dart-master/example/helloworld$ time dart compile kernel bin/client.dart
real    0m55.547s
user    1m16.219s
sys 0m7.681s

No noticeable difference in build time on a 48 core machine with autoninja.

My takeaways:

  1. When running the dart-sdk (toolchain) on target, build the SDK with create_kernel_service_snapshot for optimal perf.
  2. Use dartaotruntime on target, and pre-compile AOT on host.
  3. The exe build takes slightly longer than the AOT snapshot build, and counting build plus run, exe is slower than AOT.
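
To put the build-versus-run trade-off in numbers, here is a rough sketch that converts the `time` output above to seconds and estimates how many runs it takes before the slower AOT build pays for itself against a kernel (`.dill`) build. The wall-clock figures are copied from the measurements in this thread; note that the kernel and AOT numbers come from different SDK configurations, so the result is only indicative. The helper names are made up for illustration.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '2m34.357s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def break_even_runs(build_a, run_a, build_b, run_b):
    """Runs needed before option B (slower build, faster run) beats option A."""
    return (build_b - build_a) / (run_a - run_b)


# "real" figures from the thread (grpc helloworld client).
kernel_build = parse_time("0m53.980s")  # dart compile kernel
kernel_run = parse_time("0m1.835s")     # dart bin/client.dill
aot_build = parse_time("1m30.902s")     # dart compile aot-snapshot
aot_run = parse_time("0m0.094s")        # dartaotruntime bin/client.aot

runs = break_even_runs(kernel_build, kernel_run, aot_build, aot_run)
print(f"AOT overtakes kernel after about {runs:.1f} runs")
```

If the build happens on the host anyway, only the run times matter on target, and AOT wins from the first run.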
a-siva commented 1 year ago

@jwinarske I am wondering whether file I/O on the Raspberry Pi is slow. I presume you are using the SD card that came with the Raspberry Pi; can you take some measurements of pure file read operations? One of our team members has had some experience with slow SD cards and apparently got a 3x speed improvement in read operations after replacing one with a high-speed SD card.
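
One way to take such a measurement is a plain sequential read timed from a script. The following is a minimal sketch, not a rigorous benchmark: the function name and block size are arbitrary choices, and a file that was read recently may be served from the page cache rather than the card, so use a large file that has not been touched since boot.

```python
import time


def measure_read_throughput(path, block_size=1 << 20):
    """Read `path` sequentially and return the throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return (total / (1 << 20)) / max(elapsed, 1e-9)
```

Calling `measure_read_throughput("/path/to/large/file")` on the SD card and again on a copy under /tmp gives a quick storage-versus-RAM comparison.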

jwinarske commented 1 year ago

@a-siva I use high speed cards on all my systems. Otherwise it takes too long to image them. Some run UFS storage, some eMMC, and a few SD cards. I ran the same tests on /tmp (RAM) and the numbers didn't change much. Now I just need to sort out the cross compile process.

jwinarske commented 1 year ago

Since I'm able to get good performance with AOT on target, and am able to cross compile - closing the ticket. Thanks!