flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License
165.14k stars 27.23k forks source link

Xcode compilation is slow on macOS 14 bots #150642

Open gmackall opened 3 months ago

gmackall commented 3 months ago

There seems to be a pretty high frequency of timeouts right now on mac ci bots. Examples: From https://github.com/flutter/flutter/pull/150585 https://github.com/flutter/flutter/pull/150585/checks?check_run_id=26535171974 https://github.com/flutter/flutter/pull/150585/checks?check_run_id=26535172379

From https://github.com/flutter/flutter/pull/150614 https://github.com/flutter/flutter/pull/150614/checks?check_run_id=26520636606 https://github.com/flutter/flutter/pull/150614/checks?check_run_id=26520632867

From https://github.com/flutter/flutter/pull/150517 https://github.com/flutter/flutter/pull/150517/checks?check_run_id=26528249189 https://github.com/flutter/flutter/pull/150517/checks?check_run_id=26528248334

jmagman commented 3 months ago

Timeout seems to only be happening in the try pool https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20build_ios_framework_module_test?limit=50 vs https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20build_ios_framework_module_test?limit=50

christopherfujino commented 3 months ago

Timeout seems to only be happening in the try pool https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20build_ios_framework_module_test?limit=50 vs https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20build_ios_framework_module_test?limit=50

Saw this one in prod/post-submit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20tool_integration_tests_1_4/17133/infra

jmagman commented 3 months ago

Possibly related to trying out the Xcode beta in https://github.com/flutter/flutter/pull/150405

The slow bots show the Xcode 16 beta xcode_16a5171c in the Xcode caches:

 drwxr-xr-x   6 chrome-bot  staff  192 Jun 18 10:32 xcode_15a240d
 drwxr-xr-x   6 chrome-bot  staff  192 Jun 18 09:43 xcode_16a5171c
 drwx------   5 chrome-bot  staff  160 Jun 18 10:36 xcode_runtime_dmg_ios-17-0_15a240d

https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8744492829230850609/+/u/Show_xcode_cache/stdout

And the passing ones don't:

drwxr-xr-x   6 chrome-bot  staff  192 Jun 14 10:02 xcode_15a240d

The test logs show they are correctly and exclusively using the 15a240d version of Xcode, so I'm not sure why having the other one installed would matter.

@christopherfujino any idea how to clear the Xcode caches to ensure only 15 is present? go/flutter-infra-xcode#cache-pollution doesn't have explicit instructions. I see go/flutter-luci-playbook#delete-corrupted-mac-caches.

jmagman commented 3 months ago

Saw this one in prod/post-submit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20tool_integration_tests_1_4/17133/infra

Hmm, well that doesn't bode well for my theory...

christopherfujino commented 3 months ago

Maybe we should split these out by builder/test target, as I think Jenn and I are investigating different timeouts

christopherfujino commented 3 months ago

go/flutter-luci-playbook#delete-corrupted-mac-caches makes it sound like you need to go in one by one and rm rm -rf via ssh on each bot :'(

christopherfujino commented 3 months ago

go/flutter-luci-playbook#delete-corrupted-mac-caches makes it sound like you need to go in one by one and rm rm -rf via ssh on each bot :'(

@bdero do you know if--with SALT--you can run a remote command on all the minions at once?

jmagman commented 3 months ago

The last 5 timeouts of https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20tool_integration_tests_1_4?limit=100 in the try pool had xcode_16a5171c in the cache.

I wish I knew why this is happening, ideally these installations wouldn't impact each other.

christopherfujino commented 3 months ago

The last 5 timeouts of https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20tool_integration_tests_1_4?limit=100 in the try pool had xcode_16a5171c in the cache.

I wish I knew why this is happening, ideally these installations wouldn't impact each other.

There are two hard problems in computer science: naming things, and caching CI artifacts.

jmagman commented 3 months ago

@christopherfujino the xcode_16a5171c bots be removed from the try pool in the meantime?

jmagman commented 3 months ago

@christopherfujino there's code in the recipe to remove the cache: https://cs.opensource.google/flutter/recipes/+/master:recipe_modules/osx_sdk/api.py;l=244-247

Would it be easier to change the recipe to run that on test run timeout until all the bots have it cleared up? Or hard-code removal of xcode_16a5171c?

christopherfujino commented 3 months ago

I tried locally hard-coding the CacheMicroManager module to delete this cache, and scheduled a LED build on one of the affected bots: https://luci-milo.appspot.com/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/526/overview

christopherfujino commented 3 months ago

The previous build failed because I used rmtree, which apparently doesn't work on newer macos caches? In any case, updated it to use rm -rf, https://ci.chromium.org/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/528/infra

christopherfujino commented 3 months ago

The previous build failed because I used rmtree, which apparently doesn't work on newer macos caches? In any case, updated it to use rm -rf, https://ci.chromium.org/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/528/infra

Looks like it worked, I will put out a recipes CL for this.

jmagman commented 3 months ago

Looks like it worked, I will put out a recipes CL for this.

Documenting here: the recipe CL led run successfully deleted the cache, but the test still timed out. So it must not be related to the Xcode cache. Possibly something new and horrible only seen on the macOS 14.5 bots?

jmagman commented 2 months ago

@christopherfujino closed his recipe CL that deleted the Xcode beta caches, because the builds were timing out with and without that in the cache.

Comparing succeeding 20 minute run A to timing out run B

Infra

Passing and failing are running macOS 14.5. Most of the failing builds look like they are mostly happening on Intel, and the passing ones are happening on ARM. However, here's ARM build that's timing out as well (C) https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20plugin_test_ios/32197/infra

A macOS 14.5, arm M1, 2020 Macmini9,1 B macOS 14.5, Intel, 2018 Macmini8,1

An example of us debugging a similar issue, but in that case arm was much slower than Intel: https://github.com/flutter/flutter/issues/119750


curl is taking longer

A (1,462ms)

 [2024-07-15 02:25:09.923715] [STDOUT] stdout: [   +2 ms] Downloading ios tools...
 [2024-07-15 02:25:10.062979] [STDOUT] stdout: [ +139 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/54f0879a6d88ad06db73bf1b8c324b0d964d0314/ios/artifacts.zip md5 hash: 3udN9eSYL3k+LD+KebNDrA==
 [2024-07-15 02:25:11.390285] [STDOUT] stdout: [+1327 ms] Downloading ios tools... (completed in 1,462ms)

B (4.7s)

 [2024-07-15 00:57:57.352483] [STDOUT] stdout: [   +9 ms] Downloading ios tools...
 [2024-07-15 00:57:57.638480] [STDOUT] stdout: [ +286 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/39cd71f524f0a5245cf5c65cdf67cdb4b0aa5f7d/ios/artifacts.zip md5 hash: /OkvALMwpRKP8wu9rxEC9w==
 [2024-07-15 00:58:02.019085] [STDOUT] stdout: [+4380 ms] Downloading ios tools... (completed in 4.7s)

A (1,203ms)

 [2024-07-15 02:25:14.247250] [STDOUT] stdout: [  +27 ms] Downloading ios-profile tools...
 [2024-07-15 02:25:14.348025] [STDOUT] stdout: [ +100 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/54f0879a6d88ad06db73bf1b8c324b0d964d0314/ios-profile/artifacts.zip md5 hash: 6nerqgP+U5TVeECi8vT2JQ==
 [2024-07-15 02:25:15.450731] [STDOUT] stdout: [+1102 ms] Downloading ios-profile tools... (completed in 1,203ms)

B (3.5s)

 [2024-07-15 00:58:07.203080] [STDOUT] stdout: [  +95 ms] Downloading ios-profile tools...
 [2024-07-15 00:58:07.316529] [STDOUT] stdout: [ +113 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/39cd71f524f0a5245cf5c65cdf67cdb4b0aa5f7d/ios-profile/artifacts.zip md5 hash: v5SIAgpGiCX6vGEgkEVYwg==
 [2024-07-15 00:58:10.742525] [STDOUT] stdout: [+3426 ms] Downloading ios-profile tools... (completed in 3.5s)

xcrun xcodebuild building the project is consistently slower

A (9361 ms)

 [2024-07-15 02:25:21.261761] [STDOUT] stdout: [        ] executing: [/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/ios/Runner.xcodeproj/] /usr/bin/arch -arm64e xcrun xcodebuild -project /Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/ios/Runner.xcodeproj -destination generic/platform=iOS -showBuildSettings BUILD_DIR=/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/build/ios PROVISIONING_PROFILE_SPECIFIER=match Development * CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM=S8QB4VV633
 [2024-07-15 02:25:30.624459] [STDOUT] stdout: [+9361 ms] Command line invocation:

B (11265 ms)

 [2024-07-15 00:58:21.256956] [STDOUT] stdout: [        ] executing: [/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/ios/Runner.xcodeproj/] xcrun xcodebuild -project /Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/ios/Runner.xcodeproj -destination generic/platform=iOS -showBuildSettings BUILD_DIR=/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/build/ios PROVISIONING_PROFILE_SPECIFIER=match Development * CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM=S8QB4VV633
 [2024-07-15 00:58:32.525233] [STDOUT] stdout: [+11265 ms] 

A (9077 ms)

 [2024-07-15 02:25:39.718249] [STDOUT] stdout: [+9077 ms] Command line invocation:
 [2024-07-15 02:25:39.718276] [STDOUT] stdout:                /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_15a240d/XCode.app/Contents/Developer/usr/bin/xcodebuild -list

B (11151 ms)

 [2024-07-15 00:58:43.729228] [STDOUT] stdout: [+11151 ms] Command line invocation:
 [2024-07-15 00:58:43.729283] [STDOUT] stdout:                 /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_15a240d/XCode.app/Contents/Developer/usr/bin/xcodebuild -list

pod install is slower

pod install is downloading and writing to disk, but not compiling anything with Xcode

A (734ms)

 [2024-07-15 02:25:49.300788] [STDOUT] stdout: [   +1 ms] Running pod install...
 [2024-07-15 02:25:50.035156] [STDOUT] stdout: [ +734 ms] Running pod install... (completed in 734ms)

B (2,251ms)

 [2024-07-15 00:58:56.527252] [STDOUT] stdout: [   +3 ms] Running pod install...
 [2024-07-15 00:58:58.779464] [STDOUT] stdout: [+2251 ms] Running pod install... (completed in 2,251ms)

top

This seems like the whole machine is crawling and isn't specific to Xcode compilation.

A top

 Processes: 488 total, 2 running, 486 sleeping, 1922 threads 
 2024/07/15 02:45:28
 Load Avg: 6.27, 15.77, 13.16 
 CPU usage: 2.27% user, 11.36% sys, 86.36% idle 
 SharedLibs: 505M resident, 75M data, 188M linkedit.
 MemRegions: 38982 total, 1244M resident, 219M private, 523M shared.
 PhysMem: 9686M used (1573M wired, 477M compressor), 5975M unused.
 VM: 186T vsize, 4921M framework vsize, 0(0) swapins, 0(0) swapouts.
 Networks: packets: 118903256/157G in, 26494/1467M out.
 Disks: 13015232/137G read, 6526599/361G written.

B top

 Processes: 464 total, 2 running, 462 sleeping, 1725 threads 
 2024/07/15 01:27:52
 Load Avg: 22.81, 13.89, 13.48 
 CPU usage: 2.43% user, 10.45% sys, 87.10% idle 
 SharedLibs: 678M resident, 103M data, 63M linkedit.
 MemRegions: 42664 total, 1902M resident, 338M private, 669M shared.
 PhysMem: 14G used (2863M wired, 0B compressor), 18G unused.
 VM: 15T vsize, 4528M framework vsize, 0(0) swapins, 0(0) swapouts.
 Networks: packets: 20013151/25G in, 263604/1352M out.
 Disks: 6387003/86G read, 1396493/143G written.

Spotlight and mds and fseventsd are running on both passing and failing machines (these shouldn't be running? https://g-issues.chromium.org/issues/40255200 and https://g-issues.chromium.org/issues/40790126)

jmagman commented 2 months ago

Given we aren't seeing timeouts on Android runs, and the last time we saw something similar, we resolved it by updating the version of Xcode https://github.com/flutter/flutter/issues/119750.

I'm tempted to do the same and update Xcode to see if there's something funky running Xcode 15.0 on macOS 14.5. I can try Xcode 15.2 (15C500b) which should run on both of our macOS 14.5 and macOS 13.6 machines.

jmagman commented 2 months ago

I'm at a loss for why this would be happening so much more in presubmit than postsubmit...

christopherfujino commented 2 months ago

I'm at a loss for why this would be happening so much more in presubmit than postsubmit...

are we timing out at the same limit pre and post?

zanderso commented 2 months ago

are we timing out at the same limit pre and post?

Just glanced quickly, but it looks like in post submit, plugin_test_ios is passing in around 10 minutes, but timing out after 30 minutes in presubmit. So the timeout config if there is a difference is probably not related.

christopherfujino commented 2 months ago

are we timing out at the same limit pre and post?

Just glanced quickly, but it looks like in post submit, plugin_test_ios is passing in around 10 minutes, but timing out after 30 minutes in presubmit. So the timeout config if there is a difference is probably not related.

ahh, ok, weird.

jmagman commented 2 months ago

I can try Xcode 15.2 (15C500b) which should run on both of our macOS 14.5 and macOS 13.6 machines.

I tried this on the packages repo and it didn't work, still hit timeouts https://github.com/flutter/packages/pull/7132#issuecomment-2229115457

jmagman commented 2 months ago

@stuartmorgan points out this isn't just happening on macOS 14, here's an ARM macOS 13.6.4 machine crawling and then timing out on a packages test: https://ci.chromium.org/ui/p/flutter/builders/try/Mac_arm64%20ios_platform_tests_shard_3%20master/14458/infra https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8742357634243779457/+/u/Run_package_tests/drive_examples/stdout