Open gmackall opened 3 months ago
Timeout seems to only be happening in the try pool https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20build_ios_framework_module_test?limit=50 vs https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20build_ios_framework_module_test?limit=50
Timeout seems to only be happening in the try pool https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20build_ios_framework_module_test?limit=50 vs https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20build_ios_framework_module_test?limit=50
Saw this one in prod/post-submit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20tool_integration_tests_1_4/17133/infra
Possibly related to trying out the Xcode beta in https://github.com/flutter/flutter/pull/150405
The slow bots show the Xcode 16 beta xcode_16a5171c in the Xcode caches:
drwxr-xr-x 6 chrome-bot staff 192 Jun 18 10:32 xcode_15a240d
drwxr-xr-x 6 chrome-bot staff 192 Jun 18 09:43 xcode_16a5171c
drwx------ 5 chrome-bot staff 160 Jun 18 10:36 xcode_runtime_dmg_ios-17-0_15a240d
And the passing ones don't:
drwxr-xr-x 6 chrome-bot staff 192 Jun 14 10:02 xcode_15a240d
The test logs show they are correctly and exclusively using the 15a240d version of Xcode, so I'm not sure why having the other one installed would matter.
@christopherfujino any idea how to clear the Xcode caches to ensure only 15 is present? go/flutter-infra-xcode#cache-pollution doesn't have explicit instructions. I see go/flutter-luci-playbook#delete-corrupted-mac-caches.
Saw this one in prod/post-submit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20tool_integration_tests_1_4/17133/infra
Hmm, well that doesn't bode well for my theory...
Maybe we should split these out by builder/test target, as I think Jenn and I are investigating different timeouts
go/flutter-luci-playbook#delete-corrupted-mac-caches makes it sound like you need to go in one by one and rm rm -rf via ssh on each bot :'(
go/flutter-luci-playbook#delete-corrupted-mac-caches makes it sound like you need to go in one by one and rm rm -rf via ssh on each bot :'(
@bdero do you know if--with SALT--you can run a remote command on all the minions at once?
The last 5 timeouts of https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20tool_integration_tests_1_4?limit=100 in the try pool had xcode_16a5171c in the cache.
I wish I knew why this is happening, ideally these installations wouldn't impact each other.
The last 5 timeouts of https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20tool_integration_tests_1_4?limit=100 in the try pool had xcode_16a5171c in the cache.
I wish I knew why this is happening, ideally these installations wouldn't impact each other.
There are two hard problems in computer science: naming things, and caching CI artifacts.
@christopherfujino the xcode_16a5171c bots be removed from the try pool in the meantime?
@christopherfujino there's code in the recipe to remove the cache: https://cs.opensource.google/flutter/recipes/+/master:recipe_modules/osx_sdk/api.py;l=244-247
Would it be easier to change the recipe to run that on test run timeout until all the bots have it cleared up? Or hard-code removal of xcode_16a5171c?
I tried locally hard-coding the CacheMicroManager module to delete this cache, and scheduled a LED build on one of the affected bots: https://luci-milo.appspot.com/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/526/overview
The previous build failed because I used rmtree, which apparently doesn't work on newer macos caches? In any case, updated it to use rm -rf, https://ci.chromium.org/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/528/infra
The previous build failed because I used rmtree, which apparently doesn't work on newer macos caches? In any case, updated it to use rm -rf, https://ci.chromium.org/ui/p/flutter/builders/try.shadow/Mac%20tool_integration_tests_1_4/528/infra
Looks like it worked, I will put out a recipes CL for this.
Looks like it worked, I will put out a recipes CL for this.
Documenting here: the recipe CL led run successfully deleted the cache, but the test still timed out. So it must not be related to the Xcode cache. Possibly something new and horrible only seen on the macOS 14.5 bots?
@christopherfujino closed his recipe CL that deleted the Xcode beta caches, because the builds were timing out with and without that in the cache.
Comparing succeeding 20 minute run A to timing out run B
Passing and failing are running macOS 14.5. Most of the failing builds look like they are mostly happening on Intel, and the passing ones are happening on ARM. However, here's ARM build that's timing out as well (C) https://ci.chromium.org/ui/p/flutter/builders/try/Mac%20plugin_test_ios/32197/infra
A macOS 14.5, arm M1, 2020 Macmini9,1 B macOS 14.5, Intel, 2018 Macmini8,1
An example of us debugging a similar issue, but in that case arm was much slower than Intel: https://github.com/flutter/flutter/issues/119750
A (1,462ms)
[2024-07-15 02:25:09.923715] [STDOUT] stdout: [ +2 ms] Downloading ios tools...
[2024-07-15 02:25:10.062979] [STDOUT] stdout: [ +139 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/54f0879a6d88ad06db73bf1b8c324b0d964d0314/ios/artifacts.zip md5 hash: 3udN9eSYL3k+LD+KebNDrA==
[2024-07-15 02:25:11.390285] [STDOUT] stdout: [+1327 ms] Downloading ios tools... (completed in 1,462ms)
B (4.7s)
[2024-07-15 00:57:57.352483] [STDOUT] stdout: [ +9 ms] Downloading ios tools...
[2024-07-15 00:57:57.638480] [STDOUT] stdout: [ +286 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/39cd71f524f0a5245cf5c65cdf67cdb4b0aa5f7d/ios/artifacts.zip md5 hash: /OkvALMwpRKP8wu9rxEC9w==
[2024-07-15 00:58:02.019085] [STDOUT] stdout: [+4380 ms] Downloading ios tools... (completed in 4.7s)
A (1,203ms)
[2024-07-15 02:25:14.247250] [STDOUT] stdout: [ +27 ms] Downloading ios-profile tools...
[2024-07-15 02:25:14.348025] [STDOUT] stdout: [ +100 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/54f0879a6d88ad06db73bf1b8c324b0d964d0314/ios-profile/artifacts.zip md5 hash: 6nerqgP+U5TVeECi8vT2JQ==
[2024-07-15 02:25:15.450731] [STDOUT] stdout: [+1102 ms] Downloading ios-profile tools... (completed in 1,203ms)
B (3.5s)
[2024-07-15 00:58:07.203080] [STDOUT] stdout: [ +95 ms] Downloading ios-profile tools...
[2024-07-15 00:58:07.316529] [STDOUT] stdout: [ +113 ms] Content https://storage.googleapis.com/flutter_infra_release/flutter/39cd71f524f0a5245cf5c65cdf67cdb4b0aa5f7d/ios-profile/artifacts.zip md5 hash: v5SIAgpGiCX6vGEgkEVYwg==
[2024-07-15 00:58:10.742525] [STDOUT] stdout: [+3426 ms] Downloading ios-profile tools... (completed in 3.5s)
xcrun xcodebuild
building the project is consistently slowerA (9361 ms)
[2024-07-15 02:25:21.261761] [STDOUT] stdout: [ ] executing: [/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/ios/Runner.xcodeproj/] /usr/bin/arch -arm64e xcrun xcodebuild -project /Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/ios/Runner.xcodeproj -destination generic/platform=iOS -showBuildSettings BUILD_DIR=/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.YHXZbg/plugintest/example/build/ios PROVISIONING_PROFILE_SPECIFIER=match Development * CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM=S8QB4VV633
[2024-07-15 02:25:30.624459] [STDOUT] stdout: [+9361 ms] Command line invocation:
B (11265 ms)
[2024-07-15 00:58:21.256956] [STDOUT] stdout: [ ] executing: [/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/ios/Runner.xcodeproj/] xcrun xcodebuild -project /Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/ios/Runner.xcodeproj -destination generic/platform=iOS -showBuildSettings BUILD_DIR=/Volumes/Work/s/w/ir/x/t/flutter_devicelab_plugin_test.SudFFU/plugintest/example/build/ios PROVISIONING_PROFILE_SPECIFIER=match Development * CODE_SIGN_STYLE=Manual DEVELOPMENT_TEAM=S8QB4VV633
[2024-07-15 00:58:32.525233] [STDOUT] stdout: [+11265 ms]
A (9077 ms)
[2024-07-15 02:25:39.718249] [STDOUT] stdout: [+9077 ms] Command line invocation:
[2024-07-15 02:25:39.718276] [STDOUT] stdout: /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_15a240d/XCode.app/Contents/Developer/usr/bin/xcodebuild -list
B (11151 ms)
[2024-07-15 00:58:43.729228] [STDOUT] stdout: [+11151 ms] Command line invocation:
[2024-07-15 00:58:43.729283] [STDOUT] stdout: /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_15a240d/XCode.app/Contents/Developer/usr/bin/xcodebuild -list
pod install
is slowerpod install
is downloading and writing to disk, but not compiling anything with Xcode
A (734ms)
[2024-07-15 02:25:49.300788] [STDOUT] stdout: [ +1 ms] Running pod install...
[2024-07-15 02:25:50.035156] [STDOUT] stdout: [ +734 ms] Running pod install... (completed in 734ms)
B (2,251ms)
[2024-07-15 00:58:56.527252] [STDOUT] stdout: [ +3 ms] Running pod install...
[2024-07-15 00:58:58.779464] [STDOUT] stdout: [+2251 ms] Running pod install... (completed in 2,251ms)
top
This seems like the whole machine is crawling and isn't specific to Xcode compilation.
Processes: 488 total, 2 running, 486 sleeping, 1922 threads
2024/07/15 02:45:28
Load Avg: 6.27, 15.77, 13.16
CPU usage: 2.27% user, 11.36% sys, 86.36% idle
SharedLibs: 505M resident, 75M data, 188M linkedit.
MemRegions: 38982 total, 1244M resident, 219M private, 523M shared.
PhysMem: 9686M used (1573M wired, 477M compressor), 5975M unused.
VM: 186T vsize, 4921M framework vsize, 0(0) swapins, 0(0) swapouts.
Networks: packets: 118903256/157G in, 26494/1467M out.
Disks: 13015232/137G read, 6526599/361G written.
Processes: 464 total, 2 running, 462 sleeping, 1725 threads
2024/07/15 01:27:52
Load Avg: 22.81, 13.89, 13.48
CPU usage: 2.43% user, 10.45% sys, 87.10% idle
SharedLibs: 678M resident, 103M data, 63M linkedit.
MemRegions: 42664 total, 1902M resident, 338M private, 669M shared.
PhysMem: 14G used (2863M wired, 0B compressor), 18G unused.
VM: 15T vsize, 4528M framework vsize, 0(0) swapins, 0(0) swapouts.
Networks: packets: 20013151/25G in, 263604/1352M out.
Disks: 6387003/86G read, 1396493/143G written.
Spotlight and mds and fseventsd are running on both passing and failing machines (these shouldn't be running? https://g-issues.chromium.org/issues/40255200 and https://g-issues.chromium.org/issues/40790126)
Given we aren't seeing timeouts on Android runs, and the last time we saw something similar, we resolved it by updating the version of Xcode https://github.com/flutter/flutter/issues/119750.
I'm tempted to do the same and update Xcode to see if there's something funky running Xcode 15.0 on macOS 14.5. I can try Xcode 15.2 (15C500b) which should run on both of our macOS 14.5 and macOS 13.6 machines.
I'm at a loss for why this would be happening so much more in presubmit than postsubmit...
I'm at a loss for why this would be happening so much more in presubmit than postsubmit...
are we timing out at the same limit pre and post?
are we timing out at the same limit pre and post?
Just glanced quickly, but it looks like in post submit, plugin_test_ios is passing in around 10 minutes, but timing out after 30 minutes in presubmit. So the timeout config if there is a difference is probably not related.
are we timing out at the same limit pre and post?
Just glanced quickly, but it looks like in post submit, plugin_test_ios is passing in around 10 minutes, but timing out after 30 minutes in presubmit. So the timeout config if there is a difference is probably not related.
ahh, ok, weird.
I can try Xcode 15.2 (15C500b) which should run on both of our macOS 14.5 and macOS 13.6 machines.
I tried this on the packages repo and it didn't work, still hit timeouts https://github.com/flutter/packages/pull/7132#issuecomment-2229115457
@stuartmorgan points out this isn't just happening on macOS 14, here's an ARM macOS 13.6.4 machine crawling and then timing out on a packages test: https://ci.chromium.org/ui/p/flutter/builders/try/Mac_arm64%20ios_platform_tests_shard_3%20master/14458/infra https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8742357634243779457/+/u/Run_package_tests/drive_examples/stdout
There seems to be a pretty high frequency of timeouts right now on mac ci bots. Examples: From https://github.com/flutter/flutter/pull/150585 https://github.com/flutter/flutter/pull/150585/checks?check_run_id=26535171974 https://github.com/flutter/flutter/pull/150585/checks?check_run_id=26535172379
From https://github.com/flutter/flutter/pull/150614 https://github.com/flutter/flutter/pull/150614/checks?check_run_id=26520636606 https://github.com/flutter/flutter/pull/150614/checks?check_run_id=26520632867
From https://github.com/flutter/flutter/pull/150517 https://github.com/flutter/flutter/pull/150517/checks?check_run_id=26528249189 https://github.com/flutter/flutter/pull/150517/checks?check_run_id=26528248334