flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License
166.35k stars 27.54k forks source link

Mac_ios microbenchmarks_ios is 7.00% flaky. XCode build concurrent build failure. #126412

Closed fluttergithubbot closed 1 year ago

fluttergithubbot commented 1 year ago

The post-submit test builder Mac_ios microbenchmarks_ios had a flaky ratio 7.00% for the past (up to) 100 commits, which is above our 2.00% threshold.

One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5886 Commit: https://github.com/flutter/flutter/commit/43fa2c01c331177f2b67259a08178497e0343d67

Flaky builds: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5886 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5860 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5848 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5841 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5838 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5806 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5801

Recent test runs: https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20microbenchmarks_ios

Please follow https://github.com/flutter/flutter/wiki/Reducing-Test-Flakiness#fixing-flaky-tests to fix the flakiness and enable the test back after validating the fix (internal dashboard to validate: go/flutter_test_flakiness).

cyanglaz commented 1 year ago

5886 is flutter run time out:

[2023-05-06 02:31:46.252076] [STDOUT] [STDOUT] [        ] Application launched on the device. Waiting for Dart VM Service url.
[2023-05-06 02:32:16.262792] [STDOUT] [STDOUT] [   +7 ms] (lldb) process interrupt
[2023-05-06 02:32:16.263283] [STDOUT] [STDOUT] [        ] error: Failed to halt process: Halt timed out. State = running
[2023-05-06 02:32:17.191989] [STDOUT] [STDOUT] [ +928 ms] (lldb) Process 534 exited with status = -1 (0xffffffff) lost connection

It looks similar to the observatory timeout in https://github.com/flutter/flutter/issues/121231 but with some additional logs: This particular failure has Failed to halt process: Halt timed out. State = running and Process 534 exited with status = -1 (0xffffffff) lost connection

cc @vashworth @christopherfujino

Update: this is tracked in https://github.com/flutter/flutter/issues/120808

cyanglaz commented 1 year ago

5860 crash

[2023-05-05 09:12:06.059964] [STDOUT] [STDOUT]                ===== CRASH =====
[2023-05-05 09:12:06.059988] [STDOUT] [STDOUT]                si_signo=Segmentation fault: 11(11), si_code=0, si_addr=0x0
[2023-05-05 09:12:06.060011] [STDOUT] [STDOUT]                version=3.1.0-80.0.dev (dev) (Fri May 5 00:27:27 2023 -0700) on "macos_x64"
[2023-05-05 09:12:06.060037] [STDOUT] [STDOUT]                pid=31810, thread=7171, isolate_group=kernel-service(0x7fa1b880e000), isolate=(nil)(0x0)
[2023-05-05 09:12:06.060060] [STDOUT] [STDOUT]                os=macos, arch=x64, comp=no, sim=no
[2023-05-05 09:12:06.060083] [STDOUT] [STDOUT]                isolate_instructions=108b0d000, vm_instructions=101585960
[2023-05-05 09:12:06.060108] [STDOUT] [STDOUT]                fp=7000013f2d60, sp=7000013f2d30, pc=1018a70d7
[2023-05-05 09:12:06.060131] [STDOUT] [STDOUT]                  pc 0x00000001018a70d7 fp 0x00007000013f2d60 dart::GCMarker::MarkObjects(dart::PageSpace*)+0x19d7
[2023-05-05 09:12:06.060156] [STDOUT] [STDOUT]                  pc 0x000000010180808d fp 0x00007000013f2d90 dart::UntaggedObject::VisitPointersPredefined(dart::ObjectPointerVisitor*, long)+0x52d
[2023-05-05 09:12:06.060182] [STDOUT] [STDOUT]                  pc 0x00000001018a5208 fp 0x00007000013f2dd0 dart::GCMarker::IncrementalMarkWithSizeBudget(dart::PageSpace*, long)+0x2e8
[2023-05-05 09:12:06.060206] [STDOUT] [STDOUT]                  pc 0x00000001018a65ab fp 0x00007000013f2e70 dart::GCMarker::MarkObjects(dart::PageSpace*)+0xeab
[2023-05-05 09:12:06.060231] [STDOUT] [STDOUT]                  pc 0x00000001018a6c53 fp 0x00007000013f2ea0 dart::GCMarker::MarkObjects(dart::PageSpace*)+0x1553
[2023-05-05 09:12:06.060254] [STDOUT] [STDOUT]                  pc 0x000000010187a8eb fp 0x00007000013f2f20 dart::ThreadPool::WorkerLoop(dart::ThreadPool::Worker*)+0x14b
[2023-05-05 09:12:06.060280] [STDOUT] [STDOUT]                  pc 0x000000010187ad49 fp 0x00007000013f2f50 dart::ThreadPool::Worker::Main(unsigned long)+0x79
[2023-05-05 09:12:06.060304] [STDOUT] [STDOUT]                  pc 0x00000001017f1a2e fp 0x00007000013f2fb0 dart::OSThread::GetMaxStackSize()+0xde
[2023-05-05 09:12:06.060329] [STDOUT] [STDOUT]                  pc 0x00007ff807f4c4e1 fp 0x00007000013f2fd0 _pthread_start+0x7d
[2023-05-05 09:12:06.060352] [STDOUT] [STDOUT]                  pc 0x00007ff807f47f6b fp 0x00007000013f2ff0 thread_start+0xf
[2023-05-05 09:12:06.060376] [STDOUT] [STDOUT]                -- End of DumpStackTrace
[2023-05-05 09:12:06.060399] [STDOUT] [STDOUT]                /opt/s/w/ir/x/w/recipe_cleanup/tmpfxas5lcm/flutter sdk/packages/flutter_tools/bin/xcode_backend.sh: line 29: 31810 Abort trap: 6           "$DART" "$BIN_DIR/xcode_backend.dart" "$@"
[2023-05-05 09:12:06.060425] [STDOUT] [STDOUT]                Command PhaseScriptExecution failed with a nonzero exit code
[2023-05-05 09:12:06.060447] [STDOUT] [STDOUT] 
[2023-05-05 09:12:06.060469] [STDOUT] [STDOUT] 
[2023-05-05 09:12:06.060493] [STDOUT] [STDOUT]                Result bundle written to path:
[2023-05-05 09:12:06.060516] [STDOUT] [STDOUT]                  /opt/s/w/ir/x/t/flutter_tools.HIAXTK/flutter_ios_build_temp_dirA4t5kp/temporary_xcresult_bundle
[2023-05-05 09:12:06.060541] [STDOUT] [STDOUT] 
[2023-05-05 09:12:06.060563] [STDOUT] [STDOUT] 
[2023-05-05 09:12:06.060595] [STDOUT] [STDOUT] [   +5 ms] "flutter run" took 18,101ms.
[2023-05-05 09:12:06.064982] [STDOUT] [STDOUT] [  +10 ms] ensureAnalyticsSent: 0ms
[2023-05-05 09:12:06.065037] [STDOUT] [STDOUT] [        ] Running 1 shutdown hook
[2023-05-05 09:12:06.065587] [STDOUT] [STDOUT] [        ] Shutdown hooks complete
[2023-05-05 09:12:06.065782] [STDOUT] [STDOUT] [        ] exiting with code 2
[2023-05-05 09:12:06.074775] [STDOUT] Taking screenshot of working device 00008030-00062432346B402E at /opt/s/w/ir/x/w/recipe_cleanup/flutter_logs_dir/device-screenshot-2023-05-05T09:12:06.072117.png
[2023-05-05 09:12:06.188979] [STDOUT] Checking for reboot
[2023-05-05 09:12:06.192401] [STDOUT] rebooting
[2023-05-05 09:12:12.975182] [STDOUT] Failed to take screenshot. Continuing.
[microbenchmarks_ios] Process terminated with exit code 0.
Task result:

tracked in https://github.com/flutter/flutter/issues/126164

also:

 Task runner system failed with exception!
ext.cocoonRunTask: (-32000) Server error
Executable "/opt/s/w/ir/x/w/recipe_cleanup/tmpfxas5lcm/flutter sdk/bin/flutter" failed with exit code -6.
#0      fail (package:flutter_devicelab/framework/utils.dart:102:3)
#1      _execute (package:flutter_devicelab/framework/utils.dart:371:5)
<asynchronous suspension>
#2      eval (package:flutter_devicelab/framework/utils.dart:430:3)
<asynchronous suspension>
#3      IosDeviceDiscovery.discoverDevices (package:flutter_devicelab/framework/devices.dart:911:47)
<asynchronous suspension>
#4      IosDeviceDiscovery.chooseWorkingDevice (package:flutter_devicelab/framework/devices.dart:882:41)
<asynchronous suspension>
#5      IosDeviceDiscovery.workingDevice (package:flutter_devicelab/framework/devices.dart:872:7)
<asynchronous suspension>
#6      _TaskRunner._getWorkingDeviceIfAvailable (package:flutter_devicelab/framework/framework.dart:96:14)
<asynchronous suspension>
#7      _TaskRunner.run (package:flutter_devicelab/framework/framework.dart:153:30)
<asynchronous suspension>
#8      new _TaskRunner.<anonymous closure> (package:flutter_devicelab/framework/framework.dart:77:33)
<asynchronous suspension>

#0      new _OutstandingRequest (package:vm_service/src/vm_service.dart:1865:45)
#1      VmService._call (package:vm_service/src/vm_service.dart:2417:21)
#2      VmService.callServiceExtension (package:vm_service/src/vm_service.dart:2388:14)
#3      runTask (package:flutter_devicelab/framework/runner.dart:231:73)
<asynchronous suspension>
#4      rerunTask (package:flutter_devicelab/framework/runner.dart:113:29)
<asynchronous suspension>
#5      runTasks (package:flutter_devicelab/framework/runner.dart:51:16)
<asynchronous suspension>
#6      TestCommand.run (package:flutter_devicelab/command/test.dart:72:5)
<asynchronous suspension>
#7      CommandRunner.runCommand (package:args/command_runner.dart:212:13)
<asynchronous suspension>

[microbenchmarks_ios] Terminating process...
cyanglaz commented 1 year ago

5848, 5841, 5838, 5801 are app crash on main thread:

(lldb) process interrupt
[2023-05-04 19:40:42.272362] [STDOUT] [STDOUT] [ +369 ms] (lldb) Process 521 stopped
[2023-05-04 19:40:42.272855] [STDOUT] [STDOUT] [        ] * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
[2023-05-04 19:40:42.272917] [STDOUT] [STDOUT] [        ]     frame #0: 0x00000001cc798aa8 libsystem_kernel.dylib`mach_msg2_trap + 8
[2023-05-04 19:40:42.273051] [STDOUT] [STDOUT] [        ] libsystem_kernel.dylib`mach_msg2_trap:
[2023-05-04 19:40:42.273190] [STDOUT] [STDOUT] [        ] ->  0x1cc798aa8 <+8>: ret    
[2023-05-04 19:40:42.273312] [STDOUT] [STDOUT] [        ] libsystem_kernel.dylib`macx_swapon:
[2023-05-04 19:40:42.273412] [STDOUT] [STDOUT] [        ]     0x1cc798aac <+0>: mov    x16, #-0x30
[2023-05-04 19:40:42.273506] [STDOUT] [STDOUT] [        ]     0x1cc798ab0 <+4>: svc    #0x80
[2023-05-04 19:40:42.273618] [STDOUT] [STDOUT] [        ]     0x1cc798ab4 <+8>: ret    
[2023-05-04 19:40:42.273680] [STDOUT] [STDOUT] [        ] Target 0: (Runner) stopped.
[2023-05-04 19:40:42.273855] [STDOUT] [STDOUT] [        ] thread backtrace all
[2023-05-04 19:40:42.274117] [STDOUT] [STDOUT] [        ] process continue
[2023-05-04 19:40:42.329324] [STDOUT] [STDOUT] [  +55 ms] * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
[2023-05-04 19:40:42.329398] [STDOUT] [STDOUT] [        ]   * frame #0: 0x00000001cc798aa8 libsystem_kernel.dylib`mach_msg2_trap + 8
[2023-05-04 19:40:42.329452] [STDOUT] [STDOUT] [        ]     frame #1: 0x00000001cc7aafc4 libsystem_kernel.dylib`mach_msg2_internal + 80
[2023-05-04 19:40:42.329478] [STDOUT] [STDOUT] [        ]     frame #2: 0x00000001cc7ab204 libsystem_kernel.dylib`mach_msg_overwrite + 388
[2023-05-04 19:40:42.329686] [STDOUT] [STDOUT] [        ]     frame #3: 0x00000001cc798fec libsystem_kernel.dylib`mach_msg + 24
[2023-05-04 19:40:42.329731] [STDOUT] [STDOUT] [        ]     frame #4: 0x000000018ee9ead4 CoreFoundation`__CFRunLoopServiceMachPort + 160
[2023-05-04 19:40:42.329792] [STDOUT] [STDOUT] [        ]     frame #5: 0x000000018ee9fd18 CoreFoundation`__CFRunLoopRun + 1232
[2023-05-04 19:40:42.329822] [STDOUT] [STDOUT] [        ]     frame #6: 0x000000018eea4ec0 CoreFoundation`CFRunLoopRunSpecific + 612
[2023-05-04 19:40:42.329863] [STDOUT] [STDOUT] [        ]     frame #7: 0x00000001c8efb368 GraphicsServices`GSEventRunModal + 164
[2023-05-04 19:40:42.329959] [STDOUT] [STDOUT] [        ]     frame #8: 0x000000019139a86c UIKitCore`-[UIApplication _run] + 888
[2023-05-04 19:40:42.330024] [STDOUT] [STDOUT] [        ]     frame #9: 0x000000019139a4d0 UIKitCore`UIApplicationMain + 340
[2023-05-04 19:40:42.330106] [STDOUT] [STDOUT] [        ]     frame #10: 0x0000000102637d9c Runner`main(argc=<unavailable>, argv=<unavailable>) at main.m:10:16 [opt]
[2023-05-04 19:40:42.330138] [STDOUT] [STDOUT] [        ]     frame #11: 0x00000001ad6c6960 dyld`start + 2528
cyanglaz commented 1 year ago

5806 seems to be a timeout caused by libobjc.A.dylib is being read from process memory

[2023-05-03 00:16:17.252475] [STDOUT] [STDOUT] [        ] Application launched on the device. Waiting for Dart VM Service url.
[2023-05-03 00:16:20.574900] [STDOUT] [STDOUT] [+3322 ms] (lldb) warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not find the on-disk shared cache for this device. This will likely reduce debugging performance.
[2023-05-03 00:16:47.255866] [STDOUT] [STDOUT] [   +2 ms] process interrupt

It looks similar to the observatory timeout in https://github.com/flutter/flutter/issues/121231 but with some additional logs.

cc @vashworth @christopherfujino

vashworth commented 1 year ago

@cyanglaz I also saw some (lldb) warning: libobjc.A.dylib is being read from process memory. issues on other test flakes today. I also ran into this issue locally when I updated one of my devices.

You can read a little about it here: https://github.com/flutter/flutter/issues/119147#issuecomment-1404232523

But seems to me that this issue happens when there's something wrong with the symbols. The ios_debug_symbol_doctor step tries to check to see if the xcdevice list info mentions the symbols are being fetched (https://github.com/flutter/cocoon/pull/2421). However, I've found that to be unreliable.

I'm adding this check to help diagnose it: https://github.com/flutter/flutter/pull/126437

As for the Process 534 exited with status = -1 errors, that's tracked in this issue: https://github.com/flutter/flutter/issues/120808

And for Dart VM hanging forever and not being discovered is being tracked in this issue: https://github.com/flutter/flutter/issues/121231

vashworth commented 1 year ago

As for 5848, 5841, 5838, 5801 are app crash on main thread, they're not actually crashes. It's being paused and resumed, if you scroll to the bottom of the stack trace, it should say something like "(lldb) Process 521 resuming" https://github.com/flutter/flutter/blob/a3257ca533205671c90b5c17c6b8cfc3bffce837/packages/flutter_tools/lib/src/ios/devices.dart#L549 So those are instances of the Dart VM discovery hanging forever

cyanglaz commented 1 year ago

So those are instances of the Dart VM discovery hanging forever

Are these also https://github.com/flutter/flutter/issues/121231?

vashworth commented 1 year ago

So those are instances of the Dart VM discovery hanging forever

Are these also #121231?

Yes

cyanglaz commented 1 year ago

from 5680, ext.cocoonRunTask: (-32000) Server error:

Failed at https://github.com/flutter/flutter/blob/master/dev/devicelab/lib/framework/devices.dart#LL303C40-L303C44

ext.cocoonRunTask: (-32000) Server error
Executable "/opt/s/w/ir/x/w/recipe_cleanup/tmpfxas5lcm/flutter sdk/bin/flutter" failed with exit code -6.
#0      fail (package:flutter_devicelab/framework/utils.dart:102:3)
#1      _execute (package:flutter_devicelab/framework/utils.dart:371:5)
<asynchronous suspension>
#2      eval (package:flutter_devicelab/framework/utils.dart:430:3)
<asynchronous suspension>
#3      IosDeviceDiscovery.discoverDevices (package:flutter_devicelab/framework/devices.dart:911:47)
<asynchronous suspension>
#4      IosDeviceDiscovery.chooseWorkingDevice (package:flutter_devicelab/framework/devices.dart:882:41)
<asynchronous suspension>
#5      IosDeviceDiscovery.workingDevice (package:flutter_devicelab/framework/devices.dart:872:7)
<asynchronous suspension>
#6      _TaskRunner._getWorkingDeviceIfAvailable (package:flutter_devicelab/framework/framework.dart:96:14)
<asynchronous suspension>
#7      _TaskRunner.run (package:flutter_devicelab/framework/framework.dart:153:30)
<asynchronous suspension>
#8      new _TaskRunner.<anonymous closure> (package:flutter_devicelab/framework/framework.dart:77:33)
<asynchronous suspension>

#0      new _OutstandingRequest (package:vm_service/src/vm_service.dart:1865:45)
#1      VmService._call (package:vm_service/src/vm_service.dart:2417:21)
#2      VmService.callServiceExtension (package:vm_service/src/vm_service.dart:2388:14)
#3      runTask (package:flutter_devicelab/framework/runner.dart:231:73)
<asynchronous suspension>
#4      rerunTask (package:flutter_devicelab/framework/runner.dart:113:29)
<asynchronous suspension>
#5      runTasks (package:flutter_devicelab/framework/runner.dart:51:16)
<asynchronous suspension>
#6      TestCommand.run (package:flutter_devicelab/command/test.dart:72:5)
<asynchronous suspension>
#7      CommandRunner.runCommand (package:args/command_runner.dart:212:13)
<asynchronous suspension>

[microbenchmarks_ios] Terminating process...

This seems new. cc @christopherfujino

this is tracked at https://github.com/flutter/flutter/issues/121705

christopherfujino commented 1 year ago

from 5680, ext.cocoonRunTask: (-32000) Server error:

Failed at https://github.com/flutter/flutter/blob/master/dev/devicelab/lib/framework/devices.dart#LL303C40-L303C44

ext.cocoonRunTask: (-32000) Server error
Executable "/opt/s/w/ir/x/w/recipe_cleanup/tmpfxas5lcm/flutter sdk/bin/flutter" failed with exit code -6.
#0      fail (package:flutter_devicelab/framework/utils.dart:102:3)
#1      _execute (package:flutter_devicelab/framework/utils.dart:371:5)
<asynchronous suspension>
#2      eval (package:flutter_devicelab/framework/utils.dart:430:3)
<asynchronous suspension>
#3      IosDeviceDiscovery.discoverDevices (package:flutter_devicelab/framework/devices.dart:911:47)
<asynchronous suspension>
#4      IosDeviceDiscovery.chooseWorkingDevice (package:flutter_devicelab/framework/devices.dart:882:41)
<asynchronous suspension>
#5      IosDeviceDiscovery.workingDevice (package:flutter_devicelab/framework/devices.dart:872:7)
<asynchronous suspension>
#6      _TaskRunner._getWorkingDeviceIfAvailable (package:flutter_devicelab/framework/framework.dart:96:14)
<asynchronous suspension>
#7      _TaskRunner.run (package:flutter_devicelab/framework/framework.dart:153:30)
<asynchronous suspension>
#8      new _TaskRunner.<anonymous closure> (package:flutter_devicelab/framework/framework.dart:77:33)
<asynchronous suspension>

#0      new _OutstandingRequest (package:vm_service/src/vm_service.dart:1865:45)
#1      VmService._call (package:vm_service/src/vm_service.dart:2417:21)
#2      VmService.callServiceExtension (package:vm_service/src/vm_service.dart:2388:14)
#3      runTask (package:flutter_devicelab/framework/runner.dart:231:73)
<asynchronous suspension>
#4      rerunTask (package:flutter_devicelab/framework/runner.dart:113:29)
<asynchronous suspension>
#5      runTasks (package:flutter_devicelab/framework/runner.dart:51:16)
<asynchronous suspension>
#6      TestCommand.run (package:flutter_devicelab/command/test.dart:72:5)
<asynchronous suspension>
#7      CommandRunner.runCommand (package:args/command_runner.dart:212:13)
<asynchronous suspension>

[microbenchmarks_ios] Terminating process...

This seems new. cc @christopherfujino

can you link to the exact build you were looking at?

cyanglaz commented 1 year ago

can you link to the exact build you were looking at?

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5860/overview At the end of the log

chinmaygarde commented 1 year ago

@cyanglaz Are we making progress on this one? I am not sure what the severity of this one ought to be. Doesn't seem like a P1 but I may be wrong.

cyanglaz commented 1 year ago

Lowered the priority to p3 as the failure this issue is tracking only happened once.

fluttergithubbot commented 1 year ago

[prod pool] current flaky ratio for the past (up to) 100 commits is 7.14%. Flaky number: 7; total number: 98. One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5991 Commit: https://github.com/flutter/flutter/commit/ea5eddb5a96b9241cd18df42aaa595478416dadf Flaky builds: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5991 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5976 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5969 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5953 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5944 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5943 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5926 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5918

Recent test runs: https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20microbenchmarks_ios

cyanglaz commented 1 year ago

These seem to be dups:

5991, 5976, 5944 are (lldb) warning: libobjc.A.dylib is being read from process memory., tracked at https://github.com/flutter/flutter/issues/121231

5969, 5918 is dart VM discovery hang, tracked at https://github.com/flutter/flutter/issues/121231

cyanglaz commented 1 year ago

5953 couldn't find a simulator:

[2023-05-12 14:29:47.665144] [STDOUT] [STDOUT] [ +141 ms] Exit code 0 from: xcrun simctl list devices booted
[2023-05-12 14:29:47.665235] [STDOUT] [STDOUT] [        ] == Devices ==
[2023-05-12 14:29:47.665265] [STDOUT] [STDOUT]            -- iOS 16.4 --
[2023-05-12 14:29:47.832375] [STDOUT] [STDOUT] [ +166 ms] {
[2023-05-12 14:29:47.832432] [STDOUT] [STDOUT]              "devices" : {
[2023-05-12 14:29:47.832460] [STDOUT] [STDOUT]                "com.apple.CoreSimulator.SimRuntime.iOS-16-4" : [
[2023-05-12 14:29:47.832482] [STDOUT] [STDOUT] 
[2023-05-12 14:29:47.832502] [STDOUT] [STDOUT]                ]
[2023-05-12 14:29:47.832526] [STDOUT] [STDOUT]              }
[2023-05-12 14:29:47.832547] [STDOUT] [STDOUT]            }
[2023-05-12 14:29:49.407737] [STDOUT] Taking screenshot of working device 00008030-00062432346B402E at /opt/s/w/ir/x/w/recipe_cleanup/flutter_logs_dir/device-screenshot-2023-05-12T14:29:49.405182.png
[2023-05-12 14:29:49.494865] [STDOUT] Checking for reboot
[microbenchmarks_ios] Process terminated with exit code 0.
Task result:
{
  "success": false,
  "reason": "Task failed: flutter run failed: exit code=-6"
}

Similar to https://github.com/flutter/flutter/issues/126416.

Opened an infra ticket and tracking similar issues in https://github.com/flutter/flutter/issues/127045

cyanglaz commented 1 year ago

5943 is interrupted after the app is running with no further log:

[2023-05-12 02:01:21.561356] [STDOUT] [STDOUT] [        ] A Dart VM Service on swarming’s iPhone is available at: http://127.0.0.1:52461/h8yTLIH69O8=/
[2023-05-12 02:01:21.791745] [STDOUT] [STDOUT] [ +230 ms] The Flutter DevTools debugger and profiler on swarming’s iPhone is available at: http://127.0.0.1:9101/?uri=http://127.0.0.1:52461/h8yTLIH69O8=/

5926 is interrupted after running the xcode build command with no further log:

[2023-05-10 15:59:39.075465] [STDOUT] [STDOUT] [   +6 ms] Running Xcode build...
[2023-05-10 15:59:39.075929] [STDOUT] [STDOUT] [        ] executing: [/opt/s/w/ir/x/w/recipe_cleanup/tmpx7yno3qx/flutter sdk/dev/benchmarks/microbenchmarks/ios/] xcrun xcodebuild -configuration Profile VERBOSE_SCRIPT_LOGGING=YES -workspace Runner.xcworkspace -scheme Runner BUILD_DIR=/opt/s/w/ir/x/w/recipe_cleanup/tmpx7yno3qx/flutter sdk/dev/benchmarks/microbenchmarks/build/ios -sdk iphoneos -destination id=00008030-001971982633402E ONLY_ACTIVE_ARCH=YES ARCHS=arm64 -resultBundlePath /opt/s/w/ir/x/t/flutter_tools.lBfZl3/flutter_ios_build_temp_dirqEJ7oU/temporary_xcresult_bundle -resultBundleVersion 3 FLUTTER_SUPPRESS_ANALYTICS=true COMPILER_INDEX_STORE_ENABLE=NO PROVISIONING_PROFILE_SPECIFIER=match Development * DEVELOPMENT_TEAM=******* CODE_SIGN_STYLE=Manual
christopherfujino commented 1 year ago

can you link to the exact build you were looking at?

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5860/overview At the end of the log

It looks like this failure is actually https://github.com/flutter/flutter/issues/121705

vashworth commented 1 year ago

5943 is interrupted after the app is running with no further log:

[2023-05-12 02:01:21.561356] [STDOUT] [STDOUT] [        ] A Dart VM Service on swarming’s iPhone is available at: http://127.0.0.1:52461/h8yTLIH69O8=/
[2023-05-12 02:01:21.791745] [STDOUT] [STDOUT] [ +230 ms] The Flutter DevTools debugger and profiler on swarming’s iPhone is available at: http://127.0.0.1:9101/?uri=http://127.0.0.1:52461/h8yTLIH69O8=/

5926 is interrupted after running the xcode build command with no further log:

[2023-05-10 15:59:39.075465] [STDOUT] [STDOUT] [   +6 ms] Running Xcode build...
[2023-05-10 15:59:39.075929] [STDOUT] [STDOUT] [        ] executing: [/opt/s/w/ir/x/w/recipe_cleanup/tmpx7yno3qx/flutter sdk/dev/benchmarks/microbenchmarks/ios/] xcrun xcodebuild -configuration Profile VERBOSE_SCRIPT_LOGGING=YES -workspace Runner.xcworkspace -scheme Runner BUILD_DIR=/opt/s/w/ir/x/w/recipe_cleanup/tmpx7yno3qx/flutter sdk/dev/benchmarks/microbenchmarks/build/ios -sdk iphoneos -destination id=00008030-001971982633402E ONLY_ACTIVE_ARCH=YES ARCHS=arm64 -resultBundlePath /opt/s/w/ir/x/t/flutter_tools.lBfZl3/flutter_ios_build_temp_dirqEJ7oU/temporary_xcresult_bundle -resultBundleVersion 3 FLUTTER_SUPPRESS_ANALYTICS=true COMPILER_INDEX_STORE_ENABLE=NO PROVISIONING_PROFILE_SPECIFIER=match Development * DEVELOPMENT_TEAM=******* CODE_SIGN_STYLE=Manual

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5926/overview has no following logs because it timed out because it ran 31 times 🤯

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5943/overview also timed out after running 33 times.

Does anyone know why it would run that many times?

cyanglaz commented 1 year ago

Oh yeah I believe both ran 30+ times.

vashworth commented 1 year ago

I don't think it's supposed to do that 😅

cyanglaz commented 1 year ago

This benchmark has 20 sub runs: https://github.com/flutter/flutter/blob/master/dev/devicelab/lib/tasks/microbenchmarks.dart#L53

And each of them ran at least twice and the all_elements_bench ran 3 times, so it's planned to run 40+ times.

I'm not sure if it is intended to make each task run twice. If that's the case we might need to separate these tasks into different builds

cyanglaz commented 1 year ago
      ...await runMicrobench('lib/stocks/build_bench.dart'),
      ...await runMicrobench('lib/stocks/layout_bench.dart'),
      ...await runMicrobench('lib/ui/image_bench.dart'),

The above 3 tests are also skipped.

So the benchmark ran through the tasks until lib/stocks/build_bench_profiled.dart and then repeat them again. With the second run through, the first task: lib/foundation/all_elements_bench.dart ran twice consecutively.

@zanderso Do you know what caused this behavior (or is it intentional)? I couldn't find any code that suggests these tasks should be run twice, nor the last 3 tasks should be skipped, nor the lib/foundation/all_elements_bench.dart can be run twice.

vashworth commented 1 year ago

@cyanglaz I think I figured out why it's running twice. In https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5926/overview, it failed the first time

Task result:
 {
   "success": false,
   "reason": "Task failed: flutter run failed: exit code=2"
 }

So it retries https://github.com/flutter/flutter/blob/9657289f4a9ce00b4cc33a1dbd65053c0c6afae3/dev/devicelab/lib/framework/runner.dart#L50 In https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/5943/overview it failed twice so it ran three times total.

vashworth commented 1 year ago

As for these 3,

...await runMicrobench('lib/stocks/build_bench.dart'),
...await runMicrobench('lib/stocks/layout_bench.dart'),
...await runMicrobench('lib/ui/image_bench.dart'),

I don't think they're skipped. I think it failed each time on lib/stocks/build_bench_profiled.dart so it just never reaches them

cyanglaz commented 1 year ago

@vashworth looks like 5926 is also https://github.com/flutter/flutter/issues/120808 as well then.

cyanglaz commented 1 year ago

one of 5943 is:

xcodebuild: error: Existing file at -resultBundlePath "/opt/s/w/ir/x/t/flutter_tools.LkupYt/flutter_ios_build_temp_dirlNr2gY/temporary_xcresult_bundle.xcresult"

It is caused by https://github.com/flutter/engine/pull/41647

Actually, it is the framework, so it is the temp directory not cleaned up after first run failed, The dir is created: here. https://github.com/flutter/flutter/blob/master/packages/flutter_tools/lib/src/ios/mac.dart#L324, we will need to clean up the temp folder properly.

cyanglaz commented 1 year ago

The first failure of 5943 seems weird. It just stopped at xcode build

[2023-05-12 01:46:38.193378] [STDOUT] [STDOUT] [   +6 ms] Running Xcode build...
[2023-05-12 01:46:38.193831] [STDOUT] [STDOUT] [        ] executing: [/opt/s/w/ir/x/w/recipe_cleanup/tmp1qkvuxkd/flutter sdk/dev/benchmarks/microbenchmarks/ios/] xcrun xcodebuild -configuration Profile VERBOSE_SCRIPT_LOGGING=YES -workspace Runner.xcworkspace -scheme Runner BUILD_DIR=/opt/s/w/ir/x/w/recipe_cleanup/tmp1qkvuxkd/flutter sdk/dev/benchmarks/microbenchmarks/build/ios -sdk iphoneos -destination id=00008030-00062432346B402E ONLY_ACTIVE_ARCH=YES ARCHS=arm64 -resultBundlePath /opt/s/w/ir/x/t/flutter_tools.Anxu6h/flutter_ios_build_temp_dirtZgjCF/temporary_xcresult_bundle -resultBundleVersion 3 FLUTTER_SUPPRESS_ANALYTICS=true COMPILER_INDEX_STORE_ENABLE=NO PROVISIONING_PROFILE_SPECIFIER=match Development * DEVELOPMENT_TEAM=***** CODE_SIGN_STYLE=Manual
[2023-05-12 01:46:40.251710] [STDOUT] Taking screenshot of working device 00008030-00062432346B402E at /opt/s/w/ir/x/w/recipe_cleanup/flutter_logs_dir/device-screenshot-2023-05-12T01:46:40.249553.png
[2023-05-12 01:46:40.320932] [STDOUT] Checking for reboot
[microbenchmarks_ios] Process terminated with exit code 0.
Task result:
{
  "success": false,
  "reason": "Task failed: flutter run failed: exit code=-6"
}
cyanglaz commented 1 year ago

Filed https://github.com/flutter/flutter/issues/127119 for https://github.com/flutter/flutter/issues/126412#issuecomment-1552076074

cyanglaz commented 1 year ago

Lower to P3 as the only failure this issue is tracking is https://github.com/flutter/flutter/issues/126412#issuecomment-1552103991 and it only happened once. If this happens again, I will create a separate issue for it.

cyanglaz commented 1 year ago

https://github.com/flutter/flutter/issues/126412#issuecomment-1552103991 might be https://github.com/flutter/flutter/issues/56654. Looking at the code the only way that a xcresult bundle failure can happen after a failure is that the first failure is the concurrent build failure. (https://github.com/flutter/flutter/issues/56654)

I'm not sure how concurrent builds could happen on ci. I'm going to add logs when there is a concurrent failure and watch out for it.

edit: although the log Xcode build failed due to concurrent builds. Retry in 2 seconds is not showing up so I'm not 100% sure.

fluttergithubbot commented 1 year ago

[prod pool] current flaky ratio for the past (up to) 100 commits is 3.03%. Flaky number: 3; total number: 99. One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6097 Commit: https://github.com/flutter/flutter/commit/343718945bcbcc356dbb8b1c51fcd5655dd5bb87 Flaky builds: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6097 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6085 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6030

Recent test runs: https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20microbenchmarks_ios

vashworth commented 1 year ago

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6097 is another instance of https://github.com/flutter/flutter/issues/121231.

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6085 Issue with unzipping Exit code 2 from: unzip -o -q /opt/s/w/ir/x/w/recipe_cleanup/tmpo72ayc2c/flutter sdk/bin/cache/downloads/storage.googleapis.com/flutter_infra_release/flutter/f471b37a214696697326c0a3a94e7a0acce68387/ios-profile/artifacts.zip -d /opt/s/w/ir/x/t/flutter_tools.BhrcVE/flutter_artifacts.zip.tIm2wD

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6030 Issue with libobjc.A.dylib and lldb being slow and since the test runs so many iterations, causes it to timeout.

(lldb) warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not find the on-disk shared cache for this device. This will likely reduce debugging performance

I think long term solution for this test is to break it up so it's not so many iterations.

Since major issues are tracked in other issues, downgrading it.

fluttergithubbot commented 1 year ago

[prod pool] current flaky ratio for the past (up to) 100 commits is 3.00%. Flaky number: 3; total number: 100. One recent flaky example for a same commit: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6221 Commit: https://github.com/flutter/flutter/commit/c23aadc77cc165e7e77c2a18cfd4cccc08ad0e29 Flaky builds: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6221 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6218 https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6151

Recent test runs: https://flutter-dashboard.appspot.com/#/build?taskFilter=Mac_ios%20microbenchmarks_ios

vashworth commented 1 year ago

Going to start using this issue to track this error:

(lldb) warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not find the on-disk shared cache for this device. This will likely reduce debugging performance.

This doesn't usually prevent the test from running or stop it, however, it does make the test much slower. This is problematic especially this test because this test is very long even if it doesn't have any problems.

Some references: https://github.com/flutter/flutter/issues/126588 https://github.com/flutter/flutter/issues/119147

I think a fix could be to detect that this error happened and then delete the corresponding Symbols directory. I think this will fix it because from what I've observed the issue seems to be isolated to flutter-devicelab-mac-2, which leads me to believe it's specific to the host computer, which would make sense if the Symbols were the issue. I also ran into this issue on my computer once and deleting the Symbols and letting Xcode regen it worked for me.

I also think we should consider splitting this test up. It does 20 flutter runs each time and if runs into a problem, it starts over all 20 runs. Perhaps could be made into a drive test, which doesn't rebuild each time (although if it's testing benchmarks of some kind, that might influence the results).

christopherfujino commented 1 year ago

I think a fix could be to detect that this error happened and then delete the corresponding Symbols directory. I think this will fix it because from what I've observed the issue seems to be isolated to flutter-devicelab-mac-2, which leads me to believe it's specific to the host computer, which would make sense if the Symbols were the issue.

What's the symbols directory? Is it a child of $HOME/Library/Developer/Xcode/iOS DeviceSupport?

vashworth commented 1 year ago

Yes, for flutter-devicelab-mac-2, it's /Users/swarming/Library/Developer/Xcode/iOS DeviceSupport/16.2 (20C65) arm64e/Symbols specifically. It can be found in the test logs, it's prefixed with Detected path to iOS debug symbols:

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac_ios%20microbenchmarks_ios/6221/overview

christopherfujino commented 1 year ago

Would it make sense in https://github.com/flutter/cocoon/blob/main/device_doctor/lib/src/ios_debug_symbol_doctor.dart#L107 (which runs every time an ios devicelab test times out) to explicitly delete $HOME/Library/Developer/Xcode/iOS DeviceSupport first, then open Xcode? (right now it just opens Xcode, but maybe the symbols might be present but not the libobjc.A.dylib)

vashworth commented 1 year ago

Yes, that's what I was thinking too. We could just add to it that test regardless of the test output in the short-term. I think long-term it'd be good to do it based on the test log's output for the specific error message rather that only if the test times out. There could be other bots/tests that have the issue but it doesn't surface because it doesn't time out. It surfaces in this test because this test is inherently slow.

It could also be something we try for https://github.com/flutter/flutter/issues/120808

christopherfujino commented 1 year ago

I think long-term it'd be good to do it based on the test log's output for the specific error message rather that only if the test times out.

This is a good idea. The only caveat being you'd probably have to parse the log output from the recipe, which is cumbersome but doable.

christopherfujino commented 1 year ago

I think ideally this logic would live in the devicelab runner, which would have a set of patterns that it would check for in logs, and then somehow send a structured response to the recipe runner, which would then respond in an appropriate way (for one pattern match we might upload a log file to cloud storage, for another we might invoke the ios-debug-symbol-doctor, for a third we might re-run the step).

vashworth commented 1 year ago

I think ideally this logic would live in the devicelab runner

What do you mean by devicelab runner?

That sounds good, sounds like we could also build off it for https://github.com/flutter/flutter/issues/128123

christopherfujino commented 1 year ago

I mean this app: https://github.com/flutter/flutter/blob/master/dev/devicelab/bin/run.dart, which last I checked is what the luci recipes executing to handle the actual tests.

vashworth commented 1 year ago

I see, so you mean it have it do pattern matching with the test itself (well within the process that executes the tests)? When would it examine the logs? As it goes? If it did it at the end and the test timed out, it probably wouldn't have time to run

christopherfujino commented 1 year ago

I see, so you mean it have it do pattern matching with the test itself (well within the process that executes the tests)? When would it examine the logs? As it goes? If it did it at the end and the test timed out, it probably wouldn't have time to run

Ahh, that's a good point I had not considered. I guess we would have to either:

  1. always time out the test a few minutes earlier from within the devicelab runner; or
  2. maintain these patterns in the recipes repo

The downside to option 1 would be how to synchronize the LUCI step timeout with the devicelab runner timeout. The downside to option 2 is needing to maintain these patterns outside of the framework tree. I suspect option 2 is less bad?

@XilaiZhang do we always use branched recipes for release candidate branch builds?

christopherfujino commented 1 year ago

or option 3: pattern match as the logs come out, and fail early if we match.

The downside here would be the risk of false positives (where a "positive" means we see an error log we were checking for) failing actual valid builds.

XilaiZhang commented 1 year ago

@XilaiZhang do we always use branched recipes for release candidate branch builds?

My understanding is yes. But @CaseyHillers could confirm

vashworth commented 1 year ago

Could we have the recipes save the test logs to a file, then call a different command (like cocoon) and pass the file path and any other important info as flags to keep the logic out of the recipes? The command could then return a structured response with what to do next for the recipes to consume and execute?

christopherfujino commented 1 year ago

Could we have the recipes save the test logs to a file, then call a different command (like cocoon) and pass the file path and any other important info as flags to keep the logic out of the recipes? The command could then return a structured response with what to do next for the recipes to consume and execute?

Let me file a feature request, move the discussion there, and then cc Keyong.

christopherfujino commented 1 year ago

Could we have the recipes save the test logs to a file, then call a different command (like cocoon) and pass the file path and any other important info as flags to keep the logic out of the recipes? The command could then return a structured response with what to do next for the recipes to consume and execute?

Let me file a feature request, move the discussion there, and then cc Keyong.

Actually, tracking this in https://github.com/flutter/flutter/issues/128123