dotnet / dnceng


Android emulator not booting completely on Helix queue #1448

Open akoeplinger opened 1 year ago

akoeplinger commented 1 year ago

Build

https://dev.azure.com/dnceng-public/public/_build/results?buildId=472093

Build leg reported

android-x86 Release AllSubsets_Mono

Pull Request

https://github.com/dotnet/runtime/pull/93220

Known issue core information

Fill out the known issue JSON section by following the step-by-step documentation on how to create a known issue.

 {
    "ErrorMessage" : "Did not detect boot completion variable on device",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
 }
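As a rough illustration of how an entry like the one above might be applied to a log, here is a minimal sketch. It assumes that a non-empty `ErrorMessage` is a case-sensitive substring test and a non-empty `ErrorPattern` is a regular expression; the function name `matches_known_issue` and these matching rules are assumptions for illustration, not the actual dnceng implementation.

```python
import json
import re


def matches_known_issue(issue_json: str, log_lines: list[str]) -> bool:
    """Return True if any log line matches the known-issue entry.

    Assumption: ErrorMessage is a plain substring test and ErrorPattern
    is a regular expression; empty values are ignored.
    """
    issue = json.loads(issue_json)
    message = issue.get("ErrorMessage", "")
    pattern = issue.get("ErrorPattern", "")
    for line in log_lines:
        if message and message in line:
            return True
        if pattern and re.search(pattern, line):
            return True
    return False


# The entry from this issue, copied as-is.
KNOWN_ISSUE = """
{
    "ErrorMessage": "Did not detect boot completion variable on device",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
}
"""
```

Calling `matches_known_issue(KNOWN_ISSUE, log_lines)` with the console log of a failing work item would then report whether the message appears anywhere in it.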

@dotnet/dnceng

Release Note Category

Additional information about the issue reported

No response

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=472093

Error message validated: Did not detect boot completion variable on device

Result validation: :x: Known issue did not match with the provided build.

Validation performed at: 11/20/2023 10:43:09 AM UTC

Report

| Build | Definition | Test | Pull Request |
| --- | --- | --- | --- |
| 880213 | dotnet/runtime | System.Runtime.Extensions.Tests.WorkItemExecution | |
| 879107 | dotnet/runtime | System.Diagnostics.Process.Tests.WorkItemExecution | |
| 879242 | dotnet/runtime | Microsoft.Extensions.Options.SourceGeneration.Unit.Tests.WorkItemExecution | |
| 878306 | dotnet/runtime | Android.Device_Emulator.Aot_System.IO.Stream.Test.WorkItemExecution | |
| 873764 | dotnet/runtime | System.Runtime.Extensions.Tests.WorkItemExecution | |
| 872092 | dotnet/xharness | System.Buffers.Tests-x86.WorkItemExecution | dotnet/xharness#1324 |
| 869640 | dotnet/runtime | IntrinsicsInSystemPrivateCoreLib.Tests.WorkItemExecution | |
| 869188 | dotnet/arcade | System.Buffers.Tests-x86.WorkItemExecution | dotnet/arcade#15230 |
| 863482 | dotnet/arcade | System.Buffers.Tests-x86.WorkItemExecution | dotnet/arcade#15221 |
| 863362 | dotnet/xharness | System.Buffers.Tests-x86.WorkItemExecution | |
| 860530 | dotnet/runtime | System.Private.Xml.Tests.WorkItemExecution | |
| 859729 | dotnet/runtime | System.Net.Http.Unit.Tests.WorkItemExecution | |
| 855598 | dotnet/runtime | Microsoft.NETCore.Platforms.Tests.WorkItemExecution | |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 4 | 13 |

premun commented 1 year ago

fyi @dougbu this seems to be catching cases when Android emulators are not booted properly

akoeplinger commented 1 year ago

Not sure why the result validation doesn't match, do we need to set up something special to monitor the runtime-extra-platforms pipeline?

dougbu commented 1 year ago

This feels very similar to #1383 and #1415. The general theme is that the emulator either isn't starting as quickly as expected (in the XHarness case, there's a 5-minute loop checking for boot_completed) or isn't starting at all. We haven't made much progress on either issue, partly because only @premun knows much about the emulators and he's busy elsewhere.

The ubuntu.2204.amd64.android.29.open queue is one of many we've had problems with when deploying in our staging environment.

I can see how dotnet/xharness#1106 could help here and suggest we keep an eye on this issue for additional hits.
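To make the "5-minute loop checking for boot_completed" concrete, here is a minimal sketch of that kind of polling. It is not the XHarness code (which is C#); the function names, the 10-second polling interval, and the injectable `clock`/`sleep` parameters are assumptions for illustration. The only external call is `adb shell getprop sys.boot_completed`, which prints `1` once Android has finished booting.

```python
import subprocess
import time
from typing import Callable


def read_boot_completed() -> str:
    """Read sys.boot_completed through adb; return '' if adb fails."""
    try:
        result = subprocess.run(
            ["adb", "shell", "getprop", "sys.boot_completed"],
            capture_output=True,
            text=True,
        )
    except OSError:
        # adb is not installed or could not be launched.
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def wait_for_boot(
    read_prop: Callable[[], str] = read_boot_completed,
    timeout_s: float = 300.0,   # the 5 minutes mentioned above
    interval_s: float = 10.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the property reads '1' or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if read_prop() == "1":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would treat a `False` result as the "Did not detect boot completion variable on device" case. Passing `read_prop`, `clock`, and `sleep` as parameters keeps the loop testable without a real emulator.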

akoeplinger commented 1 year ago

https://github.com/dotnet/dnceng/issues/1383 should be different since that one is about Android devices, i.e. there's no emulator to start, so if they report as not booted the device is usually hosed.

If the emulator issue is really about not starting fast enough, I think I'd be happy if you add a sleep 5min into the VM provisioning as a quick workaround.

AlitzelMendez commented 1 year ago

> Not sure why the result validation doesn't match, do we need to set up something special to monitor the runtime-extra-platforms pipeline?

For this particular question, the problem is not the runtime-extra-platforms pipeline (we do analyze it). It is a problem on our side: when Helix work items are retried internally, we do not analyze the logs of all the attempts. I created an issue for this: https://github.com/dotnet/dnceng/issues/1467
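A small sketch of why internal retries can hide a match: if the analysis looks at only one attempt's log (assumed here, for illustration, to be the final attempt), a failure that occurred in an earlier attempt goes unnoticed. The function names and the sample logs are hypothetical; which attempts the real analysis inspects is not stated in this thread.

```python
def last_attempt_matches(attempt_logs: list[str], error_message: str) -> bool:
    """Check only the final attempt's log (assumed current behavior)."""
    return bool(attempt_logs) and error_message in attempt_logs[-1]


def any_attempt_matches(attempt_logs: list[str], error_message: str) -> bool:
    """Check the logs of every attempt, including internal retries."""
    return any(error_message in log for log in attempt_logs)


MESSAGE = "Did not detect boot completion variable on device"

# Hypothetical work item: the first attempt hits the emulator problem,
# the internal retry then succeeds.
ATTEMPTS = [
    "fail: Did not detect boot completion variable on device",
    "info: all tests passed",
]
```

With these sample logs, `last_attempt_matches(ATTEMPTS, MESSAGE)` is false while `any_attempt_matches(ATTEMPTS, MESSAGE)` is true, which is the kind of mismatch the validation result above would show.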

dougbu commented 1 year ago

> I'd be happy if you add a sleep 5min into the VM provisioning as a quick workaround.

There's a 5-minute loop just prior to the failing sys.boot_completed search in the function starting at https://github.com/dotnet/xharness/blob/38841f0f33ca713ca5d6388c681bdd911425b488/src/Microsoft.DotNet.XHarness.Android/AdbRunner.cs#L191

Personally, I'm nervous about adding Thread.Sleep(...) in that code b/c @premun seemed confident my similar actions for #1415 (where I extended a loop searching for a different readiness signal) were unhelpful. We found that "fix" only reduced the likelihood of our validation failures; a build soon after my fix went in failed again and we (temporarily❔) gave up.

if someone understands dotnet/xharness better, please chime in❕

dougbu commented 10 months ago

@akoeplinger how are things going w/ your fix attempt(s)❔

akoeplinger commented 10 months ago

I just came back from vacation, will take another stab at this early next week :)