flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License

Linux linux_android_emulator_tests failing on missing Android emulator #137947

Closed zanderso closed 1 month ago

zanderso commented 10 months ago

Linux linux_android_emulator_tests is flaking a few times a day in the prod pool with the following error:

Running command "adb push /b/s/w/ir/cache/builder/src/out/android_debug_x64/flutter_shell_native_unittests /data/local/tmp"
adb: error: failed to get feature set: no devices/emulators found

Here's an example.

https://ci.chromium.org/ui/p/flutter/builders/prod/Linux%20linux_android_emulator_tests/3003/overview

Wondering if there is a caching issue or some problem when the machines are refreshed once per day?

ricardoamador commented 10 months ago

Hmm... taking a look now.

ricardoamador commented 9 months ago

Just updating as I am planning to get back to this but several other things have come up.

ricardoamador commented 9 months ago

Okay I see a couple of issues. The main reason the emulator cannot be found is that something is killing it and it is not coming back up. We do a wait in a while loop to determine if the emulator is back and that is timing out because the emulator cannot be found as it did not come back.

The other issues are test issues, specifically with building artifacts or the test not running correctly. In all of those cases the emulator started just fine and did not report an issue. I will be modifying the startup code for the emulator so that it does not wait forever.

That being said I took a look back at tasks that have run for the Linux linux_android_emulator_tests and found that there are multiple types of failures.

In the 12 most recent failures only 3 of those were failures due to an emulator that could not be found (where we busy waited for it to come back). I also saw that for some reason we are setting up, starting, testing, stopping and killing the emulator multiple times during the same test. I have to wonder why we would need to do that?

| Run number | Run link | Reason for failure |
| --- | --- | --- |
| 3103 | https://ci.chromium.org/ui/b/8764870860746045585 | Failed due to emulator - avd setup busy wait |
| 3073 | https://ci.chromium.org/ui/b/8764957781725760385 | Test failure - Android unit tests failed and caused emulator setup to fail |
| 3071 | https://ci.chromium.org/ui/b/8764959463037565089 | Test failure - Android unit tests failed and caused emulator setup to fail |
| 3000 | https://ci.chromium.org/ui/b/8765177627119979825 | Failed due to emulator - avd setup busy wait |
| 2977 | https://ci.chromium.org/ui/b/8765426993012937745 | Test failed due to timeout |
| 2955 | https://ci.chromium.org/ui/b/8765502419743357345 | Test failed - failed to build app |
| 2947 | https://ci.chromium.org/ui/b/8765515175339359233 | Test failed due to timeout |
| 2940 | https://ci.chromium.org/ui/b/8765527849984564833 | Failed due to emulator - avd setup busy wait |
| 2926 | https://ci.chromium.org/ui/b/8765578557211696145 | Test failed due to timeout |
| 2917 | https://ci.chromium.org/ui/b/8765614122556212689 | Test failed due to timeout |
| 2907 | https://ci.chromium.org/ui/b/8765655491889356017 | Test failed due to timeout |
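
To make the split between failure types easier to see, here is a small sketch that tallies the rows listed above by category. The category names and the keyword matching are assumptions made for illustration; only the run numbers and reason strings come from the table.

```python
from collections import Counter

# Run number -> reason, copied from the rows listed above.
FAILURES = {
    3103: "Failed due to emulator - avd setup busy wait",
    3073: "Test failure - Android unit tests failed and caused emulator setup to fail",
    3071: "Test failure - Android unit tests failed and caused emulator setup to fail",
    3000: "Failed due to emulator - avd setup busy wait",
    2977: "Test failed due to timeout",
    2955: "Test failed - failed to build app",
    2947: "Test failed due to timeout",
    2940: "Failed due to emulator - avd setup busy wait",
    2926: "Test failed due to timeout",
    2917: "Test failed due to timeout",
    2907: "Test failed due to timeout",
}


def categorize(reason: str) -> str:
    """Map a free-text reason to a coarse category (hypothetical labels)."""
    text = reason.lower()
    # Check "busy wait" first: the unit test rows also mention the emulator.
    if "busy wait" in text:
        return "emulator busy wait"
    if "unit tests" in text:
        return "unit test failure"
    if "timeout" in text:
        return "timeout"
    if "build app" in text:
        return "build failure"
    return "other"


tally = Counter(categorize(reason) for reason in FAILURES.values())

if __name__ == "__main__":
    for category, count in tally.most_common():
        print(f"{category}: {count}")
```

Only the "emulator busy wait" rows correspond to the missing emulator; the remaining rows are test-side failures.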

Other failures:

[three screenshots in the original issue]

I suspect that something in this order of operations is killing the emulator in the cases where it is not available.

zanderso commented 9 months ago

@dnfield @jason-simmons IIRC you were investigating the behavior of the scenario app tests on Android recently. Do you have any insights into why it might be timing out in the examples above?

@reidbaker Some of the above failures are gradle failures that I'm not able to parse. Do they make any sense to you?

@ricardoamador 3071 is a failure to push the test binary to the device, so it might be related to the failures in which the emulator hasn't come up correctly.

ricardoamador commented 9 months ago

@zanderso Gotcha. I have modified the setup slightly to not run for an hour and to fail fast so we won't be wasting time there. I have also added some more significant logging from the infra side to see if we can see why the emulator is dying.
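
For illustration only, a minimal sketch of what a fail-fast wait could look like. This is not the actual recipe change; it assumes the readiness signal is the `sys.boot_completed` property, and the function names, timeout, and polling interval are hypothetical placeholders.

```python
import subprocess
import time
from typing import Callable


def adb_boot_completed(serial: str, cmd_timeout_s: float = 10.0) -> bool:
    """Return True if `getprop sys.boot_completed` prints 1 for the device."""
    try:
        result = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
            capture_output=True,
            text=True,
            timeout=cmd_timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # adb hung or is not installed: treat the device as not ready.
        return False
    return result.returncode == 0 and result.stdout.strip() == "1"


def wait_for_boot(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,   # placeholder value, not taken from the recipe
    interval_s: float = 5.0,    # placeholder value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # give up instead of waiting until LUCI kills the task
        sleep(interval_s)
```

A caller would use something like `wait_for_boot(lambda: adb_boot_completed("emulator-5554"))` and fail the step right away when it returns `False`. The clock and sleep functions are injectable so the loop can be exercised without a device.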

reidbaker commented 9 months ago

Digging into the failures on 2940 and 3103. 3103 shows an emulator that matches the code.
2940 has nothing in the standard out logs: https://ci.chromium.org/ui/p/flutter/builders/prod/Linux%20linux_android_emulator_tests/2940/overview

reidbaker commented 9 months ago

Looking at 3073, I see that we failed to download Android dependencies. There are a lot of errors in the style of: Could not get resource 'https://dl.google.com/dl/android/maven2/com/android/tools/build/aapt2/7.4.2-8841542/aapt2-7.4.2-8841542-linux.jar'. I believe this is a duplicate of https://github.com/flutter/flutter/issues/120119

3071 is an adb failure; nothing in the details or logs gives an indication of what went wrong.

adb push /b/s/w/ir/cache/builder/src/out/android_debug_x64/flutter_shell_native_unittests /data/local/tmp

Exit Code: 1

reidbaker commented 9 months ago

For 2977, 2947, and 2926, how did you determine that this was a failure to launch the emulator? From what I see, some Android device is running and we are getting logs.

ricardoamador commented 9 months ago

It is hanging on the setup phase. Currently it uses a while loop through adb shell to look for a parameter to tell us the phone is booted and ready. It is stuck there until LUCI times out the test.

Disregard my last message. @reidbaker was that last question directed to me? If so, those tests do not have an emulator failure. I marked them as "test failure - no emulator failure"

I updated the table, as the wording I used was somewhat confusing.

reidbaker commented 9 months ago

OK, that makes sense, but there is still a weird failure going on there: the step has basically no failure logs, yet it does not continue on to run the next steps.

ricardoamador commented 9 months ago

Yeah, there are at least 3 different failure conditions in this test right now. So far I have only been able to get the unit test failure error: https://ci.chromium.org/ui/p/flutter/builders/try.shadow/Linux%20linux_android_emulator_tests/7/overview

ricardoamador commented 9 months ago

Change to fail fast for emulator is merged. Removing from ticket queue due to priority but am currently monitoring and will continue to support this.

ricardoamador commented 9 months ago

@zanderso I am not sure who to reassign this to for test investigation, so I just assigned it back to you for reassignment.

flutter-triage-bot[bot] commented 9 months ago

Issue is assigned to multiple teams (android, engine). Please ensure the issue has only one team-* label at a time. Use fyi-* labels to have another team look at the issue without reassigning it.

zanderso commented 9 months ago

Over chat, @ricardoamador and @godofredoc mentioned that there could be a timing issue around emulator startup to investigate, so I'll re-label this a bit.

ricardoamador commented 9 months ago

Added another check on some internal flags in the emulator. The Chrome team recommends going to API 33, as API 34 is new and may have bugs.

zanderso commented 9 months ago

FYI @reidbaker and @camsim99 on the note about 33 vs 34.

reidbaker commented 9 months ago

That is not nearly specific enough to be compelling. We should be testing on the newest public android api.

ricardoamador commented 9 months ago

@reidbaker We can run api34 in staging and api33 in prod, is that doable from your end or do you want to run everything in prod on api34?

reidbaker commented 9 months ago

I am not confident enough in my understanding of that difference to make the call. Can you grab me for 15m and have us talk it out. In general though I believe we need to be running our test infra against the latest android version.

ricardoamador commented 9 months ago

Okay, I spoke with Reid and we are going to leave this as is. I took another look and have not seen the failed emulator error, but the change I added to look at another boot config did surface a new error:

VERBOSE | _qemudPipe_wakeOn: -> 2
E 21:41:41.578  346.166s TimeoutThread-1-for-MainThread  Timeout on adb command: ['/b/s/w/ir/cache/avd/src/third_party/android_sdk/public/platform-tools/adb', '-s', 'emulator-5554', 'shell', '( getprop sys.boot_completed ) 2>&1;echo %$?']
Traceback (most recent call last):
  File "/b/s/w/ir/cache/avd/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 504, in GetCmdStatusAndOutputWithTimeout
    for data in _IterProcessStdout(process, timeout=timeout):
  File "/b/s/w/ir/cache/avd/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 342, in _IterProcessStdoutFcntl
    raise TimeoutError()
devil.utils.cmd_helper.TimeoutError: Timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/b/s/w/ir/cache/avd/src/third_party/catapult/devil/devil/android/sdk/adb_wrapper.py", line 636, in _RunAdbCmd
    status, output = cmd_helper.GetCmdStatusAndOutputWithTimeout(adb_cmd,
  File "/b/s/w/ir/cache/avd/src/third_party/catapult/devil/devil/utils/cmd_helper.py", line 509, in GetCmdStatusAndOutputWithTimeout
    raise TimeoutError(output.getvalue())
devil.utils.cmd_helper.TimeoutError: Timeout

This might be due to the API level being 28. I am looking to see if this is present in that api level.
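
A side note on reading the timed-out command: it wraps the property read in `( ... ) 2>&1;echo %$?`, so the shell's exit status comes back on stdout after a `%` marker. How devil parses that output is not shown in this thread; the following is only a sketch of the general pattern, with hypothetical function names.

```python
from typing import Tuple


def wrap_command(command: str) -> str:
    """Wrap a shell command so its exit status is echoed after a '%' marker."""
    return "( %s ) 2>&1;echo %%$?" % command


def split_status(raw_output: str) -> Tuple[str, int]:
    """Split combined output into (command output, exit status).

    Assumes the status is the integer following the last '%' in the output.
    """
    marker = raw_output.rfind("%")
    if marker < 0:
        raise ValueError("no status marker found in output")
    status = int(raw_output[marker + 1:].strip())
    return raw_output[:marker], status
```

With this pattern, a booted device would be expected to answer with a `1` line followed by `%0`. A timeout like the one in the traceback means no answer came back at all, which is different from the property being unset.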

camsim99 commented 9 months ago

After talking with @ricardoamador, we decided to add a test target to have these tests run on AVDs running Android 33 (in addition to 34 as they already are) so that we may be able to observe any potential differences and escalate any evidence internally if needed. PR linked above.

camsim99 commented 8 months ago

Looks like both are running now, so we should be able to start observing for any differences: https://flutter-dashboard.appspot.com/#/build?repo=engine&branch=main&taskFilter=linux_android_emulator

ricardoamador commented 8 months ago

Awesome! thank you Camille. I will take a look later today.

camsim99 commented 8 months ago

I spoke too soon, it seems... https://github.com/flutter/engine/pull/48936#issuecomment-1854461726 happened, so it will be reverted. I will have to reland it and try to figure out why it passed in presubmit but fails once I land the PR.

ricardoamador commented 8 months ago

No problem. Looks like you need to add this - DEPS flag to the runIf.

camsim99 commented 8 months ago

> No problem. Looks like you need to add this - DEPS flag to the runIf.

Seems as though this may have been a larger engine problem (https://github.com/flutter/engine/pull/48936#issuecomment-1854468113). Going to re-land that PR then.

ricardoamador commented 8 months ago

@camsim99 were you able to land your change?

ricardoamador commented 8 months ago

Just noticing that this is no longer flaking a few times per day which is good. Have some time and am taking a look.

ricardoamador commented 8 months ago

Odd, does anyone know when the API level was changed from 34 to 28? Never mind, I see it runs two tests.

camsim99 commented 8 months ago

> @camsim99 were you able to land your change?

No; I just requested your review on https://github.com/flutter/engine/pull/49101 so we can re-land it :)

ricardoamador commented 8 months ago

Update: I have updated the package to a newer version; the one we had was a couple of months old. I also changed the naming convention we use to pull the emulator packages, since they are now named differently; this lets us consume the newer packages. I have not seen the missing emulator issue, but I have seen cases where the emulator was offline (but still found) and multiple timeouts waiting for builds to complete (perhaps sharding would help with the timeouts).

Also wanted to add this info from the Chromium team. I asked whether the timeout for the emulator startup process is configurable and got this response: "The startup timeout is not configurable so far, but it is 6 mins in this case, which should usually be sufficient"

ricardoamador commented 8 months ago

Also I just approved your pull request @camsim99. Sorry I was out on vacation for the last two weeks. Please go ahead and merge when you have the chance.

ricardoamador commented 8 months ago

Looking further at the differences between passing and failing runs, it looks like there may be something happening internal to the test run, but I am not sure, since the configuration that occurs does not appear to be from the infra side:

There is a step in the config before the test runs that is attempting to 'reverse port' that appears to be where the failure is occurring. A failed run looks like this: https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8759996397862179521/+/u/test:_Scenario_App_Integration_Tests/stdout

Got dependencies!
-> Starting server...
listening on host 0.0.0.0:3001
<- Done
-> Creating screenshot directory...
<- Done
-> Starting logcat...
<- Done
-> Configuring emulator...
<- Done
-> Get API level of connected device...
using API level 34
<- Done
-> Skia Gold auth...
skia gold client is unavailable
<- Done
-> Reverse port...
-> Symbolize stack traces
Terminated
<- Done
-> Dump full logcat
[stdout] --------- beginning of main
[stdout] 01-02 16:48:05.649  1696  1696 I KeyboardViewUtil: KeyboardViewUtil.getKeyboardHeightRatio():189 systemKeyboardHeightRatio:1.000000; userKeyboardHeightRatio:1.000000.
[stdout] 01-02 16:48:05.656   546   655 I InputReader: Reconfiguring input devices, changes=KEYBOARD_LAYOUTS
[stdout] --------- beginning of kernel

A passing run looks to print the port number as confirmation of the successful setup: https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8759995205036656225/+/u/test:_Scenario_App_Integration_Tests/stdout

Got dependencies!
-> Starting server...
listening on host 0.0.0.0:3001
<- Done
-> Creating screenshot directory...
<- Done
-> Starting logcat...
<- Done
-> Configuring emulator...
<- Done
-> Get API level of connected device...
using API level 34
<- Done
-> Skia Gold auth...
skia gold client is unavailable
<- Done
-> Reverse port...
[stdout] 3000
<- Done
-> Installing app APK...
[stdout] Performing Streamed Install
[stdout] Success
<- Done
-> Installing test APK...
[stdout] Performing Streamed Install
[stdout] Success
<- Done
-> Running instrumented tests...
client connected 127.0.0.1:37267
[stdout] 
[stdout] dev.flutter.scenarios.EngineLaunchE2ETest:
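
The comparison of the two logs above can be sketched as a small parser. It assumes the log format shown here (a `-> ` line opens a step, `<- Done` closes it) and that a successful "Reverse port" step prints a bare port number on a `[stdout]` line; the function names are hypothetical and not part of the Scenario App tooling.

```python
import re
from typing import Optional

# "-> Reverse port..." and "-> Symbolize stack traces" both open a step.
STEP_START = re.compile(r"^-> (?P<name>.+?)(?:\.\.\.)?$")


def parse_steps(log: str) -> dict:
    """Group log lines by step, recording whether each step reached '<- Done'."""
    steps = {}
    current = None
    for raw in log.splitlines():
        line = raw.rstrip()
        match = STEP_START.match(line)
        if match:
            current = match.group("name")
            steps[current] = {"output": [], "done": False}
        elif line == "<- Done":
            if current is not None:
                steps[current]["done"] = True
                current = None
        elif current is not None:
            steps[current]["output"].append(line)
    return steps


def reverse_port_ok(steps: dict) -> bool:
    """True if the 'Reverse port' step finished and printed a port number."""
    step = steps.get("Reverse port")
    return (
        step is not None
        and step["done"]
        and any(re.fullmatch(r"\[stdout\] \d+", out) for out in step["output"])
    )


def first_unfinished_step(steps: dict) -> Optional[str]:
    """Name of the earliest step that never reached '<- Done', if any."""
    for name, info in steps.items():
        if not info["done"]:
            return name
    return None
```

Run against the failed log, `first_unfinished_step` would point at "Reverse port", since the next step starts before it ever reports done.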

ricardoamador commented 7 months ago

I have created the following issue with the Android Studio team and reached out to the owner Bo Hu to take a look. https://b.corp.google.com/issues/319321211 - Waiting for a response from them and will provide info as needed.

ricardoamador commented 7 months ago

Making this P1 as I have been working on this issue with Chromium and Android Studio.

godofredoc commented 7 months ago

Trying to replace adb wait-for-device with adb shell true with a timeout. https://flutter-review.googlesource.com/c/recipes/+/54686
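
As a rough sketch of the idea (not the contents of the linked recipe change): `adb wait-for-device` blocks until a device shows up, whereas a trivial `adb shell true` run under a timeout either succeeds quickly or fails in bounded time. The helper names and the 30 second default are assumptions.

```python
import subprocess
from typing import Sequence


def command_succeeds(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run `cmd`; True only if it exits 0 before the timeout expires."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return False
    except FileNotFoundError:
        return False  # the binary (e.g. adb) is not on PATH
    return completed.returncode == 0


def device_responds(serial: str, timeout_s: float = 30.0) -> bool:
    """Check that the device answers a no-op shell command in bounded time."""
    return command_succeeds(["adb", "-s", serial, "shell", "true"], timeout_s)
```

The step calling `device_responds("emulator-5554")` can then retry a fixed number of times or fail outright, rather than hanging until the task times out.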

ricardoamador commented 7 months ago

Linking with @camsim99's issue: https://github.com/flutter/flutter/issues/140001. I suspect they are both the same issue with the emulator crashing, though they are happening at very different times. I had initially thought this was caused by a stop and start, but the issue reported by Camille is happening during an uninstall of the app.

gmackall commented 6 months ago

Two more instances of this on https://github.com/flutter/flutter/issues/143063 Links to the logs: https://ci.chromium.org/ui/p/flutter/builders/prod/Linux_android_emu%20android%20views/66/overview https://ci.chromium.org/ui/p/flutter/builders/prod/Linux_android_emu%20android%20views/61/overview

camsim99 commented 5 months ago

Are there any new updates on this issue? Going to start looking into https://github.com/flutter/flutter/issues/140001 to get webview_flutter tests running on Android 34 emulators.

matanlurey commented 5 months ago

> Are there any new updates on this issue? Going to start looking into #140001 to get webview_flutter tests running on Android 34 emulators.

Fwiw the flutter/engine has emulators working on Android 34 emulators on CI. Examples:

I'll leave this issue open for @ricardoamador - he might still be working on other changes, but it's possible you have enough to get started. If that's the case, I'd be happy to help @camsim99. Wanna chat for a few minutes tomorrow?

camsim99 commented 5 months ago

@matanlurey Sure! Would definitely appreciate another set of eyes :)

ricardoamador commented 5 months ago

@camsim99 as @matanlurey said, it is working. We have a workaround for the missing emulator, but I suspect your issue may be a little different. It would be good to have Matan look at this to see if he notices anything too. I will make sure to follow up with Chromium to see when they are going to add the crashdump support, though.

camsim99 commented 5 months ago

> @camsim99 as @matanlurey said, it is working. We have a workaround for the missing emulator, but I suspect your issue may be a little different. It would be good to have Matan look at this to see if he notices anything too. I will make sure to follow up with Chromium to see when they are going to add the crashdump support, though.

Gotcha, that makes sense. Thank you!

flutter-triage-bot[bot] commented 1 month ago

This issue is assigned to @ricardoamador but has had no recent status updates. Please consider unassigning this issue if it is not going to be addressed in the near future. This allows people to have a clearer picture of what work is actually planned. Thanks!

zanderso commented 1 month ago

I have not seen this flake for the past few months, so I'm going to close this issue as stale. Please feel free to reopen if I'm mistaken, and this is still actionable.

github-actions[bot] commented 4 weeks ago

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.