ReactiveCircus / android-emulator-runner

A GitHub Action for installing, configuring and running hardware-accelerated Android Emulators on macOS virtual machines.
Apache License 2.0

Runner action hangs after killing emulator with stop: not implemented #385

Open ericswpark opened 5 months ago

ericswpark commented 5 months ago

When running my GitHub Actions workflow, the emulator runner action hangs after the emulator is killed, with the following log output:


Terminate Emulator
  /usr/local/lib/android/sdk/platform-tools/adb -s emulator-5554 emu kill
  OK: killing emulator, bye bye
  OK
  INFO    | Wait for emulator (pid 4126) 20 seconds to shutdown gracefully before kill;you can set environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL(in seconds) to change the default value (20 seconds)
INFO    | Discarding the changed state: command-line flag
WARNING | Discarding the changed state (command-line flag).
ERROR   | stop: Not implemented

grodin commented 5 months ago

This seems to be related to #381. I debugged my workflow run with tmate which similarly had two crashpad_handler processes running and the same logs as in the issue.

Terminating the two crashpad_handler processes with SIGTERM allowed the android-emulator-runner step to complete.

ericswpark commented 5 months ago

Seems like there should be a step at the end that kill -9s all the crashpad_handler processes.

Either that or disable crashpad_handler from running in the first place (I'm guessing it's some sort of error reporting mechanism from Google to report errors with the Android emulator?)

grodin commented 5 months ago

So far I've managed to find out that crashpad_handler is the daemon part of Crashpad, a crash reporter.

I've done some digging in the emulator source repo. It seems the emulator uses Crashpad to report crashes back to Google, so we're not going to be able to prevent the crashpad_handler processes from getting started.

Killing them after the emulator has shut down seems like a reasonable workaround, but we should do that with SIGTERM first! Going straight to kill -KILL is a bit, er, overkill. I can't immediately think of any harm, given that the VM the action is running in will be thrown away soon, but it's generally not recommended to use SIGKILL until necessary.

I'm fairly certain that this will need to be done as part of the action though.

I suspect, but haven't confirmed, that the root cause of this is that when the emulator is told to shut down, some Node.js code sends a signal to the emulator, but then waits for the whole process tree to end, not just the emulator process. However, if the emulator doesn't pass on signals to its child processes, they won't receive any signal telling them to quit, so the waiting will go on forever.

If I'm correct (will try to find out this week), trying to kill the extra processes outside of the action won't work, since that code will be waiting for the action to finish to have a chance to run. It's a classic deadlock effectively.
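The kind of wait described above can be reproduced in a few lines of shell. This is only an analogy, not the action's actual code: fake_emulator is a hypothetical stand-in for the emulator, and the background sleep plays the role of crashpad_handler. The parent reads the child's output until end-of-file, so it keeps waiting for as long as any descendant holds the pipe open, even though the main process exited immediately.

```shell
#!/usr/bin/env bash
# Illustration only: a parent that reads a child's output until EOF keeps
# waiting while any descendant still holds the pipe open.

# fake_emulator is a hypothetical stand-in. It starts a background helper
# (playing the role of crashpad_handler) and then returns immediately.
fake_emulator() {
  sleep 2 &                 # the helper inherits stdout, i.e. the pipe
  echo "emulator exiting"
}

start=$SECONDS
output=$(fake_emulator)     # returns only once every writer closed the pipe
elapsed=$((SECONDS - start))

echo "output: $output"
echo "waited ${elapsed}s although the main process exited at once"
```

If the helper never exited, the command substitution would never return, which matches the symptom in the log above.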

benszedlmayer commented 2 months ago

Are there any updates here? My log output is exactly the same as in the original issue: it completes all processes, then just hangs forever. Trying to kill the crashpad process with kill -9 $(pgrep -f crashpad_handler) fails.

troZee commented 2 months ago

I have a similar issue here: https://github.com/callstack/react-native-pager-view/actions/runs/9576519647/job/26403174451?pr=829 . Does anyone know how to fix it?

limpbrains commented 2 months ago

I have the same issue. The emulator just hangs forever: https://github.com/synonymdev/react-native-ldk/actions/runs/9724215820/job/26840196419?pr=251

mustalk commented 1 month ago

For anyone facing this issue, I've discovered that setting the environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60 helps sometimes, but it's inconsistent.

In my case, the issue seemed to be affected by the emulator's saved snapshot, so not saving the AVD snapshot should help as well.

If none of the previous solutions work, as @grodin mentioned regarding the crashpad_handler, you can use the following steps to terminate the processes:

- name: Kill crashpad_handler processes
  if: always()
  run: |
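    # -f matches the full command line. On Linux, plain pkill compares against the
    # process name truncated to 15 characters, which crashpad_handler (16) exceeds.
    pkill -SIGTERM -f crashpad_handler || true
    sleep 5
    pkill -SIGKILL -f crashpad_handler || true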

This should definitely stop the hang issue.
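A slightly more careful variant of the same idea is sketched below: instead of a fixed sleep, it polls until the process is gone and only falls back to SIGKILL after a timeout. terminate_gracefully is a hypothetical helper name, not something the action provides, and the 5-second default is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# terminate_gracefully is a hypothetical helper, not part of the action.
# It sends SIGTERM, polls once per second, and sends SIGKILL only if the
# process is still alive after the timeout (default: 5 seconds).
terminate_gracefully() {
  local pid=$1 timeout=${2:-5} waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # process is already gone
  while kill -0 "$pid" 2>/dev/null; do        # kill -0 only tests existence
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null || true   # last resort
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example, commented out so the sketch has no side effects when sourced:
#   for pid in $(pgrep -f crashpad_handler); do
#     terminate_gracefully "$pid" 5
#   done
```

Polling means the step finishes as soon as the processes exit, and SIGKILL is sent only when SIGTERM was not enough.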

ericswpark commented 1 month ago

@mustalk are you sure that step will work? My understanding is that the previous step will hang, so the step that kills crashpad_handler will never get a chance to run.

mustalk commented 1 month ago

@ericswpark that's what I thought at first too, but to my surprise it did execute, even without the if: always(), at least in my setup. Give it a try.

fernando-jascovich commented 1 month ago

I was having this issue while running the Android emulator manually (I'm not using android-emulator-runner), and I came here looking for answers. After that I discovered the solution: you need to send SIGSTOP to the Android emulator's qemu process. For example:

# XXXXXX is the PID of the Android SDK qemu-system process
kill -STOP XXXXXX

That will handle snapshot generation and crashpad_handler as expected, and the emulator will exit successfully.

ashishb commented 1 month ago

@mustalk your suggestion didn't work for me: https://github.com/ashishb/adb-enhanced/actions/runs/10024919828/job/27707518728?pr=246. It is stuck at the emulator execution step.

Strangely, it only affects API 26 and 29 for me.

Bhuvanaarkala07 commented 1 month ago

Hi Team,

We are also facing a similar issue. It was working two days ago, but it suddenly started failing with the error below:

[Screenshot 2024-07-24 at 11 34 52 PM]

The workflow configuration we are using is:

runs-on: macos-13
timeout-minutes: 25

Can someone suggest what is wrong here?

ashishb commented 1 month ago

The fix that worked for me was to use macos-latest instead:

  1. https://github.com/ashishb/adb-enhanced/pull/246
  2. https://github.com/ashishb/adb-enhanced/pull/248

Bhuvanaarkala07 commented 1 month ago

We are already using runs-on: macos-13, but it is still showing the above error.

Braggiouy commented 1 month ago

I am facing the same issue. The step does not terminate the emulator, and the workflow stays stuck in that step. I tried @mustalk's suggestion, but the workflow never reaches the step that kills crashpad_handler.

- name: Set up the Android emulator and run tests
  uses: reactivecircus/android-emulator-runner@v2
  with:
    api-level: 33
    target: google_apis_playstore
    arch: x86_64
    emulator-boot-timeout: 600
    disable-animations: true
    script: ./scripts/run-tests.sh

Added process termination commands within the custom script ./scripts/run-tests.sh, but still no success. This script runs the Appium testing that I have integrated.

In addition, I have:

runs-on: ubuntu-latest
uses: reactivecircus/android-emulator-runner@v2

Context and Background

Emulator: Running Android emulators on GitHub Actions can be challenging due to the lack of KVM support on ubuntu-latest.

macOS vs. Ubuntu: The macos-latest runner includes pre-installed Android SDKs and better support for Android emulation. However, macos-latest is more expensive than ubuntu-latest. Setting up a self-hosted macOS runner might be a cost-effective solution, but I would like to try the Ubuntu image first, if possible.

Hardware Acceleration: Starting on February 23, 2023, GitHub Actions users can leverage hardware acceleration on larger Linux runners, significantly improving Android emulator performance. This requires adding the runner user to the KVM user group:

- name: Enable KVM group perms
  run: |
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
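To confirm that the rule above took effect before launching the emulator, a quick check like the following could be run in a later step. It is a minimal sketch: kvm_status and its three status strings are hypothetical names, and it only tests that the device node exists and is readable and writable by the current user.

```shell
#!/usr/bin/env bash
# kvm_status is a hypothetical helper. It reports whether the KVM device
# node exists and whether the current user can read and write it.
kvm_status() {
  local dev=${1:-/dev/kvm}
  if [ ! -e "$dev" ]; then
    echo "missing"
  elif [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "accessible"
  else
    echo "no-permission"
  fi
}

echo "KVM: $(kvm_status)"
```

A result other than "accessible" suggests the emulator would run without hardware acceleration, or fail to start with acceleration enabled.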

Questions:

Any suggestions or guidance on resolving this issue would be greatly appreciated. Specifically, I need help ensuring the emulator terminates properly, and the workflow can proceed without getting stuck.

I read earlier that this could be fixed by using the macos-latest runner. Is there any possibility of fixing this while staying on ubuntu-latest?

vaind commented 4 weeks ago

I've faced the same issue and was able to resolve it by making sure the appium instance I created in the test script gets shut down properly by the end of the script (cc @Braggiouy).

You can check pgrep -f appium before and after your script execution.
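One way to make sure a background server never outlives the test script is sketched below. run_with_server is a hypothetical helper, and SERVER_CMD is a stand-in that defaults to a harmless sleep; a real script might set it to the command that starts Appium.

```shell
#!/usr/bin/env bash
# run_with_server is a hypothetical helper: it starts a background server,
# runs the given test command, and always stops the server afterwards.
# SERVER_CMD is a stand-in; a real script might set it to "appium".
SERVER_CMD=${SERVER_CMD:-"sleep 300"}

run_with_server() {
  local status=0
  $SERVER_CMD &                             # start the server in the background
  SERVER_PID=$!
  "$@" || status=$?                         # run the tests, keep their exit code
  kill "$SERVER_PID" 2>/dev/null || true    # ask the server to stop (SIGTERM)
  wait "$SERVER_PID" 2>/dev/null || true    # reap it so nothing lingers
  return "$status"
}

run_with_server echo "tests would run here"
```

Because the cleanup runs whether the tests pass or fail, the script returns only after the server is gone, and the tests' exit code is preserved for the workflow.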

Braggiouy commented 4 weeks ago

Thanks a million @vaind. That was indeed my issue. It seems the Appium instance was still running in the background and prevented the emulator from shutting down properly. No need to manually kill the crashpad_handler. Good catch!