docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac

Intermittent failures with certain amd64 images when using > 1 CPU (Apple M1) #6016

Closed lyuanxi closed 2 years ago

lyuanxi commented 2 years ago

Expected behavior

Able to reliably use amd64 images (such as flyway) with more than one CPU configured in Docker Preferences - Resources. I've noticed this issue with at least flyway, but other images may be affected.

Actual behavior

With CPU set to 2 or more, images that otherwise work normally start failing intermittently: they randomly "hang" until force killed, or occasionally crash with a qemu error. My use case is running flyway migrations to execute SQL statements against a postgres database (also running in Docker).

Information

Steps to reproduce the behavior

My project uses these 2 images (amd64)

We run a postgres database and use flyway for migrations. There are currently over 500 migration scripts. When Docker Preferences - Resources is set to 2 CPUs or more, one of the following occurs:

  1. It runs successfully without issues
  2. It fails to attach flyway and "hangs" and needs to be force killed
  3. Flyway starts running but intermittently "hangs" at random points, no apparent pattern. Needs to be force killed.
  4. It crashes with the following error:
    flyway-dev | qemu: uncaught target signal 5 (Trace/breakpoint trap) - core dumped
    flyway-dev | ./flyway: line 83: 50 Trace/breakpoint trap "$JAVA_CMD" $JAVA_ARGS $EXTRA_ARGS -cp "$CP" org.flywaydb.commandline.Main "$@"
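
Of the four outcomes, only the crash leaves a distinctive line in the output, so it can be told apart from the hangs by searching captured logs for the QEMU signal message. A minimal sketch, assuming the container output is piped in or saved to a file; the function name `qemu_crash_signals` is hypothetical, and the pattern is based only on the message shown above.

```shell
# Read log text on stdin and print the signal number from every line that
# contains "qemu: uncaught target signal N". Prints nothing if no line matches.
qemu_crash_signals() {
  sed -n 's/.*qemu: uncaught target signal \([0-9][0-9]*\).*/\1/p'
}

# Example wiring (assumes a container named flyway-dev, as in the log prefix):
#   docker logs flyway-dev 2>&1 | qemu_crash_signals
```

An empty result only rules out the crash case; a hang produces no such line and still has to be spotted by the run not finishing.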

When CPU is configured to be 1, the above issues no longer occur. This is not ideal because performance is impacted. This highlights a multi-CPU issue of sorts when running amd64 images on Apple Chips which is inconvenient when arm64 images are not available.

Other things of note: Flyway does not currently officially support arm64 images. However, when I built my own arm64 image from the official repository (https://github.com/flyway/flyway-docker) with docker buildx build --platform=linux/arm64 . and used that instead, the multi-CPU issues above appear to be resolved.
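
The native build described above can be sketched roughly as follows. The repository URL and the `docker buildx build --platform=...` invocation come from this comment; the function names and the default tag `flyway-local` are hypothetical, and the sketch assumes it is run from whichever directory of the repository holds the Dockerfile you need.

```shell
# Map a machine type (as printed by `uname -m`) to a Docker platform string.
docker_platform_for() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *)             echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# Build the Dockerfile in the current directory for the host's own platform,
# so the resulting image does not need emulation on this machine.
# $1 is an optional image tag; "flyway-local" is a placeholder default.
build_native_image() {
  platform=$(docker_platform_for "$(uname -m)") || return 1
  docker buildx build --platform="$platform" -t "${1:-flyway-local}" .
}

# Example (not run here):
#   git clone https://github.com/flyway/flyway-docker
#   cd flyway-docker    # then into the directory containing the Dockerfile
#   build_native_image
```

On an M1 this resolves to `linux/arm64`, which matches the platform flag used in the original build.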

anciltech commented 2 years ago

I think I was having a similar issue while running TouchTerrain (a Jupyter project) after updating Docker to 4.1.1. I'm not totally sure which version I had before, but I can confirm 4.0.0 works. I would receive the following:

    Operation not permitted (src/thread.cpp:309)
    qemu: uncaught target signal 6 (Aborted) - core dumped
    Aborted
    Operation not permitted (src/thread.cpp:309)
    Operation not permitted (src/thread.cpp:309)
    qemu: uncaught target signal 6 (Aborted) - core dumped

After downgrading Docker, these errors do not appear anymore. M1 Mac running Monterey Beta 21A5522h

clintoncampbell commented 2 years ago

I also ran into an issue that was resolved by decreasing to a single CPU, while attempting to install ColdFusion 2018 on the Amazon Linux 2 container image.

In my case, the installer would stall with no further output or logging and would be unresponsive to further input. I had conducted limited tracing when I found this issue, but I can confirm that the stalled process was the Java runtime.

I am also running Docker 4.1.1 on macOS Monterey with an M1 Pro. Happy to provide further information if it supports debugging.

longwa commented 2 years ago

I believe QEMU is using the "Force Multicore" option which causes unreliable behavior on the M1.

Unfortunately, for x86 on ARM, the only option is to run all of the emulated cores on a single host core.

For x86 on ARM, Docker Desktop should disable the Force Multicore option in QEMU automatically to fix this.

xoxwgys56 commented 2 years ago

@longwa Could you share how to set a single core for QEMU?

I assumed you meant the Docker Desktop setting, so I set 1 CPU on the Resources tab, but it did not work.

longwa commented 2 years ago

Setting to 1 core in docker desktop should work around the issue assuming they don't have any other QEMU settings that are causing problems. Some software (such as Oracle db for instance) still doesn't seem to work but it does fix the random hangs I was seeing with a few things (running Java, for instance).
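
To check whether a given setup falls into the combination discussed here (an image built for a different architecture than the Docker VM, with more than one CPU configured), something like the following could be used. This is only a sketch: the function names are hypothetical, `IMAGE` is a placeholder, and as other comments in this thread show, passing the check does not guarantee the hangs go away.

```shell
# Normalize the different spellings of the same architecture.
normalize_arch() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeeds (exit status 0) when the image architecture differs from the VM
# architecture AND the VM has more than one CPU.
# Arguments: image architecture, VM architecture, CPU count.
emulated_multicore() {
  image_arch=$(normalize_arch "$1")
  vm_arch=$(normalize_arch "$2")
  [ "$image_arch" != "$vm_arch" ] && [ "$3" -gt 1 ]
}

# Example wiring (requires a running Docker daemon; IMAGE is a placeholder):
#   emulated_multicore \
#     "$(docker image inspect --format '{{.Architecture}}' IMAGE)" \
#     "$(docker info --format '{{.Architecture}}')" \
#     "$(docker info --format '{{.NCPU}}')" \
#     && echo "emulated image on a multi-CPU VM"
```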

I think the docker desktop team will need to modify their code to change the settings for QEMU. I'm assuming they are using the Force Multicore option since I can see Docker Desktop using more than 100% CPU on my M1.

Jdban commented 2 years ago

Does anyone have a good solution to this? I'm still having the issue when docker is set to 1 CPU.

jedisct1 commented 2 years ago

@Jdban I eventually switched to using Lima (via colima).

The switch was painless and it works perfectly fine on my M1.

Jdban commented 2 years ago

Thanks for the suggestion @jedisct1. I tried out colima and it has been super useful. I had to make some changes to the lima override.yaml for my setup but it worked out.

I'm still getting VERY intermittent hangs, so hopefully I can figure out what's going on in the future. Today I had 214 docker containers start up and complete before the 215th hung and had to be killed. Definitely odd, but I can work with that I guess.

Thanks!
