Closed fumoboy007 closed 7 months ago
Oops, the QEMU functionality in QEMU 7.2 hasn’t been released yet. 😅
Very important issue for my team.
The latest Docker for Mac release is apparently still using QEMU 6.2.0:
# /containers/services/binfmt/rootfs/usr/bin/qemu-x86_64 --version
qemu-x86_64 version 6.2.0 (v6.2.0)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
Looks like QEMU was upgraded to 7.0.0 in Docker Desktop 4.13.0 but was downgraded to 6.2.0 in Docker Desktop 4.13.1 due to some other issue.
Is anyone working on trying again to upgrade QEMU? 🥺
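To check which QEMU a given Docker Desktop release bundles, the version banner shown above can be parsed and compared against 7.2, the release this thread identifies as the first with AVX support. A minimal sketch; qemu_version_from_banner and version_at_least are hypothetical helper names, and the banner format is assumed to match the output quoted above:

```shell
# Extract "6.2.0" from a banner such as "qemu-x86_64 version 6.2.0 (v6.2.0)" read on stdin.
qemu_version_from_banner() {
  sed -n 's/^qemu-[^ ]* version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_at_least VERSION MAJOR MINOR: succeed if VERSION >= MAJOR.MINOR.
version_at_least() {
  major=${1%%.*}          # text before the first dot
  rest=${1#*.}            # text after the first dot
  minor=${rest%%.*}       # second component
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Example (run where the qemu-x86_64 binary is on PATH):
#   v=$(qemu-x86_64 --version | qemu_version_from_banner)
#   if version_at_least "$v" 7 2; then echo "QEMU $v should include AVX"; fi
```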
Any news on this? I'm struggling with the avx2 instruction set.
^ @stephen-turner who previously commented on #5148.
Negative news: I tested the recent Rosetta 2 support in Docker Desktop but Rosetta 2 does not seem to support AVX either.
Tried on Debian 11: the rocket.chat container exits with code 132.
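For context on that exit code: shells and Docker report a process killed by signal N as exit status 128 + N, so 132 corresponds to signal 4, which is SIGILL (illegal instruction) on Linux. That would be consistent with the container hitting an unsupported instruction such as AVX, although a program can also choose to exit with 132 on its own. A minimal sketch of the decoding; describe_exit_status is a hypothetical helper name, not part of Docker:

```shell
# describe_exit_status STATUS: explain an exit status using the 128 + signal convention.
describe_exit_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # kill -l NUMBER prints the signal's name without the SIG prefix.
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with status $status"
  fi
}

describe_exit_status 132
```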
Until this is fixed and Docker is upgraded to use QEMU 7.2+ (latest is 8.0.0), one could try running QEMU/Colima directly. After stopping Docker and starting Colima, you can still build and run as usual. This works for my projects.
brew install qemu
brew install colima
colima start --arch x86_64 --cpu 8 --memory 24 --disk 128 --cpu-type Broadwell-v4
For other CPU models: https://qemu.readthedocs.io/en/latest/system/qemu-cpu-models.html
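To confirm that the chosen CPU model actually exposes AVX, the flags line of /proc/cpuinfo can be inspected from inside the emulated x86_64 VM. A minimal sketch, assuming a Linux guest under full-system emulation (as with Colima above); has_cpu_flag is a hypothetical helper name. It is unlikely to be meaningful inside a container run through user-mode QEMU, where /proc/cpuinfo may describe the ARM host instead:

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG is listed on a "flags" line of FILE.
# FILE defaults to /proc/cpuinfo, which exists only on Linux.
has_cpu_flag() {
  flag="$1"
  file="${2:-/proc/cpuinfo}"
  # -w matches whole words only, so "avx" does not match "avx2".
  grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -q -w -- "$flag"
}

for f in avx avx2; do
  if has_cpu_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
done
```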
I would not hold my breath for Rosetta support; it's not going to happen. https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment
Any update on this issue? With Apple Silicon now being a staple in a lot of engineering departments, we're facing the same issues here.
Any update here? Pain ongoing...
Apple Silicon combined with QEMU above 7.0 causes a regression where the syscall prctl(PR_SET_CHILD_SUBREAPER, 1) will return Invalid Argument, which some applications rely on not to happen (e.g. astro-cli, spark-on-k8s-operator, cinit).
Specifically, whereas QEMU 6.2 and under passed on the syscall without modification, QEMU 7.0 and above disable it, with a comment saying "TODO to implement a safe pass-through for it". https://gitlab.com/qemu-project/qemu/-/commit/220717a6f46a99031a5b1af964bbf4dec1310440
And it's still not implemented to this day, which means nothing above QEMU 6.2 will work for those applications. Until that is fixed, I think a QEMU update will cause unexpected regressions to Docker users.
Great find, @yutotakano!
Apple Silicon combined with QEMU above 7.0 causes a regression
Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?
And it's still not implemented to this day
Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!
Dumb question: Is this issue specific to Apple Silicon? At first glance, the commit you linked doesn’t seem to depend on architecture?
Hmm. I'm on Apple Silicon myself, so I decided to keep my assumptions small. Perhaps it affects all devices as long as you use QEMU to emulate Linux. But would Docker use QEMU if it's running an x86 container on Intel x86?
Related but somewhat off-topic: because MongoDB 5.0 and later relies on the AVX instruction set, mongosh (among other tools) crashes in a QEMU-emulated x86 container running on Docker Desktop for Mac with Apple silicon.
This incompatibility has a bad effect on containerized development & testing environment setup.
Side note: If the container is rebuilt for the ARM architecture, mongosh starts to work just fine.
https://www.mongodb.com/docs/v7.0/administration/production-notes/#x86_64
MongoDB 5.0 requires use of the AVX instruction set, available on select Intel and AMD processors.
Do you know if there is a QEMU ticket tracking this issue? If not, I think we should create one so that the QEMU developers don’t forget about it!
I took the liberty of creating an issue in the QEMU tracker since I did not find an existing one: https://gitlab.com/qemu-project/qemu/-/issues/1929
Hello everyone, we're updating QEMU in the upcoming version of Docker Desktop. Have you tested a version that would suit your needs?
@dgageot Good news! I have not tested but in theory, QEMU 7.2 or above should resolve this issue.
Thank you @fumoboy007. Probably Docker Desktop 4.26.0 will contain a more recent QEMU but not 7.2 yet. But I'll still do my best to fit it in and if it doesn't work, I'll target 4.27.0.
@fumoboy007 Sorry, that'll have to wait for 4.27.0.
@fumoboy007 do you have an example of a docker command that fails with the latest version of Docker Desktop?
@dgageot One Docker image that is affected by this issue is tensorflow/serving. https://github.com/tensorflow/serving/issues/1948#issue-1075115038 has reproduction steps.
@dgageot It would be awesome to see it land in 4.27.0 or early 2024 :-) I'm also blocked by https://github.com/tensorflow/serving/issues/1948
Do you have any updates?
We currently have QEMU 8.0.4 on our main branch, with a patch for the prctl(PR_SET_CHILD_SUBREAPER, 1) issue. So the good news is that we are not stuck with a very old version of QEMU anymore, and 4.27.0 will at least contain this version of QEMU.
Also, this morning I started testing 8.1.4, with the plan to soon test 8.2.0.
I mainly focused on using the most recent versions of QEMU. I haven't tested AVX support yet. Do you have a simple docker run command I can try to validate that it does what you want?
Here are the commands that fail with Docker Desktop 4.26.1 but succeed on our main branch, with QEMU 8.0.4:
cd /tmp
git clone https://github.com/tensorflow/serving
docker run -t --rm -it --init -p 8501:8501 --platform linux/amd64 -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 -v "./serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu:/models/half_plus_two" -e MODEL_NAME=half_plus_two tensorflow/serving:2.14.1
Although, from the output of the program, I'm not sure it does actually use AVX instructions:
2024-01-04 12:11:33.524895: I tensorflow_serving/model_servers/server.cc:74] Building single TensorFlow model file config: model_name: half_plus_two model_base_path: /models/half_plus_two
2024-01-04 12:11:33.542335: I tensorflow_serving/model_servers/server_core.cc:467] Adding/updating models.
2024-01-04 12:11:33.544671: I tensorflow_serving/model_servers/server_core.cc:596] (Re-)adding model: half_plus_two
2024-01-04 12:11:33.926545: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: half_plus_two version: 123}
2024-01-04 12:11:33.926776: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.927587: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: half_plus_two version: 123}
2024-01-04 12:11:33.929071: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /models/half_plus_two/00000123
2024-01-04 12:11:33.936173: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-01-04 12:11:33.936641: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /models/half_plus_two/00000123
2024-01-04 12:11:33.939478: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-04 12:11:34.040351: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2024-01-04 12:11:34.056903: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-01-04 12:11:34.067407: W external/org_tensorflow/tensorflow/tsl/platform/profile_utils/cpu_utils.cc:118] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2024-01-04 12:11:34.279405: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /models/half_plus_two/00000123
2024-01-04 12:11:34.299615: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 370421 microseconds.
2024-01-04 12:11:34.301360: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:80] No warmup data file found at /models/half_plus_two/00000123/assets.extra/tf_serving_warmup_requests
2024-01-04 12:11:34.547284: I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: half_plus_two version: 123}
2024-01-04 12:11:34.554644: I tensorflow_serving/model_servers/server_core.cc:488] Finished adding/updating models
2024-01-04 12:11:34.556354: I tensorflow_serving/model_servers/server.cc:118] Using InsecureServerCredentials
2024-01-04 12:11:34.556922: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2024-01-04 12:11:34.573646: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
2024-01-04 12:11:34.585490: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...
-e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 is a useful (experimental) trick to force the usage of QEMU just for one docker run command, even though Docker Desktop is configured to use Rosetta for faster overall emulation.
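If that flag is needed often, it can be wrapped in a small shell function. A minimal sketch built only from the options used in the command above; run_amd64_with_qemu is a hypothetical name, and the DOCKER variable is an assumption added here (not a Docker feature) so the command line can be previewed with DOCKER=echo:

```shell
# run_amd64_with_qemu IMAGE [ARGS...]: run an amd64 image, forcing QEMU for this container only.
# DOCKER is a hypothetical override; set DOCKER=echo to print the command instead of running it.
run_amd64_with_qemu() {
  "${DOCKER:-docker}" run --rm --init \
    --platform linux/amd64 \
    -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 \
    "$@"
}

# Example:
#   run_amd64_with_qemu -p 8501:8501 -e MODEL_NAME=half_plus_two tensorflow/serving:2.14.1
```

Since the variable is described above as experimental, it may change or disappear in later Docker Desktop releases.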
@dgageot that's so great to hear!
Exactly, running your command on Docker Desktop v4.26.1 for Mac, on Apple silicon, without -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1, abruptly ends with the error:
/usr/bin/tf_serving_entrypoint.sh: line 3: 12 Illegal instruction tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
Similarly, running the command with -e EXPERIMENTAL_DOCKER_DESKTOP_FORCE_QEMU=1 ends with the error:
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:560] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
qemu: uncaught target signal 6 (Aborted) - core dumped
/usr/bin/tf_serving_entrypoint.sh: line 3: 12 Aborted tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
Hello, I've tried the recently released Docker for Mac 4.27.0 and AVX seems to work now on ARM64 🎊
Woot! Thanks @matemijolovic, that's really good news! Is it fully working? How's the perf?
Didn't have time to benchmark the performance, but it seems okay at first glance (I'd say roughly 2x slower than running on a comparable Linux x64 machine). For our use case this is perfectly acceptable, as we don't run any production inference on ARM machines. [EDIT: to clarify, regarding performance, I'm not sure that AVX is actually being used to its full potential, but for us the important thing is that the containers don't crash]
It would probably also help to compile linux/arm64 TF Serving images; currently there are only linux/amd64 ones, so it's unfair to do benchmarks :)
The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but I can't say for sure if it's related to this particular upgrade. [EDIT: as dgageot suggested, the docker run --init flag helps with this]
Didn't have time to benchmark the performance, but it seems okay at first glance (I'd say roughly 2x slower than running on a comparable Linux x64 machine). For our use case this is perfectly acceptable, as we don't run any production inference on ARM machines.
Good to hear!
The only issue I observed is that SIGINT isn't propagated correctly (can't stop a container with CTRL+C), but can't say for sure if it's related to the particular upgrade.
Have you tried running the container with docker run --init?
(I'm closing this issue. Feel free to ping me if you think it needs to be re-opened.)
Have you tried running the container with docker run --init?
Can confirm this helps, thank you!
Hi everyone! There's a good chance that we will roll back the QEMU upgrade in Docker Desktop 4.28.0. It has too many regressions for the majority of users. A temporary solution will be for you to stick with 4.27.X.
That's unfortunate but thank you so much @dgageot for the heads up!
Were the regressions reported on GitLab? Also, what patch release are you on?
Why wasn't this issue re-opened if the change was undone?
Expected behavior
AMD64 images that use AVX instructions are able to run on ARM64 hosts.
Actual behavior
https://github.com/docker/for-mac/issues/5148
Information
AVX support was recently added to QEMU. I believe Docker needs to update its QEMU version to pull in this functionality?