Enabling AVX instructions on an x86_64 VM with colima on an M1 Mac

colindean commented 2 years ago

Describe the Issue

I'd like to enable AVX instructions so that I can use TensorFlow packages directly from PyPI inside the colima VM. Without AVX, I get this error message and an exception that apparently cannot be safely caught:

The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.

I'm desperately trying to avoid having to recompile the TensorFlow package without AVX instructions enabled. That would add package management complexity I want to avoid at all costs.

N.b. I'm not trying to do high-intensity ML work in this scenario: this is just unit tests and local development of an inference server within a Docker container. The actual deployment environment runs on Intel Xeon processors. TensorFlow doesn't seem to even start without AVX instructions available.

Version

Colima Version:

colima version 0.4.2
git commit: f112f336d05926d62eb6134ee3d00f206560493b

runtime: docker
arch: x86_64
client: v20.10.14
server: v20.10.11

Lima Version:

limactl version 0.11.0

Qemu Version

qemu-img version 7.0.0
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

Operating System

[ ] macOS Intel
[x] macOS M1
[ ] Linux

$ sw_vers 
ProductName:    macOS
ProductVersion: 12.3.1
BuildVersion:   21E258

To Reproduce

Steps to reproduce the behavior:

colima start --arch x86_64 --cpu 2 --memory 4 --cpu-type max,+avx,+avx2 --disk 60
colima ssh
grep avx /proc/cpuinfo
See no output

Expected behavior

I'd expect avx to be in the output of /proc/cpuinfo.
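The grep check above also matches avx2 or avx512f, so it cannot tell those apart. A minimal Python sketch of a slightly stricter check, run inside the guest: it parses the flags lines of /proc/cpuinfo and reports whether the exact avx and avx2 flags are present. It assumes an x86 Linux guest (aarch64 kernels list features under Features instead), and the function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from /proc/cpuinfo-style text.

    On x86 Linux each logical CPU has a line like
    "flags\t\t: fpu vme de pse ... avx avx2 ...".
    """
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> dict[str, bool]:
    """Exact-name lookup, so "avx512f" does not count as "avx"."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("avx", "avx2")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(has_avx(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux guest?)")
```

Matching whole tokens rather than substrings is the only real difference from grep avx; in the situation reported here both approaches would show no AVX.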

Additional context

I'm 99% sure AVX is implemented in QEMU, but based on the present behavior I suspect it may not be possible to enable it on an ARM64 Mac.

https://wiki.qemu.org/Internships/ProjectIdeas/AVX - a 2019 internship project to implement AVX in QEMU
https://github.com/andikleen/qemu-avx - a dead project that's 9 years old but was successful, so I'll bet it was upstreamed

However, qemu lists it:

$ qemu-system-x86_64 -cpu help | grep avx
  avic avx avx-vnni avx2 avx512-4fmaps avx512-4vnniw avx512-bf16
  avx512-fp16 avx512-vp2intersect avx512-vpopcntdq avx512bitalg avx512bw
  avx512cd avx512dq avx512er avx512f avx512ifma avx512pf avx512vbmi
  avx512vbmi2 avx512vl avx512vnni bmi1 bmi2 bus-lock-detect cid cldemote

However, I found a user comment asserting:

the fact that qemu lists a bunch of "Available CPUs" that require AVX2 (qemu-cpu.txt) definitely looks like a bug.

That is, qemu-system-x86_64 -cpu help reports avx as available even though it currently isn't available with an aarch64 host and an x86_64 guest.

I've tried the following values for the --cpu-type option:

max
Skylake-Client
max,+avx,+avx2

The running qemu command with the last option:

qemu-system-x86_64 -m 4096 -cpu max,+avx,+avx2 -machine q35,accel=tcg -smp 2,sockets=1,cores=2,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Users/colin/.colima/_wrapper/share/qemu/edk2-x86_64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/colin/.lima/colima/basedisk,media=cdrom,readonly=on -drive file=/Users/colin/.lima/colima/diffdisk,if=virtio -cdrom /Users/colin/.lima/colima/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:60775-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:bd:45:24 -device virtio-rng-pci -display none -device virtio-vga -device virtio-keyboard-pci -device virtio-mouse-pci -parallel none -chardev socket,id=char-serial,path=/Users/colin/.lima/colima/serial.sock,server=on,wait=off,logfile=/Users/colin/.lima/colima/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/colin/.lima/colima/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-colima -pidfile /Users/colin/.lima/colima/qemu.pid -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan,mac=5a:94:ef:a7:40:5a

Note that -cpu max,+avx,+avx2 is present, in the form qemu seems to expect.
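Since the features requested on the -cpu line and the flags the guest actually reports can silently diverge, as they do here, it can help to compare the two directly. A minimal sketch, assuming QEMU's model,+feature,-feature syntax shown in the command above (plus feature=on/off); the helper names are hypothetical, and QEMU feature names only approximately match /proc/cpuinfo flag names, so only the sse4.1 vs sse4_1 style difference is normalized.

```python
def _normalize(feature: str) -> str:
    # QEMU spells some features with a dot ("sse4.1") where /proc/cpuinfo
    # uses an underscore ("sse4_1"); only that difference is handled here.
    return feature.replace(".", "_")


def parse_cpu_type(spec: str) -> tuple[str, set[str], set[str]]:
    """Split a -cpu string such as "max,+avx,+avx2" into
    (model, features to enable, features to disable)."""
    model, *props = [p.strip() for p in spec.split(",") if p.strip()]
    enabled: set[str] = set()
    disabled: set[str] = set()
    for prop in props:
        if prop.startswith("+"):
            enabled.add(_normalize(prop[1:]))
        elif prop.startswith("-"):
            disabled.add(_normalize(prop[1:]))
        elif prop.endswith("=on"):
            enabled.add(_normalize(prop[: -len("=on")]))
        elif prop.endswith("=off"):
            disabled.add(_normalize(prop[: -len("=off")]))
        # Other key=value properties are ignored in this sketch.
    return model, enabled, disabled


def missing_features(spec: str, guest_flags: set[str]) -> set[str]:
    """Features requested in `spec` that the guest's flags do not report."""
    _, enabled, _ = parse_cpu_type(spec)
    return enabled - guest_flags


if __name__ == "__main__":
    requested = "max,+avx,+avx2"
    # Hypothetical guest flags, e.g. collected from /proc/cpuinfo in the VM.
    guest = {"fpu", "sse", "sse2", "sse4_1", "sse4_2"}
    print(missing_features(requested, guest))
```

A non-empty result means QEMU accepted the flags on the command line but the guest kernel never saw them, which matches the symptom in this report.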

The ultimate question is this: **How can I enable AVX instructions for a Colima-managed VM?**

abiosoft commented 2 years ago

Maybe it is a limitation of M1 devices; I am able to see avx and avx2 on my Intel Mac after specifying the CPU type as host,+avx,+avx2.

abiosoft commented 2 years ago

It is definitely not available for M1 devices as qemu-system-aarch64 -cpu help | grep avx returns no output.

I also saw avx enabled with kvm64,+avx,+avx2; however, any CPU type other than qemu64 is very slow on M1 devices.

The best performance would have come from qemu64,+avx,+avx2, but in my test that combination did not expose AVX. You can try the kvm64 CPU type, but I doubt the speed would be bearable.

colindean commented 2 years ago

Thank you for the incredibly quick response!

I'll try kvm64 and see if something is worse than nothing ;-)

Edit: kvm64 didn't work; details below.

$ colima start --arch x86_64 --cpu 2 --memory 4 --cpu-type kvm64,+avx,+avx2 --disk 60
…                                    
$ colima ssh
colima:~$ grep avx /proc/cpuinfo 
colima:~$

I'm reading up on this a bit and finding some references to HVF. Apparently, some builds of qemu can use the macOS Hypervisor.framework and might expose AVX through it, if and only if that has actually been implemented.

colindean commented 2 years ago

HVF may already be there and only relevant to aarch64:

$ qemu-system-aarch64 -accel help
Accelerators supported in QEMU binary:
hvf
tcg
$ qemu-system-x86_64 -accel help
Accelerators supported in QEMU binary:
tcg

Yeah, confirmed. That qemu-hvf fork was upstreamed as of qemu v6.2.0.

colindean commented 2 years ago

qemu upstream issue: qemu x86 TCG doesn't support AVX insns

The ticket "Close gap for x86_64-v3 ABI in TCG - CPU support for fma, f16c, avx, avx2 features required" pointed me to a mailing-list patchset that seems to implement AVX. It's not merged yet. The author is hosting their work at https://github.com/pbrook/qemu/tree/avx.

colindean commented 2 years ago

TL;DR As of this comment's timestamp, qemu doesn't yet support AVX for aarch64 hosts with x86_64 guests, but a patchset that may enable it is in development.

If you found this issue while looking for a way to run TensorFlow inside a colima/lima/qemu container on an M1 Mac, the short answer is "you can't", and I'm working on a workaround of some kind.

colindean commented 2 years ago

Some additional context, mostly geared toward my particular predicament of trying to get TensorFlow to start inside a colima x86_64 container:

- TensorFlow ships AVX-enabled builds by default: https://github.com/tensorflow/tensorflow/issues/19584
- TensorFlow may be silenceable with the methods in silence-tensorflow, but in practice this no longer seems to work…?
- This SO Q&A goes into detail about AVX and recommends the same approach as the previous bullet.

colindean commented 1 year ago

qemu 7.2.0 came out this week and I've started playing with it, starting with getting qemu 7.2.0 into Homebrew. My preliminary results aren't looking great.

I've started qemu via colima with

colima start --arch x86_64 --cpu 2 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --memory 4

and it starts correctly. However, when I try to run my containers, shell processes exit with code 132 or 139, indicating an illegal instruction (132, SIGILL) or a segmentation fault (139, SIGSEGV) when running bash or sh, respectively. I haven't yet tried destroying the VM entirely.
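The mapping from 132 and 139 to signals follows the POSIX shell convention of reporting a process killed by signal N as exit status 128 + N (Docker reports container exit codes the same way). A minimal sketch of that decoding using Python's signal module; describe_exit_status is a hypothetical helper, and a program can also legitimately exit with a status above 128 on its own, so the result is a likely interpretation rather than proof.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Shells report death by signal N as 128 + N, so
    132 -> signal 4 (SIGILL) and 139 -> signal 11 (SIGSEGV).
    """
    if status > 128:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:
            return f"exit status {status} (unknown signal {status - 128})"
        return f"killed by {sig.name}"
    return f"exited with status {status}"


if __name__ == "__main__":
    for code in (0, 132, 139):
        print(code, describe_exit_status(code))
```

A SIGILL here is consistent with binaries executing instructions (such as AVX) that the emulated CPU does not actually implement.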

ghost commented 1 year ago

@colindean I tried deleting the VM by running colima delete and starting qemu using the same command as you did:

colima start --arch x86_64 --cpu 2 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --memory 4

and I'm getting similar exit codes. I have qemu 7.2.0 here as well, with colima HEAD compiled from source using Homebrew.

I also tried running with Rosetta 2 and some different parameters:

colima start --cpu 4 --memory 6 --disk 100 --arch amd64 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --vm-type=vz --vz-rosetta

But got similar results.

Did you manage to resolve this?

colindean commented 1 year ago

I have not resolved this yet 😞

qiangli commented 1 year ago

Install the latest qemu (8.0.0) from https://www.qemu.org/; it should work now. Thanks for the great work!

colindean commented 1 year ago

Yes! Going to give 8.0.0 a shot next week and see if I can get Tensorflow working!

chunleng commented 1 year ago

I tried the following commands with qemu 8.0.0:

colima start --arch x86_64 --cpu 4 --cpu-type "max" --memory 8
docker run --rm -it tensorflow/tensorflow:1.7.1 bash

The container started up correctly, but when I run import tensorflow in Python, it hangs for a long time and then times out.

mtomilov commented 1 year ago

Can confirm. I tried different versions of TensorFlow, and it always seems to hang somewhere here:

python -v
>>> import tensorflow
`import 'numpy.ma' # <_frozen_importlib_external.SourceFileLoader object at 0x7ff84f2f2b20>`

This basically makes colima unresponsive. I ended up using tensorflow-macos myself, though I had high hopes for colima and qemu.
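To reproduce this hang without wedging an interactive session, one option is to run the import in a child interpreter with a timeout. A minimal sketch using only the standard library; probe_import is a hypothetical helper and the 60-second default is arbitrary. A timeout only bounds the wait; it does not make the import any faster under emulation.

```python
import subprocess
import sys


def probe_import(module: str, timeout: float = 60.0) -> str:
    """Try `import <module>` in a fresh interpreter and classify the result.

    Running it in a child process keeps a hang or a crash (for example a
    SIGILL from unsupported instructions) from taking down the caller.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N
        # (unlike the shell, which reports 128 + N).
        return f"signal {-result.returncode}"
    return f"error {result.returncode}"


if __name__ == "__main__":
    print(probe_import("tensorflow", timeout=300))
```

Inside the emulated container, "timeout" corresponds to the hang described above, while "signal 4" would point at an illegal instruction instead.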

dre-hh commented 1 year ago

With colima start --arch x86_64 --cpu 8 --cpu-type "max" --memory 16, TensorFlow 2.13.0 works at first glance. lscpu also shows the AVX instructions, and TensorFlow prints that it will use them in performance-critical operations. However, running anything substantial is terribly slow. The qemu process also shows only 100% CPU usage, so it is apparently using only a single core.

I also tried colima start --cpu 8 --memory 16 --vm-type vz --vz-rosetta --mount-type virtiofs and starting Docker containers with export DOCKER_DEFAULT_PLATFORM="linux/amd64". Docker then switches to emulation, but lscpu reports that the CPU architecture is only 32-bit. The behaviour is the same when enabling Rosetta in the official Docker Desktop for macOS.

As of now, there isn't really a way to run performant x86_64 Docker containers on Apple Silicon.

daveguyz commented 3 months ago

I spent my whole day rebuilding my script and moving it from one machine to another, when I really should have just stopped freaking out and read. Thanks for the solution, though; it's plain, simple, and easy:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
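For readers landing here, the detail that matters in this snippet is ordering: the variable has to be in the environment before TensorFlow is first imported. A minimal sketch under that assumption; note that TF_CPP_MIN_LOG_LEVEL only filters TensorFlow's C++ log output, so it can hide informational and warning messages but does not make AVX instructions available, and it cannot help when the guest CPU lacks them, as in the original report.

```python
import os

# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++ logging:
# "0" shows everything, "1" hides INFO, "2" also hides WARNING,
# "3" also hides ERROR. Set it before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

try:
    import tensorflow as tf  # noqa: E402  (imported after setting the variable)
except ImportError:
    tf = None  # TensorFlow is not installed in this environment
```

Setting the variable after import tensorflow has already run generally has no effect on messages emitted during that import.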
