crc-org / vfkit

Add rosetta support #57

Closed cfergeau closed 1 year ago

cfergeau commented 1 year ago

This adds support for --device rosetta,mountTag=something on the command line. It is only available on systems with Apple CPUs; vfkit will error out if this option is used on Intel CPUs.
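
A host-side pre-check for this option could be sketched as follows. This is only an illustration: `rosetta_device_arg` is a hypothetical helper, not part of vfkit, and it assumes that `uname -m` reports `arm64` on Apple CPUs.

```shell
# Hypothetical helper (not part of vfkit): print the rosetta device argument,
# or fail early on a non-Apple CPU instead of waiting for vfkit to error out.
# $1: mount tag, $2: machine architecture (defaults to the output of uname -m).
rosetta_device_arg() {
    tag="$1"
    arch="${2:-$(uname -m)}"
    if [ -z "$tag" ]; then
        echo "usage: rosetta_device_arg MOUNT_TAG [ARCH]" >&2
        return 2
    fi
    if [ "$arch" != "arm64" ]; then
        echo "rosetta is only available on Apple CPUs (got $arch)" >&2
        return 1
    fi
    printf '%s\n' "--device rosetta,mountTag=$tag"
}

# Possible usage (the other vfkit options are omitted here):
#   vfkit ... $(rosetta_device_arg something)
```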

Once the VM is running and the rosetta share is mounted, rosetta support can be enabled by creating this file:

$ cat /etc/binfmt.d/rosetta.conf
:rosetta:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/mnt/rosetta:F

and then running systemctl restart systemd-binfmt
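
The guest-side steps above could be scripted roughly like this. It is a sketch under assumptions: `write_rosetta_conf` and `ROSETTA_RULE` are hypothetical names, the share is assumed to be mounted on /mnt so that the binary ends up at /mnt/rosetta, and the target directory is a parameter so the function can be tried without root.

```shell
# The binfmt rule from this PR, kept in single quotes so the \x escapes
# reach the file literally (the kernel interprets them, not the shell).
ROSETTA_RULE=':rosetta:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/mnt/rosetta:F'

# Hypothetical helper: write rosetta.conf into the given directory
# (defaults to /etc/binfmt.d, which requires root).
write_rosetta_conf() {
    dir="${1:-/etc/binfmt.d}"
    printf '%s\n' "$ROSETTA_RULE" > "$dir/rosetta.conf"
}

# Possible usage inside the VM, as root, with the mount tag passed to vfkit:
#   mount -t virtiofs something /mnt
#   write_rosetta_conf
#   systemctl restart systemd-binfmt
```

The rule uses the F flag, so the interpreter is opened when the rule is registered; the share therefore has to be mounted before systemd-binfmt is restarted.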

See these links for more details:
https://developer.apple.com/documentation/virtualization/running_intel_binaries_in_linux_vms_with_rosetta?language=objc
https://docs.kernel.org/admin-guide/binfmt-misc.html
https://www.man7.org/linux/man-pages/man5/binfmt.d.5.html

This fixes https://github.com/crc-org/vfkit/issues/23

praveenkumar commented 1 year ago

To test this I did the following, and it fails when restarting the systemd-binfmt service.

diff --git a/go.mod b/go.mod
index 6fbb482a..32ff9879 100644
--- a/go.mod
+++ b/go.mod
@@ -210,3 +210,5 @@ require (
 )

 replace github.com/apcera/gssapi => github.com/openshift/gssapi v0.0.0-20161010215902-5fb4217df13b
+
+replace github.com/crc-org/vfkit => /Users/prkumar/work/github/vfkit
diff --git a/go.sum b/go.sum
index 44d1e108..63f3b69a 100644
--- a/go.sum
+++ b/go.sum
@@ -134,8 +134,6 @@ github.com/crc-org/admin-helper v0.0.12-0.20221012143549-fd5acd1c478e h1:T6m4n9Z
 github.com/crc-org/admin-helper v0.0.12-0.20221012143549-fd5acd1c478e/go.mod h1:CerKYGP0C/zPeDd6T/k8H7TmyKKBWhfhAzAupcSKPMU=
 github.com/crc-org/machine v0.0.0-20221028075518-f9b43442196b h1:VPbW5D21B1WToPvEA/EGwhi4e3lXevmRff9M1lUTc5g=
 github.com/crc-org/machine v0.0.0-20221028075518-f9b43442196b/go.mod h1:9bEsvgLE3LIPfvGATt9Mo73gG1CKKS6A/++VqOONKqc=
-github.com/crc-org/vfkit v0.1.1 h1:F0QXj9ik1mhVq2R8FmWFhQH8SuFGYP5Xu2KF7cTvALs=
-github.com/crc-org/vfkit v0.1.1/go.mod h1:vjZiHDacUi0iLosvwyLvqJvJXQhByzlLQbMkdIfCQWk=
 github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
 github.com/creack/pty v1.1.11/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
 github.com/creack/pty v1.1.17 h1:QeVUsEDNrLBW4tMgZHvxy18sKtr6VI492kBhUfhDJNI=
diff --git a/pkg/drivers/vfkit/driver_darwin.go b/pkg/drivers/vfkit/driver_darwin.go
index 41c5e38a..63388b0b 100644
--- a/pkg/drivers/vfkit/driver_darwin.go
+++ b/pkg/drivers/vfkit/driver_darwin.go
@@ -235,6 +235,16 @@ func (d *Driver) Start() error {
                return err
        }

+       // Rosetta support
+       dev, err = config.RosettaShareNew("vz-rosetta")
+       if err != nil {
+               return err
+       }
+       err = vm.AddDevice(dev)
+       if err != nil {
+               return err
+       }
+

Then, when using it with the microshift preset, it fails:

✗ ps aux | grep vfkit
prkumar          25636   0.0  0.0 409260528  19344 s000  S    11:16AM   0:00.03 /Users/prkumar/.crc/bin/vfkit --cpus 2 --memory 4096 --kernel /Users/prkumar/.crc/cache/crc_microshift_vfkit_4.13.6_arm64/vmlinuz-5.14.0-70.30.1.el9_0.aarch64 --initrd /Users/prkumar/.crc/cache/crc_microshift_vfkit_4.13.6_arm64/initramfs-5.14.0-70.30.1.el9_0.aarch64.img --kernel-cmdline console=hvc0 BOOT_IMAGE=(hd0,gpt2)/ostree/rhel-5272275ce9aecac61cca772353bf719bc24e7cbe320fc31ce99169ca9cc3bb90/vmlinuz-5.14.0-70.30.1.el9_0.aarch64 crashkernel=1G-4G:256M,4G-64G:320M,64G-:576M rd.lvm.lv=rhel/root root=/dev/mapper/rhel-root ostree=/ostree/boot.0/rhel/5272275ce9aecac61cca772353bf719bc24e7cbe320fc31ce99169ca9cc3bb90/0 rw --device virtio-serial,logFilePath=/Users/prkumar/.crc/machines/crc/vfkit.log --device virtio-fs,sharedDir=/Users/prkumar,mountTag=dir0 --device virtio-rng --device rosetta,mountTag=vz-rosetta --device virtio-blk,path=/Users/prkumar/.crc/machines/crc/crc.img --device virtio-vsock,port=1024,socketURL=/Users/prkumar/.crc/tap.sock,listen --timesync vsockPort=1234

$ sudo mount -t virtiofs vz-rosetta /mnt
$ ls -l /mnt/
total 260
-rwxr-xr-x. 1 core core 545488 May 17 19:38 rosetta

$ cat /etc/binfmt.d/rosetta.conf
:rosetta:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/mnt/rosetta:F

$ sudo systemctl status systemd-binfmt.service 
× systemd-binfmt.service - Set Up Additional Binary Formats
     Loaded: loaded (/usr/lib/systemd/system/systemd-binfmt.service; static)
     Active: failed (Result: exit-code) since Tue 2023-08-22 01:50:12 EDT; 18s ago
   Duration: 3min 39.389s
       Docs: man:systemd-binfmt.service(8)
             man:binfmt.d(5)
             https://docs.kernel.org/admin-guide/binfmt-misc.html
             https://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
    Process: 3663 ExecStart=/usr/lib/systemd/systemd-binfmt (code=exited, status=1/FAILURE)
   Main PID: 3663 (code=exited, status=1/FAILURE)
        CPU: 3ms

Aug 22 01:50:12 api.crc.testing systemd[1]: Starting Set Up Additional Binary Formats...
Aug 22 01:50:12 api.crc.testing systemd-binfmt[3663]: /etc/binfmt.d/rosetta.conf:1: Failed to add binary format 'rosetta': Permission denied
Aug 22 01:50:12 api.crc.testing systemd[1]: systemd-binfmt.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 01:50:12 api.crc.testing systemd[1]: systemd-binfmt.service: Failed with result 'exit-code'.
Aug 22 01:50:12 api.crc.testing systemd[1]: Failed to start Set Up Additional Binary Formats.

cfergeau commented 1 year ago

Aug 22 01:50:12 api.crc.testing systemd-binfmt[3663]: /etc/binfmt.d/rosetta.conf:1: Failed to add binary format 'rosetta': Permission denied

There are some permission issues, maybe related to selinux. I'd recommend also adding empty /etc/binfmt.d/qemu-*-static.conf files to make sure rosetta is used and not qemu-user-static. The empty files must have the same names as the ones in /usr/lib/binfmt.d
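
The masking described above could be sketched like this. `mask_qemu_binfmt` is a hypothetical helper; both directories are parameters so it can be tried on scratch directories first, and it never overwrites a file that already exists in the destination.

```shell
# Hypothetical helper: for every qemu-*-static.conf in the vendor directory,
# create an empty file of the same name in the override directory.
# $1: vendor directory (default /usr/lib/binfmt.d)
# $2: override directory (default /etc/binfmt.d, requires root)
mask_qemu_binfmt() {
    src="${1:-/usr/lib/binfmt.d}"
    dst="${2:-/etc/binfmt.d}"
    for f in "$src"/qemu-*-static.conf; do
        [ -e "$f" ] || continue            # the glob matched nothing
        name=$(basename "$f")
        [ -e "$dst/$name" ] || : > "$dst/$name"
    done
}

# Possible usage inside the VM, as root:
#   mask_qemu_binfmt
#   systemctl restart systemd-binfmt
```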

praveenkumar commented 1 year ago

Aug 22 01:50:12 api.crc.testing systemd-binfmt[3663]: /etc/binfmt.d/rosetta.conf:1: Failed to add binary format 'rosetta': Permission denied

There are some permission issues, maybe related to selinux.

I tried with setenforce 0, and restarting the service then works, so it is related to selinux. We might need to add the appropriate labels to the mounted binary.

To test this end to end with the microshift bundle, I need to remove qemu-x86_64-static.conf, since it is already part of the qemu-user-static-x86 package. Here is what I tried, and it works:

<== initially it is working with qemu-user-static-x86 ==>
$ podman run --rm quay.io/podman/hello@sha256:3381e3a704e8067b5925522d420483e2ce7a966bc1ef7b3008b90539560fc8f4
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
!... Hello Podman World ...!

<== disable qemu-user-static-x86 by adding an empty file with the same name ==>
$ sudo touch /etc/binfmt.d/qemu-x86_64-static.conf
$ sudo systemctl restart systemd-binfmt.service 
$ podman run --rm quay.io/podman/hello@sha256:3381e3a704e8067b5925522d420483e2ce7a966bc1ef7b3008b90539560fc8f4
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
exec /usr/local/bin/podman_hello_world: exec format error

<== enable rosetta as per guideline in this PR ==>

$ sudo mount -t virtiofs vz-rosetta /mnt
$ cat /etc/binfmt.d/rosetta.conf
:rosetta:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/mnt/rosetta:F
$ sudo systemctl restart systemd-binfmt.service 
$ podman run --rm quay.io/podman/hello@sha256:3381e3a704e8067b5925522d420483e2ce7a966bc1ef7b3008b90539560fc8f4
WARNING: image platform (linux/amd64) does not match the expected platform (linux/arm64)
!... Hello Podman World ...!
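
To confirm which handler is actually registered after each restart, rather than inferring it from the podman output, the binfmt_misc status file can be read. A small sketch, assuming binfmt_misc is mounted at /proc/sys/fs/binfmt_misc and that the entry is named rosetta as in the rule above; `binfmt_interpreter` is a hypothetical helper.

```shell
# Hypothetical helper: print the interpreter path of a binfmt_misc entry.
# $1: path to the entry's status file (defaults to the rosetta entry).
binfmt_interpreter() {
    entry="${1:-/proc/sys/fs/binfmt_misc/rosetta}"
    if [ ! -r "$entry" ]; then
        echo "no such binfmt entry: $entry" >&2
        return 1
    fi
    # The status file contains a line of the form "interpreter <path>".
    awk '$1 == "interpreter" { print $2 }' "$entry"
}

# Possible usage inside the VM:
#   binfmt_interpreter
```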

I'd recommend also adding empty /etc/binfmt.d/qemu-*-static.conf files to make sure rosetta is used and not qemu-user-static. The empty files must have the same names as the ones in /usr/lib/binfmt.d

cfergeau commented 1 year ago

I tried with setenforce 0, and restarting the service then works, so it is related to selinux. We might need to add the appropriate labels to the mounted binary.

restorecon /etc/binfmt.d/* or such should be enough

To test this end to end with the microshift bundle, I need to remove qemu-x86_64-static.conf, since it is already part of the qemu-user-static-x86 package

You can also add empty files of the same names in /etc/binfmt.d to disable the ones in /usr/lib/binfmt.d

praveenkumar commented 1 year ago

I tried with setenforce 0, and restarting the service then works, so it is related to selinux. We might need to add the appropriate labels to the mounted binary.

restorecon /etc/binfmt.d/* or such should be enough

I tried that and it does not work, because the problem is with the mounted binary and not with the /etc/binfmt.d/* files. I checked the audit logs, and execute permission was denied. Even sudo restorecon -R /mnt/* does not work.

$ sudo ausearch -m avc --start recent
----
time->Tue Aug 22 04:11:49 2023
type=PROCTITLE msg=audit(1692691909.469:619): proctitle="/usr/lib/systemd/systemd-binfmt"
type=SYSCALL msg=audit(1692691909.469:619): arch=c00000b7 syscall=64 success=no exit=-13 a0=5 a1=ffffe87c9e20 a2=b4 a3=ffff83921020 items=0 ppid=1 pid=6943 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-binfmt" exe="/usr/lib/systemd/systemd-binfmt" subj=system_u:system_r:init_t:s0 key=(null)
type=AVC msg=audit(1692691909.469:619): avc:  denied  { execute } for  pid=6943 comm="systemd-binfmt" name="rosetta" dev="virtiofs" ino=2 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file permissive=0
----

# audit2allow -i /var/log/audit/audit.log 

#============= init_t ==============
allow init_t unlabeled_t:file execute;

#============= openvswitch_load_module_t ==============
allow openvswitch_load_module_t tracefs_t:dir search;
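
The AVC record above can be condensed to the fields that matter for this problem: source context, target context, object class and denied permission. A sketch using sed; `avc_summary` is a hypothetical helper that reads audit records on standard input and assumes the field order shown in the record above.

```shell
# Hypothetical helper: summarise AVC denials read from stdin as
#   <scontext> -> <tcontext> (<tclass>): <permission>
# Lines that are not AVC denials are dropped.
avc_summary() {
    sed -n 's/.*denied  *{ \([^}]*\) }.*scontext=\([^ ]*\).*tcontext=\([^ ]*\).*tclass=\([^ ]*\).*/\2 -> \3 (\4): \1/p'
}

# Possible usage inside the VM:
#   sudo ausearch -m avc --start recent | avc_summary
```

For the record above, the summary shows init_t being denied execute on a file labelled unlabeled_t, which matches the observation that the rosetta binary on the virtiofs mount carries no usable label.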

To test this end to end with the microshift bundle, I need to remove qemu-x86_64-static.conf, since it is already part of the qemu-user-static-x86 package

You can also add empty files of the same names in /etc/binfmt.d to disable the ones in /usr/lib/binfmt.d

praveenkumar commented 1 year ago

Anyway, this selinux issue shouldn't stop this PR from getting merged; the PR is working as expected.

/lgtm /approve

openshift-ci[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: praveenkumar
Once this PR has been reviewed and has the lgtm label, please ask for approval from cfergeau. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
- **[OWNERS](https://github.com/crc-org/vfkit/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

cfergeau commented 1 year ago

I tried that and it does not work, because the problem is with the mounted binary and not with the /etc/binfmt.d/* files. I checked the audit logs, and execute permission was denied. Even sudo restorecon -R /mnt/* does not work.

maybe mount -o context=....