Code-Hex / vz

Create virtual machines and run Linux-based operating systems in Go using Apple Virtualization.framework.
https://pkg.go.dev/github.com/Code-Hex/vz/v3
MIT License

Want knowledge document for virtio #61

Closed Code-Hex closed 2 years ago

Code-Hex commented 2 years ago

I want a document for virtio to use vz. I need this the most and it would be very helpful if someone could contribute to this!

For x86_64 and aarch64.

Example

cfergeau commented 2 years ago

virtio-fs

To use directory sharing, we have to run the mount -t virtiofs command in the guest Linux.

mount -t virtiofs $tag /mnt/something, where $tag is the argument passed to NewVirtioFileSystemDeviceConfiguration.

When using virtiofs, only one user ID is used when files are created/... It's not possible to use chown (or it won't do anything). There are also differences in behaviour between xattr semantics on Linux and macOS (for example, security xattrs can be set on read-only files on Linux but not on macOS; this could be filesystem-specific).

virtio-console

The serial console needs the console=hvc0 kernel parameter, and the guest Linux kernel must have CONFIG_VIRTIO_CONSOLE and CONFIG_HVC_DRIVER enabled.

As I understand it, the console goes through virtio, and it is not possible to capture early boot messages with it. It starts working around the time /dev is populated. I think https://github.com/evansm7/vftool#kernelsnotes is related:

Note that Virtualization.framework provides all IO as virtio-pci, including the console (i.e. not a UART).

If there's a way to capture early kernel messages, it would be useful to document it.

virtio-blk

The virtio-blk implementation only supports raw disk images, but the macOS filesystem supports sparse files well enough, so a large raw image does not have to occupy its full size on disk.

virtio-vsock

In my opinion, the most problematic virtio device in the virtualization framework is virtio-vsock, because it does not work out of the box. I think this is because macOS hosts don't support AF_VSOCK natively. The virtualization framework only provides ways to get a file descriptor for the vsock communication. The application using the virtualization framework or Code-Hex/vz then needs to do 'something' with it. For example, it can be exposed as a unix socket on the host using inet.af/tcpproxy. One caveat of this approach is that you need to decide beforehand if you want a socket to initiate connections from host to VM, or from VM to host. I'm not sure it's possible to combine both, as is possible with AF_VSOCK sockets on Linux.

Code-Hex commented 2 years ago

@cfergeau I agree with this. But outputting early kernel messages requires the private APIs _VZ16550SerialPortConfiguration (for x86_64) and _VZPL011SerialPortConfiguration, and I don't really want to support them...

If there's a way to capture early kernel messages, it would be useful to document it.

Maybe both are possible, but I can't really think of use cases. I wrote a library (https://github.com/Code-Hex/darwin-vsock) to connect from host to guest in the past, but it did not work well. (I got the message "Operation not supported by device" when I tried it.)

One caveat of this approach is that you need to decide beforehand if you want a socket to initiate connections from host to VM, or from VM to host. I'm not sure it's possible to combine both

Code-Hex commented 2 years ago

How about virtio-net?

cfergeau commented 2 years ago

Maybe both are possible, but I can't really think of use cases.

My use case is debugging https://github.com/crc-org/vfkit/issues/11 :) I suspect something is printed in the kernel error log, but it is not easily accessible with Code-Hex/vz at the moment. I'm not suggesting supporting the private API, just showing one use case for this.

cfergeau commented 2 years ago

How about virtio-net?

With NAT, the IP address of the VM can be found in /var/db/dhcpd_leases. Bridge networking is unsupported at the moment (but see https://github.com/kata-container/vz/commit/215e9eeb436b5855e50a8eaa2cf322a652915566 )

Code-Hex commented 2 years ago

I think bridge networking requires the com.apple.vm.networking entitlement, but I don't know how to get that permission. See: https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_vm_networking?language=objc

balajiv113 commented 2 years ago

@Code-Hex You can run the program with sudo to overcome this entitlement requirement.

Code-Hex commented 2 years ago

@balajiv113 Really!? I didn't know that. I'll try it. Thank you very much!

Code-Hex commented 2 years ago

This is a document about directory sharing. https://developer.apple.com/documentation/virtualization/vzvirtiofilesystemdeviceconfiguration?language=objc

It states something important:

The commands required to mount shared directories in a guest VM aren’t commands that your app can execute or that you can script from inside your application to a VM; the user must perform them either interactively or as part of a script while logged in to the guest. You must communicate these requirements to the user of your app.

I tried directory sharing with a macOS 12 guest but could not get it to work. It seems macOS will support it from v13. In addition, macOS 13 supports auto-mounting shared directories in macOS VMs 🙌

cfergeau commented 2 years ago

This sounds like a macOS-only limitation; I did not have this problem with Linux guests.

balajiv113 commented 2 years ago

Yes yes, macOS directory sharing is supported only from macOS 13.

Linux directory sharing was there from macOS 12.

https://developer.apple.com/documentation/virtualization/shared_directories?changes=_2 (check the note section in this)

cfergeau commented 2 years ago

When using virtio-vsock, you must add exactly one VirtioSocketDeviceConfiguration to the VM config. Then, for each vsock port that you need to use, you call ConnectToPort or SetSocketListenerForPort.

Code-Hex commented 2 years ago

I've been researching what VZVirtioSocketConnection is by using Xcode, and I found out it is a unix socket connection.


  1. It is certain that vsock is used to connect from the virtualization framework to the guest VM, so I investigated how to get the context ID of the guest.
  2. VZVirtioSocketDevice holds the guest context ID as a private member. This value can be retrieved via the debugger. (see wiki).
  3. Attempted to connect from the host to the guest using this context ID, with unsuccessful results. (like socat - vsock-connect:3:2222 in this case, the guest cid is 3)
  4. Therefore, it can be seen that vsock connections cannot be made without going through the virtualization framework.

Next, I wrote code like this to investigate what kind of connection this is, using the fileDescriptor member you get from VZVirtioSocketConnection.

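// Needs <sys/socket.h> and <netinet/in.h>.
struct sockaddr_in sin;
memset(&sin, 0, sizeof(sin));

// Ask the kernel for the address the descriptor is bound to.
socklen_t len = sizeof(sin);
int r = getsockname(connection.fileDescriptor, (struct sockaddr *)&sin, &len);
NSLog(@"vsock connection: %d, r: %d, port: %d, addr: %u, family: %d, AF_UNIX?: %d",
    connection.fileDescriptor,
    r,
    sin.sin_port,
    sin.sin_addr.s_addr,
    sin.sin_family,
    sin.sin_family == AF_UNIX);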

The results were as follows. This result shows that it is a unix socket.

vsock connection: 3, r: 0, port: 0, addr: 0, family: 1, AF_UNIX?: 1

Then I tried the following code to see where the socket file was located. However, no results were output, so I assumed that it must have been created as an unnamed pipe.

char pathbuf[PATH_MAX];
if (fcntl(connection.fileDescriptor, F_GETPATH, pathbuf) >= 0) {
    NSLog(@"======== path: %s", pathbuf);
}

I traced the system calls to confirm whether my guess was correct, and I found this call.

socketpair(AF_UNIX, SOCK_STREAM, 0x0, 0x16afdd528)

Here I found out that I was right.

I concluded that this is a connection between the vsock held by the VMM and the unix socket that the host touches.

macOS host <-- unix socket --> VMM (Virtualization framework) <-- vsock --> guest OS

Code-Hex commented 2 years ago

As commented, the user can only handle unix sockets, so this code may need to be modified to export the VZVirtioSocketConnection created by the net.FileConn.

https://github.com/Code-Hex/vz/blob/d856144e4e723951968bbb79212bf823e6c24f0f/socket.go#L202-L212

cfergeau commented 2 years ago

3. Attempted to connect from the host to the guest using this context ID, with unsuccessful results. (like socat - vsock-connect:3:2222 in this case, the guest cid is 3)

man 4 vsock mentions this won't work for host->guest connections. Took me a while to find it when I looked at this some time ago!

Currently, only stream connections from a guest are supported using this [vsock] protocol.

Code-Hex commented 2 years ago

Yeah, I think this is difficult to look for.

cfergeau commented 2 years ago

https://github.com/cfergeau/vfkit/blob/a6ea35eabf11ecf93a10dae4957ae0bfcdaeb5b2/pkg/vf/vsock.go has code making use of inet.af/tcpproxy to enable host unix socket <-> guest vsock communications

Code-Hex commented 2 years ago

For now, it seems that the virtualization framework needs to make it clear that it is using unix sockets (vsock over unix socket).

cfergeau commented 2 years ago

For now, it seems that the virtualization framework needs to make it clear that it is using unix sockets (vsock over unix socket).

They do not mention it in their documentation. Thus I would consider their use of unix socket to be an internal implementation detail.

Code-Hex commented 2 years ago

Yes, but virtualization framework users cannot touch the vsock connection directly. So it was not a good idea to make VirtioSocketConnection.LocalAddr() and VirtioSocketConnection.RemoteAddr() look like vsock.

They do not mention it in their documentation. Thus I would consider their use of unix socket to be an internal implementation detail.

https://github.com/Code-Hex/vz/issues/61#issuecomment-1290159568

cfergeau commented 2 years ago

It's true that the file descriptor used by vz.VirtioSocketConnection is not an AF_VSOCK socket. But no one ever said it was. Yes, this can be confusing, but this can be documented. vz.VirtioSocketConnection is a net.Conn. Reading/writing to this net.Conn will send data to the VM using vsock. The API listens/connects on a port which corresponds to a vsock port. This port is then returned by RemoteAddr(), which in my opinion is correct. As a user of Code-Hex/vz, it's not clear to me what https://github.com/Code-Hex/vz/issues/61#issuecomment-1290159568 would allow me to do that I cannot already do with vz.VirtioSocketConnection.

Code-Hex commented 2 years ago

This is incorrect. To quote item 2 of this comment: the guest CID that should appear in RemoteAddr is a different value from what the current RemoteAddr returns. As commented, this CID can be obtained via a private member. It is a value that changes, so the current behaviour is also wrong.

You can also write Objective-C in Xcode and check it using the debugger.

Reading/writing to this net.Conn will send data to the VM using vsock. The API listens/connects on a port which corresponds to a vsock port. This port is then returned by RemoteAddr(), which in my opinion is correct.

cfergeau commented 2 years ago

I only talked about ports in my comment, not CIDs. I know the CIDs are a bit artificial in the current implementation. I am not sure they are very useful with virtualization framework. The CID is a way to identify which VM we are connecting to. This is needed when using AF_VSOCK as there is no other way to specify the VM. With the virtualization framework, we cannot initiate a connection without using a VZVirtualMachine object, so we already know which VM to connect to.

Code-Hex commented 2 years ago

Ports can still be obtained using the two methods DestinationPort and SourcePort. I just don't see the need to provide a false RemoteAddr and LocalAddr.

cfergeau commented 2 years ago

Code-Hex/vz defines the Addr type. It can be changed to remove the CID from it, so that RemoteAddr and LocalAddr would only return ports. I prefer to use 'standard' Go methods rather than having to know that RemoteAddr/LocalAddr are no good for vz connections and that I need to use DestinationPort and SourcePort instead. Seeing https://pkg.go.dev/net#UDPAddr.AddrPort , adding vz.Addr.AddrPort() might be better than keeping DestinationPort/SourcePort.

Code-Hex commented 2 years ago

I don't know what you are trying to solve, but I am going to return LocalAddr and RemoteAddr in unnamed unix socket format, as I have said from the beginning. I will still offer two methods, DestinationPort and SourcePort, on the VirtioSocketConnection struct.

cfergeau commented 2 years ago

I'm saying the abstraction should be kept. Users of vz don't care how the virtualization framework/vz/Go implemented the virtio-vsock communication. What they care about is that they have a vsock connection to the VM over port XX. If you want to provide a way to get the underlying connection with the unnamed unix socket information, why not, but I'm not sure it is needed. You also have no guarantee that this implementation detail won't silently change to something totally different in a future update.

Note that the properties Apple exposed on VZVirtioSocketConnection are the vsock ports, not some unnamed unix socket details.

Code-Hex commented 2 years ago

This is understandable, but the current implementation is already different. I've said this many times.

Apple (VZVirtioSocketConnection) provides three members

  1. fileDescriptor
  2. destinationPort
  3. sourcePort

Of these, vsock information is provided in the latter two. However, as far as the fileDescriptor is concerned, it is from an unnamed unix socket, as users who have touched the virtualization framework API will know. vz.VirtioSocketConnection returns a net.Conn created from it instead of exposing the fileDescriptor. This means its address should be unix socket information. (More precisely, it should return exactly the information available from net.FileConn.) That way, there is no difference from the information provided by the virtualization framework's VZVirtioSocketConnection.

I'm saying the abstraction should be kept

Yes. So there is no need to get it via LocalAddr, RemoteAddr.

Note that the properties Apple exposed on VZVirtioSocketConnection are the vsock ports, not some unnamed unix socket details.

cfergeau commented 2 years ago

However, as far as the fileDescriptor is concerned, it is from an unnamed unix socket, as users who have touched the virtualization framework API will know.

This is the part I disagree with. Where is this documented in Apple's documentation? As far as I'm concerned, this is an open file descriptor which can be used with the read/write system calls. Apple does not document it as "an unnamed unix socket", this is only "The file descriptor to use when sending data." ( https://developer.apple.com/documentation/virtualization/vzvirtiosocketconnection/3656674-filedescriptor?language=objc ).

As you said, the properties VZVirtioSocketConnection has are 2 vsock ports and a file descriptor. Go's net.Conn can be used as a high-level wrapper for these 3 things, and this is how it is implemented now in Code-Hex/vz. The low-level details (unnamed unix socket) are just undocumented internal details. I don't think they should replace the current implementation.

Code-Hex commented 2 years ago

Yes, the unnamed unix socket is undocumented. However, in Objective-C (the same goes for Go), if you are only given a fileDescriptor, how would you read from and write to a bi-directional connection using this member? You would inevitably use a wrapper that creates a file handle from it. Some libraries call the getsockopt system call on the file descriptor at that point, and that already tells you that this file descriptor is a unix socket. Although it is undocumented, the same code would inevitably be written even if Apple's internal implementation changed.

I'll tell you something you probably don't know: Go's net.FileConn does the same thing.

https://cs.opensource.google/go/go/+/refs/tags/go1.19.2:src/net/file_unix.go;l=36;drc=bd6cd7db07f314443acdb980393f57386d40551f

In other words, the rawConn of the current vz.VirtioSocketConnection structure is a unix socket, not a vsock. The program represents the facts.

Code-Hex commented 2 years ago

I wrote https://github.com/Code-Hex/vz/wiki