Open bgilbert opened 5 years ago
:smile:
I figured this one didn't need any explanation. We don't need anything special for qemu, do we?
QEMU developers would like Ignition to read configs from SMBIOS OEM strings rather than the -fw_cfg mechanism; see https://github.com/coreos/ignition/issues/656. This seems likely to require changes in upstream QEMU and libvirt as well as in Ignition. The switch from CL to FCOS would be a good time to document and recommend the new mechanism if it's available.
@bgilbert we're being asked to include the qemu-guest-agent.
If we don't include any agents, how do we justify the reduced feature set and degraded experience? It looks like the qemu agent assists w/ guest actions like shutdown/reboot as well as file system quiesce actions.
I'm probably more concerned about this approach for the open-vm-tools agent which has many more capabilities. Thoughts?
If we don't include any agents, how do we justify the reduced feature set and degraded experience?
I'd argue the other way around: if we're going to include an agent, we'd need to be convinced that doing so is a net improvement. Looking at the QGA command schema, I'm seeing:
The resource scaling commands are useful, as well as potentially the suspend support if that's a use case we're interested in. OTOH, the security bypass commands are exactly the sort of functionality that makes agents problematic.
@mrguitar - We don't currently include qemu-guest-agent in Fedora Atomic Host. I'm not necessarily opposed to including it in Fedora CoreOS for a good reason. We decided to open these tickets for every platform so we could deliberate, decide, and document the outcomes. Thanks for joining the discussion :)
I'm going to pull that team into this discussion. I think that's probably the best next step. Thanks guys.
Any news here? Is there a solution for installing ovirt-guest-agent on Fedora CoreOS?
If the QEMU guest agent is not included in FCOS, is there a way to install it? In addition to what has been said before, it is useful when performing live backups: for example, the Proxmox backup command issues fs-freeze, fs-thaw, and similar commands at the beginning of a backup.
There seems to be a solution to this but it is behind a Red Hat subscription paywall. Solution Page. It seems like this was determined to not be important to CoreOS in non-Red Hat distributions. I would vouch for this being included either as a container or built in to the qcow image.
EDIT: an alternative is running a container and passing the device through. linuxkit/qemu-ga seems to be updated.
[Unit]
Description=QEMU Guest Agent
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm %n
ExecStartPre=/usr/bin/docker pull linuxkit/qemu-ga:v0.8
ExecStart=/usr/bin/docker run --rm --device=/dev/virtio-ports/org.qemu.guest_agent.0 --name %n linuxkit/qemu-ga:v0.8 /usr/bin/qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
[Install]
WantedBy=multi-user.target
Here is the "solution" from RH:
Issue qemu-guest-agent is not included in Red Hat Enterprise Linux CoreOS for OpenShift 4 We need to install 'qemu-guest-agent' and on RHCOS nodes
Resolution The qemu-guest-agent is not currently available nor supported on RHEL CoreOS (RHCOS) nodes in OpenShift Container Platform 4.x.
What about installing it through rpm-ostree install qemu-guest-agent? It seems to work as expected, running on Fedora CoreOS 31.20200407.3.0:
# rpm-ostree install qemu-guest-agent
Checking out tree 89e17cc... done
Enabled rpm-md repositories: updates fedora
Updating metadata for 'updates'... done
rpm-md repo 'updates'; generated: 2020-08-20T00:55:24Z
Updating metadata for 'fedora'... done
rpm-md repo 'fedora'; generated: 2019-10-23T22:52:47Z
Importing rpm-md... done
Resolving dependencies... done
Will download: 2 packages (403.6 kB)
Downloading from 'updates'... done
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Added:
pixman-0.38.4-1.fc31.x86_64
qemu-guest-agent-2:4.1.1-1.fc31.x86_64
Run "systemctl reboot" to start a reboot
# systemctl status qemu-guest-agent.service
● qemu-guest-agent.service - QEMU Guest Agent
Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-08-21 11:15:37 UTC; 8min ago
Main PID: 772 (qemu-ga)
Tasks: 1 (limit: 4625)
Memory: 1.8M
CGroup: /system.slice/qemu-guest-agent.service
└─772 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist= -F/etc/qemu-ga/fsfreeze-hook
Aug 21 11:15:37 ********* systemd[1]: Started QEMU Guest Agent.
We are using it on oVirt, and the information is properly reported and populated in the UI.
What about installing it through rpm-ostree install qemu-guest-agent? Seems to work like expected, running on Fedora CoreOS 31.20200407.3.0:
Yep. You can do it with package layering, but do note #400 - we're working on a solution to make the layering more reliable, but currently you might hit an issue, so keep that in mind.
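For anyone scripting the layering step, a minimal sketch of an idempotent wrapper might look like the following. The function name ensure_layered and the RPM / RPM_OSTREE override variables are made up for illustration; --idempotent is the rpm-ostree install flag that turns a repeated request for the same package into a no-op. This is untested against #400, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper (not part of FCOS): layer a package only if it is
# not already present in the booted deployment.
# RPM and RPM_OSTREE can be overridden, e.g. for dry runs.
RPM="${RPM:-rpm}"
RPM_OSTREE="${RPM_OSTREE:-rpm-ostree}"

ensure_layered() {
    pkg="$1"
    # rpm -q succeeds if the package is in the running system's rpmdb.
    if "$RPM" -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg: already installed"
        return 0
    fi
    # --idempotent: do nothing if the package is already requested.
    "$RPM_OSTREE" install --idempotent "$pkg" >/dev/null || return 1
    echo "$pkg: layered, reboot to activate"
}

# Usage (as root on the FCOS host):
#   ensure_layered qemu-guest-agent
```

As with the manual command, the layered package only becomes active after a reboot into the new deployment.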
I'm working on getting qemu-guest-agent running using the container method suggested by @kschamplin in https://github.com/coreos/fedora-coreos-tracker/issues/74#issuecomment-671064222, but I'm struggling to get the reboot/shutdown agent commands working. I'm using Proxmox 6.2 as my hypervisor, if it matters.
I'm using the following exec statement:
docker run --rm --device=/dev/virtio-ports/org.qemu.guest_agent.0 --net=host --ipc=host --pid=host --name qemu-guest-agent linuxkit/qemu-ga:v0.8 /usr/bin/qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0
When I issue a shutdown or reboot command through the Proxmox GUI or via qm agent <vid> shutdown (for example), I get the following error:
**
ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Bail out! ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
If I add --privileged, I get the following error upon executing the container:
1603744671.54205: critical: error opening channel: No such file or directory
1603744671.54234: critical: error opening channel
1603744671.54241: critical: failed to create guest agent channel
1603744671.54246: critical: failed to initialize guest agent channel
Any suggestions for what I could do to make this work?
I've also tried to get the containerized version running as it seems to be the cleanest approach (as there are no modifications of the base system), however there are multiple problems with this solution at the moment. The errors as reported by @Nick2253 in https://github.com/coreos/fedora-coreos-tracker/issues/74#issuecomment-716823680 are caused by multiple issues:
When I issue a shutdown or reboot command through the Proxmox GUI or via qm agent <vid> shutdown (for example), I get the following error:
**
ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Bail out! ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
This seems to be caused by a bug in the guest agent itself: the docker image linuxkit/qemu-ga:v0.8 contains qemu-ga version 4.2.0, which is affected by https://bugzilla.redhat.com/show_bug.cgi?id=1884531 (bug apparently introduced in 4.0.0 and fixed in 5.1.0). The agent does not crash when using linuxkit/qemu-ga:v0.7, which contains qemu-ga version 3.1.0.
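To check whether a given image ships an affected agent, a small version comparison can help. The sketch below takes the range from the bug report as stated above (4.0.0 up to, but not including, 5.1.0); the function names are made up for illustration, and it relies on sort -V being available.

```shell
#!/bin/sh
# True if version $1 >= version $2, using version-aware sorting.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True if the qemu-ga version falls in the range reported as affected
# by the send_response assertion failure: 4.0.0 <= version < 5.1.0.
qga_shutdown_bug_affected() {
    version_ge "$1" "4.0.0" && ! version_ge "$1" "5.1.0"
}

# Usage:
#   if qga_shutdown_bug_affected 4.2.0; then
#       echo "pick a different image tag"
#   fi
```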
If I add --privileged, I get the following error upon executing the container:
1603744671.54205: critical: error opening channel: No such file or directory
1603744671.54234: critical: error opening channel
1603744671.54241: critical: failed to create guest agent channel
1603744671.54246: critical: failed to initialize guest agent channel
I couldn't figure out the exact root of the problem, but this is related to the access of the virtio device. In Fedora CoreOS, /dev/virtio-ports/org.qemu.guest_agent.0 is actually visible as a symlink:
$ ls -l /dev/virtio-ports/org.qemu.guest_agent.0
lrwxrwxrwx. 1 root root 11 23. Nov 12:10 /dev/virtio-ports/org.qemu.guest_agent.0 -> ../vport2p1
When using --privileged and any other device than the original path (/dev/vport2p1 in my case), the agent failed. Thus the following two variants worked on my system:
podman run --privileged --rm --pid=host --ipc=host --net=host --device=/dev/virtio-ports/org.qemu.guest_agent.0 linuxkit/qemu-ga:v0.7 /usr/bin/qemu-ga -m virtio-serial -p /dev/vport2p1
Note that I'm using podman, which automatically resolves the symlink and only makes the target available within the container. Manually setting the device path (by appending :/dev/other to the device parameter) and using that one with the agent however also did not work.
Alternatively, with the whole /dev path made available within the container:
podman run --privileged --rm --pid=host --ipc=host --net=host -v /dev:/dev -it linuxkit/qemu-ga:v0.7 /usr/bin/qemu-ga -m virtio-serial
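Since the target of the symlink (/dev/vport2p1 here) can differ between machines, the first variant could resolve it at run time instead of hard-coding it. A minimal sketch, with a made-up function name; readlink -f is standard, the rest simply mirrors the podman command above and has not been tested on other hypervisors:

```shell
#!/bin/sh
# Resolve the guest-agent port symlink to the real device node.
# The default path is the one FCOS exposes; pass another path to override.
resolve_qga_device() {
    link="${1:-/dev/virtio-ports/org.qemu.guest_agent.0}"
    if [ ! -e "$link" ]; then
        echo "no such device: $link" >&2
        return 1
    fi
    readlink -f "$link"
}

# Usage:
#   dev="$(resolve_qga_device)" || exit 1
#   podman run --privileged --rm --pid=host --ipc=host --net=host \
#       --device="$dev" linuxkit/qemu-ga:v0.7 \
#       /usr/bin/qemu-ga -m virtio-serial -p "$dev"
```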
Any suggestions for what I could do to make this work?
None of the listed fixes is sufficient to provide a complete solution, as shutdown is still not possible from within the container: the agent no longer crashes or complains, but it hangs when a shutdown is requested. The reason is probably the way the feature is implemented in the agent itself, as it tries to call /sbin/shutdown (https://github.com/qemu/qemu/blob/v3.1.0/qga/commands-posix.c#L110), which is not available in a standard container. Shutting down the host from within a container is an interesting task in general; it is possible, but it usually requires some extra tricks (e.g. by having systemd available inside the container and mounting appropriate sockets from the host, or via SysRq; see also https://stackoverflow.com/a/24759427 for some hints).
So I suppose this would need addition of several extra scripts or similar modifications to the guest agent container to make it work, deviating significantly from the premise of a simple setup via the container. Hence I don't think the container approach is worth the effort when requiring shutdown capabilities. I'll also try to install it via rpm-ostree for now, but I believe the best solution would be to get the agent integrated properly into the base image.
Note: if shutdown functionality is not needed, blacklisting the guest-shutdown command helps to avoid the hang in case the hypervisor issues the command nevertheless (but keep in mind that the command will then be discarded, which may have unintended side effects in the hypervisor logic).
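For reference, the blacklist is passed to qemu-ga as a comma-separated list without spaces, via the --blacklist option that also shows up in the systemd status output earlier in this thread. A small sketch for building that flag follows; the helper name is made up, and newer qemu-ga releases may spell the option differently, so check qemu-ga --help for the version in your image.

```shell
#!/bin/sh
# Build a --blacklist flag from one or more RPC names.
# qemu-ga expects a comma-separated list with no spaces.
qga_blacklist_flag() {
    # Run in a subshell so the IFS change does not leak to the caller;
    # "$*" joins the arguments with the first character of IFS.
    ( IFS=,; printf '%s%s\n' '--blacklist=' "$*" )
}

# Usage (device path as in the podman example above):
#   podman run --privileged --rm --pid=host --ipc=host --net=host \
#       --device=/dev/virtio-ports/org.qemu.guest_agent.0 linuxkit/qemu-ga:v0.7 \
#       /usr/bin/qemu-ga -m virtio-serial -p /dev/vport2p1 \
#       "$(qga_blacklist_flag guest-shutdown)"
```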
I am now knee deep in the process of trying to get qemu-guest-agent working through an Alpine base, but I'm running into problems with the shutdown command, and I'm assuming it has something to do with a lack of understanding on how Linux works this magic.
I'm building the container using:
FROM alpine:3.15.2
RUN apk add --update --no-cache qemu-guest-agent
ENTRYPOINT [ "/usr/bin/qemu-ga" ]
CMD ["-m", "virtio-serial", "-p", "/dev/virtio-ports/org.qemu.guest_agent.0"]
I then build and run the container as follows:
sudo docker build -f Dockerfile -t qemu-guest-agent:dev1 .
sudo docker run --rm --name qemu-ga --privileged -v /dev:/dev --ipc=host --net=host qemu-guest-agent:dev1
This gets me to a point where I have running guest agents, and I'm able to view IP addresses through the Proxmox interface, but as before, shutdown/reboot/etc commands don't work. However, unlike before, I don't get any errors; if I run the Shutdown command, the container kills the guest agents, though it otherwise keeps ticking.
Doing some research, it looks like Alpine's version of qemu-guest-agent is patched to execute a second "fallback" shutdown command:
if (!has_mode || strcmp(mode, "powerdown") == 0) {
shutdown_flag = "-P";
+ fallback_cmd = "/sbin/poweroff";
} else if (strcmp(mode, "halt") == 0) {
shutdown_flag = "-H";
+ fallback_cmd = "/sbin/halt";
} else if (strcmp(mode, "reboot") == 0) {
shutdown_flag = "-r";
+ fallback_cmd = "/sbin/reboot";
} else {
error_setg(errp,
"mode is invalid (valid values are: halt|powerdown|reboot");
@@ -111,6 +115,7 @@ void qmp_guest_shutdown(bool has_mode, c
execle("/sbin/shutdown", "shutdown", "-h", shutdown_flag, "+0",
"hypervisor initiated shutdown", (char *)NULL, environ);
+ execle(fallback_cmd, fallback_cmd, (char*)NULL, environ);
From poking around in Alpine, these commands are links to busybox, which must do some of the same kind of magic as systemd as far as handling symbolic links. However, this is just black magic to me, and I don't fully understand how this works.
Speaking of systemd, based on some ideas that I've seen elsewhere, I tried to force systemd into the container, but that didn't work:
Added the following to the Dockerfile:
RUN ln -sf /bin/systemctl /sbin/halt; \
ln -sf /bin/systemctl /sbin/poweroff; \
ln -sf /bin/systemctl /sbin/reboot; \
ln -sf /bin/systemctl /sbin/runlevel; \
ln -sf /bin/systemctl /sbin/shutdown; \
ln -sf /bin/systemctl /sbin/telinit
Ran the new image with the following commands:
sudo docker run --rm --name qemu-ga --privileged -v /dev:/dev -v /bin/systemctl:/bin/systemctl -v /run/systemd/system:/run/systemd/system -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket -v /sys/fs/cgroup:/sys/fs/cgroup --ipc=host --net=host qemu-guest-agent:dev1
However, the Alpine container seems unable to even run these commands. When I shell into the container and try to directly execute /sbin/shutdown, I get an error that: sh: /sbin/shutdown not found. Ditto when I try to run systemd: sh: /bin/systemctl: not found. I don't fully understand why I'm getting this error.
My next approach is to replace all the relevant poweroff/reboot/shutdown/etc commands with scripts that make a call into host system through some socket. However, I'm at a loss here on how to do that.
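One possible shape for such a replacement script, as an untested sketch: instead of a socket, it enters the host's namespaces through PID 1 and runs the host's own systemctl there. This assumes the container is started with --privileged and --pid=host (the Alpine run command above does not pass --pid=host), that nsenter is available in the image, and that the host runs systemd. The function names are made up; the flags handled are the ones qemu-ga passes to /sbin/shutdown in the patch quoted above.

```shell
#!/bin/sh
# Hypothetical /sbin/shutdown replacement for the agent container.
# ASSUMPTIONS: --privileged --pid=host, nsenter in the image, systemd host.

# Map the flags qemu-ga passes to shutdown(8) onto a systemctl verb.
# Anything unrecognised (-h, +0, the message text) is ignored.
shutdown_verb() {
    verb="poweroff"
    for arg in "$@"; do
        case "$arg" in
            -r) verb="reboot" ;;
            -H) verb="halt" ;;
            -P) verb="poweroff" ;;
        esac
    done
    echo "$verb"
}

# Run the verb inside the host's namespaces, entered via PID 1.
host_systemctl() {
    nsenter -t 1 -m -u -i -n -p systemctl "$1"
}

# Only act when this file is installed under the name "shutdown".
case "${0##*/}" in
    shutdown) host_systemctl "$(shutdown_verb "$@")" ;;
esac
```

Because the host's systemctl runs in the host's mount namespace, nothing glibc-linked has to execute inside the Alpine filesystem, which sidesteps the bind-mounted /bin/systemctl attempt above.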
Just to xref in https://bugzilla.redhat.com/show_bug.cgi?id=1900759 we ended up adding this to RHEL CoreOS and I think didn't try to go through the outstanding concerns here (partly because it originated as a PR to the MCO?), so it's another confusing difference between the two today.
Hi, I see the qemu-guest-agent is present here but is not present at https://quay.io/repository/fedora/fedora-coreos-kubevirt
Do you know why that is?
Hi, I see the qemu-guest-agent is present here but is not present at https://quay.io/repository/fedora/fedora-coreos-kubevirt
For the record (since it was being discussed in a second issue) this was established to not be accurate in https://github.com/coreos/fedora-coreos-tracker/issues/1126#issuecomment-1536209380.
In #12 we decided that we'd like to try to not ship cloud agents. This ticket will document investigation and strategy for shipping without a cloud agent on the qemu virtualization platform.
See also #41 for a discussion of how to ship cloud specific bits using ignition.