fog / fog-libvirt

libvirt provider for fog
MIT License
16 stars 42 forks source link

Support VirtIO for ceph RBD disks #67

Closed Bluewind closed 4 years ago

Bluewind commented 4 years ago

During testing I discovered that my VM can freez with heavy disk load. I also saw the following error in the libvirt/qemu log file:

lsi_scsi: error: ORDERED queue not implemented

It appears that the SCSI driver is not very well tested or widely used since this problem has also been seen by others roughly 9 years ago. The suggestion back then on the "kvm" mailing list was to just use VirtIO because it is better maintained:

https://kvm.vger.kernel.narkive.com/gh8t5TeK/virtual-scsi-disks-hangs-on-heavy-io

Signed-off-by: Florian Pritz bluewind@xinu.at

NOTE: I have not actually tested this patch in a deployed system, but I did test the if line in irb.

olifre commented 4 years ago

Interesting! @wiene and I are running > 50 VMs in our infrastructure and never saw this, even after pulling the plugs out of all file servers (thus stalling I/O) for > 30 minutes repeatedly (see also this talk for more information on our setup). The discussion you linked, though, mentions only the LSI SCSI controller support being incomplete. We don't use that, and it also seems this is not used anymore in recent years. We use RHEL 7 here, and /usr/libexec/qemu-kvm -device ? 2>&1 | grep -i LSI produces no output. I interpret the thread such that LSI is (or rather, was) badly supported, but not SCSI in general.

On the contrary, it seems virtio-scsi sees much more attention - virtio-blk only learnt about "discard" very recently, while virtio-scsi has had support for that since many years.

Are you using LSI emulation in your case? Especially if not, it might be good to send a report to the KVM list to raise attention on the matter.

In any case, your patch LGTM and if it helps to get it merged, here's my :+1:.

@dvanders Your infrastructure is "a bit" larger - did you already collect large-scale experience with virtio-scsi? Did you observe such an issue?

dvanders commented 4 years ago

@dvanders Your infrastructure is "a bit" larger - did you already collect large-scale experience with virtio-scsi? Did you observe such an issue?

Alas we're still on virtio-blk so I can't help here.

Bluewind commented 4 years ago

We are running ubuntu 18.04 and there I see some "LSI MegaRaid" and "LSI SAS" devices in the output of qemu-system-x86_64 -device '?' 2>&1. I'm not sure why they are still there and why ubuntu uses them as defaults, but I'm glad to see that others are starting to switch to more maintained modules.

Since I created this PR, I have already tried virtio-scsi and it worked much better (no crashes), but it was also slower than virtio-blk. Newer qemu will support TRIM with virtio-blk so I figured it still makes sense to get this merged even though it will now be for a different reason. If you want, I can update the commit message to reflect this.

olifre commented 4 years ago

I figured it still makes sense to get this merged even though it will now be for a different reason

I fully agree. I think explicitly referencing the "LSI" SCSI emulation in the commit message would already be sufficient to clarify the matter. But @strzibny is in any case the person to review and merge ;-).

Concerning performance, it seems the answer there is also "it depends". See for example the extensive tests here: https://mpolednik.github.io/2017/01/23/virtio-blk-vs-virtio-scsi/

Most observed differences are likely caused by caching happening on different layers and, hence, latencies having different effects, which may lead to virtio-blk being faster in one environment and virtio-scsi being faster in another. So it's surely good to support both (as a side-note, it seems oVirt even does consider virtio-scsi the successor of virtio-blk: https://www.ovirt.org/develop/release-management/features/storage/virtio-scsi.html but I don't think it's that simple).

strzibny commented 4 years ago

I am fine with this now. It keeps the current defaults and it's and easy to follow addition. @olifre do you have any further comments before merging?

olifre commented 4 years ago

That looks good to me, no additional comments :+1: .