apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Snapshot with copy to secondary from rbd to nfs slow and failing because of qemu-img -O qcow2 #9408

Open StepBee opened 1 month ago

StepBee commented 1 month ago
ISSUE TYPE
COMPONENT NAME
Snapshot with copy to secondary storage
CLOUDSTACK VERSION
4.19
CONFIGURATION

Primary Storage based on Ceph / RBD
Secondary Storage based on Ceph / NFS via Ganesha Gateway

OS / ENVIRONMENT

CloudStack Agent on Ubuntu using KVM

SUMMARY

When creating snapshots of volumes located on Ceph/RBD with the setting snapshot.backup.to.secondary = true, the following happens:

  1. An RBD snapshot is created on the primary storage
  2. On the hypervisor, "qemu-img convert -O qcow2 ..... /mnt//snapshots/../" is executed

The first step runs fast as expected, as it is only an RBD snapshot. But the second step is slow, unbelievably slow.

The slow qemu-img results in timeouts and failed snapshots/backups for larger volumes, even when increasing wait timeouts etc.

I compared the performance with raw output: qemu-img convert -O raw ..... /mnt//snapshots/../

and with rbd export (which exports the snapshot to a raw file).

Example numbers from a test:
qemu-img -O qcow2 = 100 Mbit/s
qemu-img -O raw = 5 Gbit/s
rbd export (raw output file) = 8 Gbit/s
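
To put those rates in perspective, some rough arithmetic for a hypothetical 1 TB (10^12 byte) volume, assuming the throughput above stays constant: at 100 Mbit/s (~12.5 MB/s) the copy takes about 80,000 s, i.e. roughly 22 hours; at 5 Gbit/s (~625 MB/s) about 27 minutes; at 8 Gbit/s (~1 GB/s) about 17 minutes.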

I am aware that a qcow2 file has benefits when parts of the image are "empty", which results in smaller image files - but the performance difference, specifically for large, filled-up disks, is enormous.

From the code in https://github.com/apache/cloudstack/blob/main/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java#L1010 I see that the output format is hardcoded to qcow2, and I read that this was changed from raw to qcow2 at some point in the past.
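
For illustration only, here is a stripped-down, standalone sketch of the conversion the agent effectively performs for an RBD-backed snapshot today. The class name, the direct ProcessBuilder call and all paths/URIs are placeholders of mine, not taken from KVMStorageProcessor; the real code goes through CloudStack's own qemu-img wrapper rather than calling the binary directly.

import java.io.IOException;
import java.util.List;

// Simplified illustration of the snapshot backup step for RBD volumes:
// the destination format is fixed to qcow2, which is the slow path
// reported above. All values below are placeholders.
public class SnapshotBackupSketch {

    static void backupSnapshot(String rbdSourceUri, String destPath)
            throws IOException, InterruptedException {
        List<String> cmd = List.of(
                "qemu-img", "convert",
                "-O", "qcow2",       // <-- hardcoded destination format
                "-U", rbdSourceUri,  // rbd:<pool>/<image>@<snapshot>:...
                destPath);           // e.g. /mnt/<uuid>/<snapshot-uuid>
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("qemu-img convert failed");
        }
    }

    public static void main(String[] args) throws Exception {
        backupSnapshot("rbd:pool/image@snap:mon_host=...", "/mnt/uuid/image-backup.qcow2");
    }
}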

My questions and suggestions are listed under EXPECTED RESULTS below.

STEPS TO REPRODUCE
1. Create a large volume on an RBD-based primary storage
2. Fill up the disk
3. Set snapshot.backup.to.secondary = true
4. Create a snapshot of the volume

Instead of step 4, you can run the following commands manually:

QEMU-IMG output qcow2
qemu-img convert -O qcow2 -U "rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30" /mnt/<uuid>/image-backup.qcow2

QEMU-IMG output raw
qemu-img convert -O raw -U "rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30" /mnt/<uuid>/image-backup.raw

RBD export
rbd -c /etc/ceph/ceph.conf --id xxxx export <rbd-pool>/<rbd-image>@<snapshot> /mnt/<uuid>/rbd-export-backup.raw
EXPECTED RESULTS
Snapshots should be copied to secondary storage with a more performant option than qemu-img -O qcow2.
Maybe the qemu-img output format could be provided as a configurable setting, as raw output alone would already speed up the process.
Or "rbd export" with raw output could be used.

Both raw options will probably use more backup space than the qcow2 option.
But considering the enormous performance difference, I'd rather provide more space than have snapshots failing constantly.
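
To make the suggestion concrete, here is a minimal sketch of what a configurable backup format could look like on the agent side. The setting name kvm.snapshot.backup.format, the enum and the helper below are purely hypothetical and do not exist in CloudStack; the sketch only illustrates switching between the current qcow2 path, a raw qemu-img conversion, and rbd export.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: choose the snapshot backup method based on a
// (non-existent) agent setting instead of always converting to qcow2.
public class ConfigurableSnapshotBackup {

    enum BackupFormat { QCOW2, RAW, RBD_EXPORT }

    static List<String> buildCommand(BackupFormat format, String pool, String image,
                                     String snapshot, String rbdUri, String destPath,
                                     String cephConf, String cephId) {
        List<String> cmd = new ArrayList<>();
        switch (format) {
            case QCOW2:      // current behaviour: slow for large, filled-up volumes
                cmd.addAll(List.of("qemu-img", "convert", "-O", "qcow2", "-U", rbdUri, destPath));
                break;
            case RAW:        // same tool, raw output: ~5 Gbit/s in the test above
                cmd.addAll(List.of("qemu-img", "convert", "-O", "raw", "-U", rbdUri, destPath));
                break;
            case RBD_EXPORT: // fastest in the test above (~8 Gbit/s), raw output file
                cmd.addAll(List.of("rbd", "-c", cephConf, "--id", cephId,
                        "export", pool + "/" + image + "@" + snapshot, destPath));
                break;
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical setting, e.g. read from agent.properties: kvm.snapshot.backup.format=rbd_export
        BackupFormat format = BackupFormat.RBD_EXPORT;
        List<String> cmd = buildCommand(format, "pool", "image", "snap",
                "rbd:pool/image@snap:mon_host=...", "/mnt/uuid/backup.raw",
                "/etc/ceph/ceph.conf", "cloudstack");
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("snapshot backup command failed");
        }
    }
}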
ACTUAL RESULTS
Currently, most of our snapshots of larger volumes are failing because of timeouts, which result from the very poor performance of qemu-img -O qcow2.
rohityadavcloud commented 1 month ago

This is a well-known limitation and was discussed previously in the related issue https://github.com/apache/cloudstack/issues/5660; rbd export will be faster.