qemu images would be more convenient with non-xz'd disk image and kernels

nwf commented 3 years ago

As per https://github.com/CTSRD-CHERI/cheri-docker-images/blob/c2834f8a78186f3477e0eee9ddc6e2d9e9a8f50e/cheribsd/qemu/Dockerfile#L6 we just copy the compressed artifacts into the image. It might be nicer to uncompress them before adding them (the image layer should then recompress them, yes?), but I'm not sure that's the tack we want to take. Alternatives would be to avoid compressing them in the build pipeline in the first place, to convert them to qcow2 (thanks, Jess, for the suggestion) either here or earlier in the pipeline, or something else entirely.

As it stands, users have to unxz the kernel and image they want first, which isn't that much work, but is a little sad.

arichardson commented 3 years ago

I haven't looked into the resulting download sizes, but it would be a bit more convenient. We could try pushing both versions (and maybe the qcow2 as well) to docker hub and compare how much is downloaded for each.

arichardson commented 3 years ago

Using QCow2 looks like an easy win:

~/cheri/output> qemu-img info cheribsd-morello-purecap.img
image: cheribsd-morello-purecap.img
file format: raw
virtual size: 2.72 GiB (2916090368 bytes)
disk size: 2.7 GiB
~/cheri/output> qemu-img info cheribsd-morello-purecap.qcow2
image: cheribsd-morello-purecap.qcow2
file format: qcow2
virtual size: 2.72 GiB (2916090368 bytes)
disk size: 1.5 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

arichardson commented 3 years ago

However, xz -9 is still significantly smaller and I doubt docker compresses layers that much?

-rw-r--r--  1 alex  staff  2916090368 10 Jun 14:58 cheribsd-morello-purecap.img
-rw-r--r--  1 alex  staff   200731036 10 Jun 14:58 cheribsd-morello-purecap.img.xz
-rw-r--r--  1 alex  staff  1589379072 26 Aug 11:53 cheribsd-morello-purecap.qcow2
-rw-r--r--  1 alex  staff   199961488 26 Aug 11:53 cheribsd-morello-purecap.qcow2.xz
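
To put the listing in proportion, here is a small sketch that expresses each artifact as a percentage of the raw image. The helper name `pct_of` is made up for illustration; the byte counts are the ones from the listing above.

```shell
# pct_of PART WHOLE: print PART as a percentage of WHOLE, to one decimal place.
# (pct_of is a hypothetical helper, not part of any existing tooling.)
pct_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

raw=2916090368
pct_of 200731036 "$raw"    # .img.xz   -> 6.9
pct_of 1589379072 "$raw"   # .qcow2    -> 54.5
pct_of 199961488 "$raw"    # .qcow2.xz -> 6.9
```

So the plain qcow2 is a bit over half the raw size, while either xz'd form is about 7% of it.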

I wonder if it makes sense to decompress automatically inside the ENTRYPOINT?
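
For concreteness, decompressing at container start could look roughly like the sketch below. The function name `decompress_artifacts` and the `/opt/artifacts` path in the note after it are hypothetical, not what the image currently uses, and the sketch assumes xz is installed in the image.

```shell
# decompress_artifacts DIR: unpack every *.xz in DIR that has not been
# unpacked yet. Hypothetical helper; assumes xz is available in the image.
decompress_artifacts() {
    dir=$1
    for f in "$dir"/*.xz; do
        if [ ! -e "$f" ]; then
            continue    # the glob matched nothing
        fi
        if [ -e "${f%.xz}" ]; then
            continue    # already unpacked on an earlier start
        fi
        # --keep leaves the compressed copy next to the unpacked file
        xz --decompress --keep "$f" || return 1
    done
}
```

An entrypoint script would then call something like `decompress_artifacts /opt/artifacts` and finish with `exec "$@"`. The cost is that every fresh container pays the decompression time, and the unpacked files end up in the container's writable layer rather than in the image.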

nwf commented 3 years ago

I'm entertained that the .qcow2.xz is smaller than the .img.xz. It looks like docker used to xz their images but stopped due to tar implementation compatibility concerns. :(

I'd rather not decompress in the ENTRYPOINT, if we can get away with it. Note that if you add -c to the qemu-img convert -O qcow2 command, the gain from xz is less significant, as the disk image then transparently zlib-compresses its own data:

$ qemu-img convert -O qcow2 cheribsd-riscv64-purecap.img -c cheribsd-riscv64-purecap.qcow2
-rw-r--r--  1 root root  456065024 Aug 26 11:47 cheribsd-riscv64-purecap.qcow2

Newer qemus also support zstd compression within qcow2 itself, as per https://wiki.qemu.org/ChangeLog/5.1 , which might further reduce the margin, but, experimentally, it only shaves off a little bit. I had to build qemu-img in an environment with libzstd-dev installed; Debian apparently doesn't do that by default. Anyway, /cheri/out/mainline/sdk/bin/qemu-img convert -O qcow2 -o compression_type=zstd -c cheribsd-riscv64-purecap.img cheribsd-riscv64-purecap.zstd.qcow2 generated a 425328640-byte image, which is ~7% smaller than the 456065024-byte zlib one, but still well short of the factor-of-two reduction that the .img.xz would get us.
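
Since zstd support depends on how qemu-img was built, a conversion step could try zstd first and fall back to zlib. This is only a minimal sketch: `compress_qcow2` is a hypothetical helper name, the input is assumed to be a raw image, and it assumes that a qemu-img without zstd support exits non-zero when given compression_type=zstd.

```shell
# compress_qcow2 SRC DST: convert a raw image to a compressed qcow2,
# preferring zstd and falling back to qcow2's default zlib compression.
# Prints which compression type ended up being used.
# (Hypothetical helper; assumes qemu-img is on PATH.)
compress_qcow2() {
    src=$1
    dst=$2
    if qemu-img convert -f raw -O qcow2 -c -o compression_type=zstd \
            "$src" "$dst" 2>/dev/null; then
        echo zstd
    else
        rm -f "$dst"    # drop any partial output from the failed attempt
        qemu-img convert -f raw -O qcow2 -c "$src" "$dst" && echo zlib
    fi
}
```

Usage would be along the lines of `compress_qcow2 cheribsd-riscv64-purecap.img cheribsd-riscv64-purecap.qcow2`.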

Still, I think being able to use the disk images directly, without needing to decompress them separately, is worth the factor of two in size. If the transport layer ever does become compressed again, it looks like it would recover the difference, and so it'd merely be a matter of disk space.

jrtc27 commented 3 years ago

-rw-r--r--  1 Jess  staff   183M 26 Aug 14:32 cheribsd-aarch64.gz.qcow2
-rw-r--r--  1 Jess  staff   1.8G 26 Aug 13:43 cheribsd-aarch64.img
-rw-r--r--  1 Jess  staff    98M 26 Aug 13:43 cheribsd-aarch64.img.xz
-rw-r--r--  1 Jess  staff   611M 26 Aug 14:20 cheribsd-aarch64.qcow2
-rw-r--r--  1 Jess  staff   182M 26 Aug 14:32 cheribsd-morello-purecap.gz.qcow2
-rw-r--r--  1 Jess  staff   1.8G 26 Aug 13:43 cheribsd-morello-purecap.img
-rw-r--r--  1 Jess  staff    96M 26 Aug 13:43 cheribsd-morello-purecap.img.xz
-rw-r--r--  1 Jess  staff   614M 26 Aug 14:20 cheribsd-morello-purecap.qcow2
-rw-r--r--  1 Jess  staff   435M 26 Aug 14:33 cheribsd-riscv64-purecap.gz.qcow2
-rw-r--r--  1 Jess  staff   4.6G 26 Aug 13:43 cheribsd-riscv64-purecap.img
-rw-r--r--  1 Jess  staff   237M 26 Aug 13:43 cheribsd-riscv64-purecap.img.xz
-rw-r--r--  1 Jess  staff   1.4G 26 Aug 14:20 cheribsd-riscv64-purecap.qcow2
-rw-r--r--  1 Jess  staff   242M 26 Aug 14:33 cheribsd-riscv64.gz.qcow2
-rw-r--r--  1 Jess  staff   4.0G 26 Aug 13:43 cheribsd-riscv64.img
-rw-r--r--  1 Jess  staff   135M 26 Aug 13:43 cheribsd-riscv64.img.xz
-rw-r--r--  1 Jess  staff   795M 26 Aug 14:20 cheribsd-riscv64.qcow2

FWIW, those are the sizes for the latest Jenkins artifacts (.gz.qcow2 being the output of qemu-img convert -c).
