coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Consider zstd for compression of shipped artifacts #1660

Open dustymabe opened 5 months ago

dustymabe commented 5 months ago

I did some investigation into zstd as our default compression algorithm. I set the compression level of zstd to 19 and xz to 9 (what we use today). Here are the compression and decompression times I see for the metal and qemu artifacts with xz:

Targeting build: 39.20240131.dev.0
Compressing: builds/39.20240131.dev.0/x86_64
2024-01-31 04:30:40,161 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json']
Compressed: fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json.xz
2024-01-31 04:30:40,209 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2']
Compressed: fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz
2024-01-31 04:32:34,082 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-metal.x86_64.raw']
Compressed: fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz
Skipped compressing artifacts: ostree
Updated: builds/39.20240131.dev.0/x86_64/meta.json
+ rc=0
+ set +x

real    3m50.097s
user    0m0.155s
sys     0m0.153s

Targeting build: 39.20240131.dev.0
Uncompressing: builds/39.20240131.dev.0/x86_64
2024-01-31 04:51:32,434 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json
2024-01-31 04:51:32,452 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2
2024-01-31 04:51:38,337 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-metal.x86_64.raw
Skipped uncompressing artifacts: ostree
Updated: builds/39.20240131.dev.0/x86_64/meta.json
+ rc=0
+ set +x

real    0m13.809s
user    0m0.066s
sys     0m0.070s

and here is what I see for zstd:

Compressing: builds/39.20240131.dev.1/x86_64
2024-01-31 04:42:08,112 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json']
Compressed: fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json.zst
2024-01-31 04:42:08,138 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2']
Compressed: fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst
2024-01-31 04:43:35,600 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-metal.x86_64.raw']
Compressed: fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst
Skipped compressing artifacts: ostree
Updated: builds/39.20240131.dev.1/x86_64/meta.json
+ rc=0
+ set +x

real    3m2.790s
user    0m0.124s
sys     0m0.150s

Targeting build: 39.20240131.dev.1
Uncompressing: builds/39.20240131.dev.1/x86_64
2024-01-31 04:50:07,629 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json
2024-01-31 04:50:07,636 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2
2024-01-31 04:50:10,480 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-metal.x86_64.raw
Skipped uncompressing artifacts: ostree
Updated: builds/39.20240131.dev.1/x86_64/meta.json
+ rc=0
+ set +x

real    0m9.579s
user    0m0.051s
sys     0m0.071s

and here is what the difference in sizes looks like:

        "qemu": {
            "path": "fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz",
            "sha256": "5e594eb29feb65e670e8c7e175d9b69eb31643ae9891074856bbd32b8bef2d56",
            "size": "662MiB",
            "uncompressed-sha256": "a117e5c02b04d93e158e246eca7409447d4808fd63e1d4a012fb688c613fc0e6",
            "uncompressed-size": "1609MiB"
        },
        "metal": {
            "path": "fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz",
            "sha256": "132bc17c89ba82b9d0e91c3886b92447c0d1893c7c05ddeccc99b11706ec7b3a",
            "size": "661MiB",
            "uncompressed-sha256": "a8c1f04549136b3828bcb1beea7105f3b1ee70b17682cd9da5034a3ccf73b16c",
            "uncompressed-size": "2506MiB"
        }

        "qemu": {
            "path": "fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst",
            "sha256": "7697713189ff720a2a082b23948365fcdc6c71244f127ab6a16c99b11c2aec5e",
            "size": "720MiB",
            "uncompressed-sha256": "693edcc03dcb202775424c6fc4d9757a2042b374335100800b802fd0f82048e3",
            "uncompressed-size": "1609MiB"
        },
        "metal": {
            "path": "fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst",
            "sha256": "563082baaef35847307f5ebff796992bfa1826589453861abdd905eec0d77dca",
            "size": "714MiB",
            "uncompressed-sha256": "0b802bd0a7b45b3a40760a25282c5bd8cccaa06ec180cfd87bcf033d50dde25d",
            "uncompressed-size": "2506MiB"
        }

To summarize:

| Algo | Compress time | Decompress time | QEMU Uncompressed | QEMU Compressed | Metal Uncompressed | Metal Compressed |
| ---- | ------------- | --------------- | ----------------- | --------------- | ------------------ | ---------------- |
| xz   | 3m50.097s     | 0m13.809s       | 1609MiB           | 662MiB          | 2506MiB            | 661MiB           |
| zstd | 3m2.790s      | 0m9.579s        | 1609MiB           | 720MiB          | 2506MiB            | 714MiB           |

So we get about a 20% speedup in compression and a 30% speedup in decompression, with the tradeoff of 8-9% larger compressed files.
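
A rough way to reproduce a comparison like this outside of cosa (hypothetical file names; the flags mirror the commands in the logs above, with decompression output sent to /dev/null so disk writes don't skew the timing):

# compress the same artifact with both algorithms, at the levels used above
time xz -c9 -T12 metal.x86_64.raw > metal.raw.xz
time zstd -19 -c -T12 metal.x86_64.raw > metal.raw.zst
# decompress both, discarding the output
time xz -dc -T12 metal.raw.xz > /dev/null
time zstd -dc metal.raw.zst > /dev/null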

dustymabe commented 5 months ago

Looking at our pipelines the compression step takes around 30m:

[2024-01-29T18:14:28.266Z] + set -xeuo pipefail
[2024-01-29T18:14:28.266Z] ++ umask
[2024-01-29T18:14:28.266Z] + '[' 0022 = 0000 ']'
[2024-01-29T18:14:28.266Z] + cosa compress
[2024-01-29T18:14:28.266Z] Targeting build: 39.20240128.1.0
[2024-01-29T18:14:28.519Z] Compressing: builds/39.20240128.1.0/x86_64
[2024-01-29T18:14:28.519Z] 2024-01-29 18:14:28,343 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-ostree.x86_64-manifest.json']
[2024-01-29T18:14:28.519Z] Compressed: fedora-coreos-39.20240128.1.0-ostree.x86_64-manifest.json.xz
[2024-01-29T18:14:28.519Z] 2024-01-29 18:14:28,379 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-qemu.x86_64.qcow2']
[2024-01-29T18:17:34.891Z] Compressed: fedora-coreos-39.20240128.1.0-qemu.x86_64.qcow2.xz
[2024-01-29T18:17:34.891Z] 2024-01-29 18:17:21,459 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-azure.x86_64.vhd']
[2024-01-29T18:20:26.296Z] Compressed: fedora-coreos-39.20240128.1.0-azure.x86_64.vhd.xz
[2024-01-29T18:20:26.296Z] 2024-01-29 18:20:12,008 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-aws.x86_64.vmdk']
[2024-01-29T18:22:02.671Z] Compressed: fedora-coreos-39.20240128.1.0-aws.x86_64.vmdk.xz
[2024-01-29T18:22:02.671Z] 2024-01-29 18:22:02,288 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-openstack.x86_64.qcow2']
[2024-01-29T18:24:54.058Z] Compressed: fedora-coreos-39.20240128.1.0-openstack.x86_64.qcow2.xz
[2024-01-29T18:24:54.058Z] 2024-01-29 18:24:50,011 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-aliyun.x86_64.qcow2']
[2024-01-29T18:27:45.446Z] Compressed: fedora-coreos-39.20240128.1.0-aliyun.x86_64.qcow2.xz
[2024-01-29T18:27:45.446Z] 2024-01-29 18:27:38,062 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-metal.x86_64.raw']
[2024-01-29T18:30:21.800Z] Compressed: fedora-coreos-39.20240128.1.0-metal.x86_64.raw.xz
[2024-01-29T18:30:21.800Z] 2024-01-29 18:30:08,872 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-metal4k.x86_64.raw']
[2024-01-29T18:32:58.169Z] Compressed: fedora-coreos-39.20240128.1.0-metal4k.x86_64.raw.xz
[2024-01-29T18:32:58.169Z] 2024-01-29 18:32:47,924 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-azurestack.x86_64.vhd']
[2024-01-29T18:35:49.569Z] Compressed: fedora-coreos-39.20240128.1.0-azurestack.x86_64.vhd.xz
[2024-01-29T18:35:49.569Z] 2024-01-29 18:35:38,359 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-exoscale.x86_64.qcow2']
[2024-01-29T18:38:41.159Z] Compressed: fedora-coreos-39.20240128.1.0-exoscale.x86_64.qcow2.xz
[2024-01-29T18:38:41.159Z] 2024-01-29 18:38:35,313 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-ibmcloud.x86_64.qcow2']
[2024-01-29T18:41:32.525Z] Compressed: fedora-coreos-39.20240128.1.0-ibmcloud.x86_64.qcow2.xz
[2024-01-29T18:41:32.525Z] 2024-01-29 18:41:26,729 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-vultr.x86_64.raw']
[2024-01-29T18:44:23.916Z] Compressed: fedora-coreos-39.20240128.1.0-vultr.x86_64.raw.xz
[2024-01-29T18:44:23.916Z] Skipped compressing artifacts: ostree applehv nutanix kubevirt hyperv gcp digitalocean vmware virtualbox live-iso live-kernel live-initramfs live-rootfs
[2024-01-29T18:44:23.916Z] Updated: builds/39.20240128.1.0/x86_64/meta.json

So we could possibly save 6-8m per run on that step alone (roughly 20% of a ~30m stage). The savings could then compound, because each of our CI runs may also run cosa compress.

dustymabe commented 5 months ago

Another thing to mention here is that in my tests I used a zstd compression level of 19, which is the highest you can specify without using --ultra (which requires a lot more memory).

We could experiment with different levels to see how size trades off against speed, but I assumed we wanted to increase the size as little as possible, so I used 19.
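
A minimal sketch of that kind of level sweep, assuming GNU time is available and using a hypothetical path for the metal image:

# try a few zstd levels, recording wall-clock time and resulting size
for lvl in 5 10 14 19; do
    /usr/bin/time -f "level ${lvl}: %e s" \
        zstd -${lvl} -T12 -c metal.x86_64.raw > metal.raw.zst
    du -m metal.raw.zst
done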

jlebon commented 5 months ago

Huh, I was expecting more drastic differences in compression/decompression times. IMO spending an extra 6-8 minutes for 8% smaller images is worth it.

dustymabe commented 5 months ago

Some more data:

| Level | Compress time | Decompress time | Metal Uncompressed | Metal Compressed |
| ----- | ------------- | --------------- | ------------------ | ---------------- |
| 19    | 3m2.790s      | 0m9.579s        | 2506MiB            | 714MiB           |
| 14    | 0m53.438s     | 0m9.751s        | 2506MiB            | 754.8MiB         |
| 10    | 0m21.477s     | 0m9.487s        | 2506MiB            | 757.8MiB         |
| 5     | 0m7.368s      | 0m9.698s        | 2506MiB            | 793.5MiB         |

dustymabe commented 5 months ago

If we went with something like level 10, we'd get roughly a 90% speedup in compression over level 19 (3m2.790s down to 0m21.477s for the metal image), which I think would take the compress stage in our pipeline down to ~5m. The increase in image size relative to xz would be around 10-15%.

jlebon commented 5 months ago

This was discussed in today's community meeting:

jbtrystram commented 5 months ago

I did some additional testing on the live qemu file:

| Level | Compress time | Decompress time | qemu Uncompressed | qemu Compressed |
| ----- | ------------- | --------------- | ----------------- | --------------- |
| 19    | 6m3.53s       | 0m1.66s         | 1611MiB           | 722MiB          |
| 14    | 1m16.84s      | 0m1.37s         | 1611MiB           | 763MiB          |

Another note: zstd was not installed in my f39 toolbox by default.

baude commented 5 months ago

As a corollary, I also did some testing yesterday with the qemu image. The compressed and uncompressed data sizes (cols 4 & 5) were equivalent; no surprise there. Where the results differed for me was the decompression time: mine was consistently double yours. Were you passing any additional command-line switches?

jbtrystram commented 5 months ago

> Where the results differed for me was the decompression time: mine was consistently double yours. Were you passing any additional command-line switches?

Simply running unzstd.
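
(For reference, unzstd ships with the zstd package and is simply shorthand for zstd -d, so the invocation amounts to the following, with a hypothetical file name:)

unzstd fedora-coreos.qcow2.zst    # same as: zstd -d fedora-coreos.qcow2.zst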

Cyan4973 commented 4 months ago

Given that the image files tested are very large, an interesting zstd option worth trying is --long, giving the complete command: zstd -10 -T0 --long. It may help detect repetitions at long distance (like near-identical files in the archive).
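
One detail worth noting on the decompression side (per the zstd man page, not stated in this thread): plain --long uses a window log of 27 (128 MiB), which decompresses with no extra flags, but larger windows such as --long=31 must also be requested at decompression time via --long or --memory. Roughly, with a hypothetical image name:

zstd -10 -T0 --long -c image.raw > image.raw.zst   # default window log 27 (128 MiB)
zstd -d image.raw.zst                              # works for the default window
zstd -d --long=31 image.raw.zst                    # needed only if compressed with --long=31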

jlebon commented 3 months ago

So... with the recent xz news, a lot of trust was lost in that project. Apart from the other benefits listed in this ticket, switching to zstd would now also mean we're not forcing people who aren't comfortable with xz to use it in order to consume our artifacts.

baude commented 3 months ago

We have been using zstd with FCOS images in podman machine for a couple of months now. Lots of positive comments about the quicker decompression.

jlebon commented 1 month ago

Testing decompression speeds on a local 1.6G qcow2, I get 17.8s for xz and 0.99s for zstd. It's odd that the decompression difference between xz and zstd in https://github.com/coreos/fedora-coreos-tracker/issues/1660#issue-2109301823 isn't much larger. As I think was mentioned in the last community meeting where we discussed this, it's possibly hardware-related.