cirruslabs / gitlab-tart-executor

GitLab Runner executor to run jobs in Tart VMs

Support for block devices as builds/cache dirs #60

Closed jlsalmon closed 9 months ago

jlsalmon commented 9 months ago

Would it be possible to add support to the gitlab executor for using block devices? Specifically I would like to try using a block device as the builds/cache dirs (since virtiofs mounts via --builds-dir and --cache-dir don't currently work due to cirruslabs/tart#567).

I'm not sure how this would work with the requirement to run tart as root when attaching block devices though.

fkorotkov commented 9 months ago

We can probably add support for passing block devices via --builds-dir and --cache-dir. Might be a bit tricky to map such mounts inside the VM especially if both --builds-dir and --cache-dir are specified.

edigaryev commented 9 months ago

Hello Justin 👋

Have you considered using a distributed cache instead?

It works just as snappily as a mounted directory or block device, but without the mounted-directory bugs and the extra complexity of managing block devices (namely, running the Tart process with elevated privileges, filesystem corruption due to concurrent access from multiple jobs, etc.).

Here's a quick example snippet that, once put in ~/.gitlab-runner/config.toml, will cache everything to a MinIO server serving /path/to/a/directory (assuming that it's accessible at minio.local):

[[runners]]
  [runners.cache]
    Type = "s3"
    [runners.cache.s3]
      ServerAddress = "minio.local:9000" 
      AccessKey = "minioadmin"
      SecretKey = "minioadmin"
      BucketName = "gitlab-cache"
      Insecure = true

MinIO itself can be easily installed with brew install minio, and you can either self-host it (don't forget to configure proper security) or use a cheap S3-compatible object storage like Cloudflare R2, Backblaze B2, etc.
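
For completeness, a rough sketch of bringing such a server up locally — the credentials below are MinIO's defaults and the data directory is just a placeholder matching the snippet above:

$ brew install minio minio/stable/mc
$ minio server /path/to/a/directory                       # serves the S3 API on :9000
$ mc alias set local http://127.0.0.1:9000 minioadmin minioadmin
$ mc mb local/gitlab-cache                                # the bucket referenced by the runner config above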

jlsalmon commented 9 months ago

Hi @edigaryev,

Many thanks for your reply, and for the suggestion.

Our use case involves building repositories that are hundreds of GBs in size, so using a pre-existing, pre-cloned repo for each build is a hard requirement. Since the S3 caching feature uploads/downloads zips of the cached files on each run, it would unfortunately be prohibitively slow for us.

Reusing a repo is currently trivial on a bare-metal macOS GitLab runner via [[runners]] builds_dir. If it weren't for the bugs in virtiofs, this would also be easily achieved using the existing --builds-dir feature of this executor to mount the repo directory from the host.

So, we were hoping that we might have better luck using a block device to store repos and have it mounted in the build job. Ultimately we are looking to achieve build isolation for macOS builds (multiple Xcode versions, multiple OS versions, etc) as an improvement on our current bare metal solution.

We are also looking into some experimental early use of Docker for this purpose, which is where @fkorotkov originally pointed me in this direction.

edigaryev commented 9 months ago

Hi @jlsalmon,

Thanks for clarifying this.

Given your goals (build isolation rather than better resource utilization), do I understand correctly that it would be acceptable for you to only be able to run a single Tart VM when using the block device option?

Otherwise I don't see how more than one Tart VM would be able to access the same block device without causing filesystem corruption and other undesirable side-effects.

jlsalmon commented 9 months ago

@edigaryev it would be acceptable to run only a single tart VM in the first instance, yes. I suppose it might be possible later on to support two VMs, either by using two block devices (e.g. two partitions or two external drives) or by using unique build root directories per VM on a single block device. But one VM would already be a big achievement 😊

edigaryev commented 9 months ago

Another thing I have in mind is that you don't need actual block device access to be able to mount a filesystem into a Tart VM.

You could simply do truncate -s 50GB disk.img and mount it similarly to a block device using --disk disk.img.

This disk image can reside anywhere you want (e.g. on fast storage attached to the Mac) and does not require privilege escalation via sudo or running Tart as root.

The only downside is that this disk image needs to be properly formatted as APFS/HFS+/etc., which is something diskutil seems to struggle with, since it expects a real disk device as input.

However, this disk image can be easily formatted from within a Tart VM and then re-used many times for subsequent CI runs.

Would that be acceptable for you? If so, we can simply add a --disk argument to the prepare stage (similar to the existing --dir argument) and that would do the trick, without the sudo privilege-escalation mumbo-jumbo.
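
To make the idea concrete, here's a rough sketch of what the flow could look like (the VM name, the image size and the disk2 identifier inside the guest are assumptions — check diskutil list in the guest for the actual device):

$ truncate -s 50GB builds.img                      # on the host: sparse file, grows on demand
$ tart run --no-graphics --disk builds.img <VM-NAME> &
$ ssh admin@$(tart ip <VM-NAME>)
# inside the guest, one time only:
$ diskutil list                                    # locate the newly attached ~50 GB disk
$ diskutil eraseDisk APFS Builds disk2             # "disk2" is an assumption taken from diskutil list

After that one-time formatting, the same builds.img can be re-attached to subsequent CI runs.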

edigaryev commented 9 months ago

it would be acceptable to run only a single tart VM in the first instance, yes

Thanks for clarifying this too!

Please also check https://github.com/cirruslabs/gitlab-tart-executor/issues/60#issuecomment-1945625682 and let me know what you think.

If that works for you, we can even go further and clone that golden disk image for each Tart VM invocation (since APFS supports fast file copies using CoW). This would allow safely running more than one Tart VM from a given golden disk image; the only downside is that changes to the cloned disk image won't be propagated back to the golden image.
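
For illustration, on APFS such a clone would be a single near-instant command regardless of the image size (file names here are hypothetical):

$ cp -c golden.img job-1234.img    # -c clones the file via clonefile(2), copy-on-write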

jlsalmon commented 9 months ago

@edigaryev thanks for the tip about using disk images, I'd be happy to give that a shot! I would need changes to the disk image(s) to be persisted across runs, so I'm not sure about the cloning part, but I'm sure there are options there 👍

edigaryev commented 9 months ago

@jlsalmon please check out the new 0.11.0 release, it now allows you to specify --disk arguments in the prepare stage, which in your case should point to a disk file on your host system.
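
For reference, wiring this up in ~/.gitlab-runner/config.toml could look roughly like the sketch below — the executor binary location and the disk image path are assumptions, adjust them to your setup:

[runners.custom]
  config_exec = "gitlab-tart-executor"
  config_args = ["config"]
  prepare_exec = "gitlab-tart-executor"
  prepare_args = ["prepare", "--disk", "/Users/ci/builds.img"]
  run_exec = "gitlab-tart-executor"
  run_args = ["run"]
  cleanup_exec = "gitlab-tart-executor"
  cleanup_args = ["cleanup"]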

edigaryev commented 9 months ago

Another option to actually mount the block device is to change its access permissions:

sudo chown $USER /dev/disk42

This way, Tart itself doesn't need to run under sudo or with elevated privileges, and you can use the same --disk argument that you'd use to mount an additional disk image.

However, this approach is more error-prone than the disk image one (in other words, it requires some scripting around), because macOS has no /dev/disk/by-id symlinks, so you have to find the actual device node for your disk each time it's re-attached (e.g. /dev/disk4 may become /dev/disk3).
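
As a rough sketch of the kind of scripting involved (the volume label "Builds" is just a placeholder):

$ DEV=$(diskutil list | awk '/APFS Volume Builds/ {print $NF; exit}')   # e.g. disk4s2
$ diskutil unmount "/dev/$DEV"       # make sure the host isn't using it
$ sudo chown "$USER" "/dev/$DEV"     # after this, Tart can open the device without running as root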

edigaryev commented 9 months ago

I've also just realized that we need a way to tell the GitLab Tart Executor whether --builds-dir and --cache-dir are on the host or in the guest to make this all work, so re-opening.

jlsalmon commented 9 months ago

Thanks for this @edigaryev, I just tried it out and it's working great. I'm currently assessing the performance of a few different disk image types (sparse bundles, sparse images and regular disk images). I'll also try out your truncate suggestion. Initial performance numbers suggest at least one of the types will be good enough for our use case 🎉

fkorotkov commented 9 months ago

@jlsalmon it would be great if you could share your experience once it's working for you. It seems like a great piece of engineering you're working on!

jlsalmon commented 8 months ago

So I found that the only method which has reasonable performance for my use case is directly attaching an APFS volume (either by physically partitioning the host disk, or using an external APFS-formatted storage device). I managed to achieve 60% of the host-native performance with this method on a 2020 M1 Mac mini running macOS 14.2.

File-based methods (sparsebundle, DMG, raw APFS-formatted file) were just too slow. The best was the sparsebundle, which achieved 25% of the host-native performance.

In my experience, it doesn't seem possible to get near-native performance when disk IO is a dominant factor of the workload. I also tried XcodeBenchmark and could only reach a maximum of 65% of the host-native performance (again using an APFS volume attachment).

FYI @fkorotkov

fkorotkov commented 8 months ago

Thank you for the data points! Can I ask how you ran XcodeBenchmark in a VM? I can't reproduce the 65% result.

I use M1 Mac minis with 16 GB of memory, and without isolation sh benchmark.sh runs in 240-250 seconds. Inside a VM that matches the host resources (tart set --cpu 8 --memory 16384 <VM-NAME>), I'm getting around 260-270 seconds for the same script.

jlsalmon commented 8 months ago

@fkorotkov on my 2020 M1 Mac mini with 8 CPUs/16GB memory without isolation, sh benchmark.sh takes between 160-170 seconds:

$ git clone https://github.com/devMEremenko/XcodeBenchmark
$ cd XcodeBenchmark
$ sh benchmark.sh
Preparing environment
Running XcodeBenchmark...
Please do not use your Mac while XcodeBenchmark is in progress

..snip...

** BUILD SUCCEEDED ** [168.887 sec]

System Version: 14.2.1
Xcode 15.2
Hardware Overview
      Model Name: Mac mini
      Model Identifier: Macmini9,1
      Total Number of Cores: 8 (4 performance and 4 efficiency)
      Memory: 16 GB

Doing the same thing inside a fresh VM with matched host resources (not using any disk attachments) takes between 250-260 seconds:

$ tart clone ghcr.io/cirruslabs/macos-sonoma-xcode:15.2 test
$ tart set --cpu 8 --memory 16384 test
$ tart run --no-graphics test &
$ ssh admin@$(tart ip test)
admin@admins-Virtual-Machine ~ % git clone https://github.com/devMEremenko/XcodeBenchmark
admin@admins-Virtual-Machine ~ % cd XcodeBenchmark
admin@admins-Virtual-Machine XcodeBenchmark % sh benchmark.sh
Preparing environment
Running XcodeBenchmark...
Please do not use your Mac while XcodeBenchmark is in progress

...snip...

** BUILD SUCCEEDED ** [254.395 sec]

System Version: 14.3
Xcode 15.2
Hardware Overview
      Model Name: Apple Virtual Machine 1
      Model Identifier: VirtualMac2,1
      Total Number of Cores: 8
      Memory: 16 GB

I did many runs on both the host and the guest; I get roughly 66% of host performance on the guest. I'm not sure why my host results are faster than yours. It's worth noting that this Mac mini is booted from a Samsung T5 external SSD and is not using the internal drive. But I would expect that to make the benchmark run slower, so that makes it even more confusing 🤔.