cirruslabs / gitlab-tart-executor

GitLab Runner executor to run jobs in Tart VMs
MIT License

Gitlab runner running out of space inconsistently when pulling image #91

Closed riain0 closed 1 week ago

riain0 commented 2 weeks ago

Hi,

Recently we've started to see an issue where our macOS runners run out of space when starting up. It seems they are trying to pull the image even when it already exists on the machine.

Failing GitLab job log:

Running with gitlab-runner 17.5.3 (12030cf4)
  on <runner ID>, system ID: <system ID>
Resolving secrets
Preparing the "custom" executor
02:26
Using Custom executor...
2024/11/07 08:41:17 Pulling the latest version of <aws account id>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode:16...
08:43:42 tart command returned non-zero exit code: "Error: The operation couldn’t be completed. No space left on device"
WARNING: Cleanup script failed: exit status 1
ERROR: Job failed: exit status 1

Healthy GitLab job log:

Running with gitlab-runner 17.5.3 (12030cf4)
  on <runner ID> p4v55zsvP, system ID: <system ID>
Resolving secrets
Preparing the "custom" executor
Using Custom executor...
Pulling the latest version of <aws account id>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode:16...
2024/11/07 10:32:39 Cloning and configuring a new VM...
2024/11/07 10:32:39 Waiting for the VM to boot and be SSH-able...
2024/11/07 10:33:08 Was able to SSH!
2024/11/07 10:33:08 VM is ready.

Here is the output of the tart list command on the macOS EC2 instance:

bash-3.2# tart list
Source Name                                                                                                                                                                    Disk Size SizeOnDisk State
OCI    <aws account id>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode:16                                                                      100  75   75         stopped
OCI    <aws account id>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode@sha256:fe4d39b258293f8e7b78f7caa49fda3b806f567b3057a39fef88bcd727812844 100  75   75         stopped
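For anyone who wants to track these sizes from a script, here is a minimal Python sketch that parses text output shaped like the listing above. It assumes the six columns shown (Source, Name, Disk, Size, SizeOnDisk, State) and that the last four fields never contain spaces. The function name and dictionary keys are hypothetical, not part of Tart. Note that the tag row and the digest row may refer to the same underlying image, so summing them could overstate real disk usage.

```python
def parse_tart_list(text: str) -> list:
    """Parse whitespace-separated `tart list` text output into dictionaries.

    Assumes the column layout shown in this thread; other Tart versions
    may print different columns.
    """
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split()
        if len(parts) < 6:
            continue  # blank or malformed line
        # Split from the right so a name containing spaces stays intact.
        disk, size, size_on_disk, state = parts[-4:]
        rows.append({
            "source": parts[0],
            "name": " ".join(parts[1:-4]),
            "disk_gb": int(disk),
            "size_gb": int(size),
            "size_on_disk_gb": int(size_on_disk),
            "state": state,
        })
    return rows


SAMPLE = """Source Name Disk Size SizeOnDisk State
OCI <aws account id>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode:16 100 75 75 stopped
"""

if __name__ == "__main__":
    for row in parse_tart_list(SAMPLE):
        print(row["name"], row["size_on_disk_gb"], row["state"])
```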

Here is our GitLab Runner config:

listen_address = ":8083"
concurrent = 2
check_interval = 0
log_format = "json"
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "<ec2 instance ID>"
  output_limit = 20480
  url = "<internal gitlab alb URL>"
  id = <some integer id>
  token = "<redacted>"
  token_obtained_at = <some date>
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "custom"
  builds_dir = "/Users/admin/gitlab/builds"
  cache_dir = "/Users/admin/gitlab/cache"
  environment = ["GITLAB_V2=true", "CI_JOB_IMAGE=<aws account ID>.dkr.ecr.eu-central-1.amazonaws.com/mirror/github/cirruslabs/macos-sonoma-xcode:16"]
  [runners.custom_build_dir]
  [runners.cache]
    Type = "s3"
    Shared = true
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      BucketName = "<bucket name>"
      BucketLocation = "eu-central-1"
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "gitlab-tart-executor"
    config_args = ["config"]
    prepare_exec = "gitlab-tart-executor"
    prepare_args = ["prepare", "--auto-prune=false"]
    run_exec = "gitlab-tart-executor"
    run_args = ["run"]
    cleanup_exec = "gitlab-tart-executor"
    cleanup_args = ["cleanup"]

We pull the default image at EC2 instance startup to speed up jobs.

Tart version: 2.20.0
GitLab Tart executor version: 1.19.0-cbb18ff
GitLab version: 17.5.3

fkorotkov commented 2 weeks ago

This image was updated on Saturday. Do you have enough space for two images on the machine? Tart downloads the new image and only then removes the old one, so an update temporarily consumes twice the disk space.
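As a rough illustration of that headroom requirement, here is a minimal Python sketch that checks whether a volume has room for one more full copy of an image while the old copy is still on disk. The function names, the decimal-gigabyte unit, the 10% safety margin, and the 75 GB figure (taken from the tart list output earlier in the thread) are assumptions for illustration and not part of Tart or the executor.

```python
import shutil

GB = 1000 ** 3  # assumption: sizes are reported in decimal gigabytes


def required_free_bytes(image_gb: float, margin: float = 0.10) -> int:
    """Free space needed to pull an updated image before the old one is removed."""
    return int(image_gb * GB * (1 + margin))


def has_room_for_update(free_bytes: int, image_gb: float, margin: float = 0.10) -> bool:
    """True if the volume can hold a second copy of the image plus a margin."""
    return free_bytes >= required_free_bytes(image_gb, margin)


if __name__ == "__main__":
    image_gb = 75  # size reported for the cached image in this thread
    free = shutil.disk_usage("/").free  # free bytes on the root volume
    if has_room_for_update(free, image_gb):
        print("Enough free space to pull an updated image.")
    else:
        print("Not enough free space: an update may fail with 'No space left on device'.")
```

Running a check like this at instance startup would only flag the condition early. It does not change how Tart pulls or removes images.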

riain0 commented 1 week ago

Ah, maybe that's it 🤔 Do you change the contents of a published tag often? I'll increase the storage to double the required amount and see if it helps.

fkorotkov commented 1 week ago

Check out this section for the update cadence.