cirruslabs / gitlab-tart-executor

GitLab Runner executor to run jobs in Tart VMs
MIT License

Tart fails with "Error: The Internet connection appears to be offline." #88

Closed gitperr closed 33 minutes ago

gitperr commented 1 month ago

Hi, we recently updated one of our mac runners to macOS 15.0.1

sw_vers                 
ProductName:        macOS
ProductVersion: 15.0.1

and started experiencing this issue:

Running with gitlab-runner 16.8.0 (c72a09b6)
  on <some.runner.group> aHi6uRQSm, system ID: s_b649a000851a
Preparing the "custom" executor
00:00
Using Custom executor...
2024/10/14 10:16:52 Pulling the latest version of <some.registry.url>/<some.image_name>...
2024/10/14 10:16:52 tart command returned non-zero exit code: "Error: The Internet connection appears to be offline."
WARNING: Cleanup script failed: exit status 1
ERROR: Job failed: exit status 1

The runner does have an internet connection, though: I tested by pinging various hosts and by visiting different websites. Also, the image it is trying to pull is already present on the runner. When I run tart pull manually for the same image, it reports that the image is already there:

image is already cached and linked!
tart --version
2.18.5

The problem has not happened in earlier versions of macOS or Tart, and my hunch is that it is not related to Tart version either, but something else that Tart is doing, which I may not be aware of... Or maybe it is macOS related.

Any pointers on what things I could look at? Thank you!

gitperr commented 1 month ago

Another thing I just found out:

So, it may be related to running it as a service.

gitperr commented 1 month ago

I ran into another issue after some more fiddling. (I updated gitlab-tart-executor from 1.8.0 to 1.19.0-cbb18ff, and also updated gitlab-runner to 17.4.1.)

  1. On the runner mac node: gitlab-runner run
  2. Trigger a pipeline on GitLab and watch the logs
  3. Get the following error:
    Running with gitlab-runner 17.4.1 (32fe5502)
    on <runner> aHi6uRQSm, system ID: s_b649a000851a
    Preparing the "custom" executor
    00:00
    Using Custom executor...
    2024/10/14 14:20:16 Pulling the latest version of <image>...
    2024/10/14 14:20:16 tart command returned non-zero exit code: "Error: Could not connect to the server."
  4. Check logs on the gitlab-runner:
    WARNING: 2024/10/14 14:20:16 Failed to stop VM: tart command returned non-zero exit code: "the specified VM \"gitlab-534216\" does not exist"  cleanup_std=err job=534216 project=581 runner=aHi6uRQSm
    WARNING: 2024/10/14 14:20:16 Failed to delete VM: VM errored: failed to delete VM gitlab-534216: tart command returned non-zero exit code: "the specified VM \"gitlab-534216\" does not exist"  cleanup_std=err job=534216 project=581 runner=aHi6uRQSm
    WARNING: 2024/10/14 14:20:16 VM errored: failed to delete VM gitlab-534216: tart command returned non-zero exit code: "the specified VM \"gitlab-534216\" does not exist"  cleanup_std=err job=534216 project=581 runner=aHi6uRQSm
    WARNING: Cleanup script failed: exit status 1       job=534216 project=581 runner=aHi6uRQSm
    WARNING: Job failed: exit status 1
                 duration_s=0.157331167 job=534216 project=581 runner=aHi6uRQSm

Looks like it is trying to delete a VM that does not exist, then errors out.

I've been using this as the runner config since the start:

concurrent = 1
listen_address = ":9252"
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "<OMITTED>"
  limit = 1
  output_limit = 30000
  url = "<OMITTED>"
  environment = ["TART_EXECUTOR_INSECURE_PULL=true"]
  id = 100
  token = "<OMITTED>"
  token_obtained_at = 2024-01-22T10:26:13Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "custom"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.custom]
    cleanup_args = ["cleanup"]
    cleanup_exec = "/opt/homebrew/bin/gitlab-tart-executor"
    run_args = ["run"]
    prepare_args = ["prepare"]
    prepare_exec = "/opt/homebrew/bin/gitlab-tart-executor"
    config_args = ["config"]
    config_exec = "/opt/homebrew/bin/gitlab-tart-executor"
    tls_verify = false
    run_exec = "/opt/homebrew/bin/gitlab-tart-executor"
    volumes = ["/cache"]
  [runners.feature_flags]
    FF_RESOLVE_FULL_TLS_CHAIN = false

gitperr commented 1 month ago

So far I have managed to work around the issue by downgrading the Tart executor to 1.8.0 and changing the startup from gitlab-runner install && gitlab-runner start to sudo gitlab-runner install --user <user to run with> && sudo gitlab-runner start
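
The service change described above can be written out as a short script. This is only a sketch: it prints the commands instead of running them, the stop and uninstall steps are an assumption about how one would remove the existing per-user service first (they are not mentioned above), and "ci" is a hypothetical user name. The executor downgrade is not covered.

```shell
#!/bin/sh
# Print the commands for switching GitLab Runner from a per-user service
# to a service installed with sudo that runs as a given user.
# Assumption: the old service is stopped and uninstalled first.
print_reinstall_steps() {
    run_as_user="$1"
    echo "gitlab-runner stop"
    echo "gitlab-runner uninstall"
    echo "sudo gitlab-runner install --user $run_as_user"
    echo "sudo gitlab-runner start"
}

# "ci" is a hypothetical account name; substitute the real one.
print_reinstall_steps ci
```

Printing the steps first makes it easy to review them before pasting them into a terminal on the runner host.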

edigaryev commented 1 month ago

This is likely related to the newly introduced "Local Network" permission in macOS Sequoia and the fact that GitLab Runner's binary has no LC_UUID identifier, which is critical for Apple's Transparency, Consent, and Control (TCC) framework.

Can you check if the workaround in https://github.com/cirruslabs/gitlab-tart-executor/issues/85#issuecomment-2363353178 works for you?

Without re-building the GitLab Runner (or waiting for the upstream fix) the permission above cannot take effect, even if you've explicitly allowed it in the GUI.
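
One way to check whether a given binary carries a UUID is dwarfdump -u, which is also used further down in this thread. The sketch below wraps that check. The helper name has_uuid is made up for illustration, and it assumes that a binary with an LC_UUID load command produces at least one output line starting with "UUID:".

```shell
#!/bin/sh
# Read `dwarfdump -u <binary>` output on stdin and report whether it
# contains a UUID line (assumed to indicate an LC_UUID load command).
has_uuid() {
    if grep -q '^UUID: '; then
        echo "present"
    else
        echo "missing"
    fi
}

# Only run the real check where both tools are available.
if command -v dwarfdump >/dev/null 2>&1 && command -v gitlab-runner >/dev/null 2>&1; then
    dwarfdump -u "$(command -v gitlab-runner)" | has_uuid
fi
```

A "present" result only shows that the binary has a UUID. It does not prove that the Local Network permission is actually being granted to it.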

edigaryev commented 1 month ago

One more clue in https://github.com/cirruslabs/tart/issues/919.

Have you tried simply restarting the host? 🤔

gitperr commented 1 month ago

Restart did not work, but it is likely what you described with LC_UUID, because sudoing works. I'll test the workaround and let you know.

gitperr commented 1 month ago

I built it as you suggested and ran it with: gitlab-runner install && gitlab-runner start

Unfortunately, that did not help.

dwarfdump shows it has UUID:

dwarfdump -u gitlab-runner 
UUID: 12BD2F27-6F42-3953-84FF-B8DE2ADEE7AA (arm64) gitlab-runner

edigaryev commented 1 month ago

Just to double check, are you sure that the invocation of gitlab-runner above uses your compiled GitLab Runner from PATH, and not the one previously installed on the system?

You can also inspect ~/Library/LaunchAgents/gitlab-runner.plist to see if the first array element in ProgramArguments does indeed point to your compiled version of GitLab Runner.
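
A sketch of that check, assuming the stock macOS tool /usr/libexec/PlistBuddy is available. The helper same_binary is hypothetical and simply compares the two paths as strings.

```shell
#!/bin/sh
# Compare the binary the launchd service starts with the one found on PATH.
same_binary() {
    service_bin="$1"
    path_bin="$2"
    if [ "$service_bin" = "$path_bin" ]; then
        echo "match: $service_bin"
    else
        echo "mismatch: service runs $service_bin, PATH has $path_bin"
    fi
}

plist="$HOME/Library/LaunchAgents/gitlab-runner.plist"
if [ -x /usr/libexec/PlistBuddy ] && [ -f "$plist" ]; then
    # The first element of the ProgramArguments array is the executable path.
    service_bin="$(/usr/libexec/PlistBuddy -c 'Print :ProgramArguments:0' "$plist")"
    same_binary "$service_bin" "$(command -v gitlab-runner)"
fi
```

Because the comparison is purely textual, a reported mismatch can also come from a symlink, so it is worth resolving both paths by hand before drawing conclusions.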

gitperr commented 1 month ago

Yep, I checked that it is using the correct gitlab-runner (the one that I built myself). Confirmed from the plist file, as well as pipeline logs (it says development version HEAD):

Running with gitlab-runner development version (HEAD)
  on <runner group>, system ID: <system id>
Preparing the "custom" executor
00:00
Using Custom executor...
Pulling the latest version of <image name>..
tart command returned non-zero exit code: "Error: The Internet connection appears to be offline."

I tried this with both Go 1.22.8 and 1.23, to no avail.

Here are the exact steps I followed:

git clone -b v17.5.0 https://gitlab.com/gitlab-org/gitlab-runner.git
cd gitlab-runner/
go build -ldflags="-linkmode=external" -o gitlab-runner main.go
sudo mv gitlab-runner /usr/local/bin/
gitlab-runner --version
Version:      development version
Git revision: HEAD
Git branch:   HEAD
GO version:   go1.22.8
Built:        unknown
OS/Arch:      darwin/arm64
gitlab-runner install
gitlab-runner start

edigaryev commented 1 month ago

  1. Does the pull work if you specify an image from the internet, e.g. ghcr.io/cirruslabs/macos-sequoia-base:latest?
  2. Are you using any non-standard firewall/networking settings or software (e.g. VPN, endpoint security)?

gitperr commented 4 weeks ago

  1. It works, I tested with your example image.
  2. Not to my knowledge, no VPN or endpoint security.

Note: The registry I'm trying to pull from and the runner are on the same network, so they have connectivity normally.

waddles commented 3 weeks ago

We also hit this issue, but didn't stop upgrading until we'd broken our last runner 🥇 . After a few hours of playing around, we whittled it down to these instructions:

With that we managed to bring all 9 hosts back online.

@edigaryev do you know if there is an existing GitLab ticket we can add some weight to, to get them to build the gitlab-runner with the requisite flags?

edigaryev commented 3 weeks ago

@edigaryev do you know if there is an existing GitLab ticket we can add some weight to, to get them to build the gitlab-runner with the requisite flags?

A quick search for LC_UUID yields these two issues:

Another point to keep in mind is that a new minor (or patch) version of Go will be released soon that includes the backported LC_UUID fix. This means that a new GitLab Runner release could get the fix automatically once it is built with the new Go version.
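
To see which Go toolchain a given runner binary was built with, the "GO version:" field printed by gitlab-runner --version (as in the output earlier in this thread) can be extracted. A minimal sketch follows; the helper name go_version_of_runner is hypothetical, and the field layout is assumed to match that earlier output.

```shell
#!/bin/sh
# Read `gitlab-runner --version` output on stdin and print the value of
# the "GO version:" field, for example "go1.22.8".
go_version_of_runner() {
    sed -n 's/^GO version:[[:space:]]*//p'
}

# Only query the real binary where it is installed.
if command -v gitlab-runner >/dev/null 2>&1; then
    gitlab-runner --version | go_version_of_runner
fi
```

The printed version can then be compared by hand against the Go release notes once the backport has shipped.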

max-wittig commented 3 weeks ago

Same issue here. Waiting for the Go fix is the only thing we can do: https://github.com/golang/go/issues/69992

GitLab Runner issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/38044

gitperr commented 3 weeks ago

You could run GitLab Runner with sudo, or try the fix above where you click the popup on the Mac.

max-wittig commented 3 weeks ago

@gitperr I've tried that, but it doesn't seem to work. With sudo, the runner doesn't seem to be working at all. It reports "running", but it does not communicate with GitLab for some reason.

gitperr commented 3 weeks ago

Hmm, make sure that it is using the correct config.toml and that it has a proper token there.

Did you try this when installing it? sudo gitlab-runner install --user <user to run with>

max-wittig commented 3 weeks ago

Yes, I tried that. I rebuilt it according to https://github.com/cirruslabs/gitlab-tart-executor/issues/88#issuecomment-2444035719 and this works!

max-wittig commented 3 weeks ago

Update: this doesn't seem to work either. The failures seem sporadic for some reason ☹️

Build like this:

GOOS=darwin GOARCH=amd64 CGO_ENABLED=1 go build -ldflags="-linkmode=external" -o gitlab-runner main.go

konlanx commented 2 weeks ago

I am experiencing the same problem.

The runner has not been changed since the last time this worked and now I am starting to experience the following situation:

Running with gitlab-runner 17.5.3 (xxx)
  on macos-runner xxx, system ID: xxx
Preparing the "custom" executor 03:31
Using Custom executor...
2024/11/04 15:25:04 Pulling the latest version of ghcr.io/cirruslabs/macos-sonoma-xcode:16...
2024/11/04 15:28:35 tart command returned non-zero exit code: "Error: The network connection was lost."
WARNING: Cleanup script failed: exit status 1
ERROR: Job failed: exit status 1

This seems to be a slightly different error message, where the connection was lost instead of not present at all, therefore I wanted to add it to this issue.

I verified that the host does have a functioning internet connection.

I tried restarting the host and renewing all brew installs, but it did not change the outcome.

edigaryev commented 2 hours ago

Thanks to @waddles, this is now fixed in the latest Homebrew version of GitLab Runner (>=17.6.0).

I've created https://github.com/cirruslabs/gitlab-tart-executor/pull/94 to reflect this in the README.md.
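
To check whether an installed runner meets the version mentioned above, a comparison along these lines might help. It is a sketch under assumptions: version_ge is a hypothetical helper that only handles plain numeric MAJOR.MINOR.PATCH strings, and the "Version:" field is assumed to look like the gitlab-runner --version output shown earlier in this thread. It also cannot tell whether the binary came from Homebrew.

```shell
#!/bin/sh
# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Only the first three numeric components are compared.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Only query the real binary where it is installed.
if command -v gitlab-runner >/dev/null 2>&1; then
    installed="$(gitlab-runner --version | sed -n 's/^Version:[[:space:]]*//p')"
    if version_ge "$installed" 17.6.0; then
        echo "gitlab-runner $installed is at least 17.6.0"
    else
        echo "gitlab-runner $installed is older than 17.6.0"
    fi
fi
```

Comparing components numerically avoids the usual string-comparison trap where 17.10.0 would sort before 17.6.0.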