earthly / earthly

Super simple build framework with fast, repeatable builds and an instantly familiar syntax – like Dockerfile and Makefile had a baby.
https://earthly.dev
Mozilla Public License 2.0

Internal GET GIT META command fails with "exec: format error" #3826

Open brandonSc opened 7 months ago

brandonSc commented 7 months ago

What went wrong? Users are reporting occasional, non-deterministic errors on ARM satellites with the internal GET GIT META command. The command

earthly --org xxx --satellite xxx --strict --no-output --git-username='xxx' --git-password='***' github.com/xxx:refs/heads/dev+lint

fails with a message like the following:

Init 🚀
————————————————————————————————————————————————————————————————————————————————

           satellite | xxx is waking up. Please wait...
           satellite | ...reporting thermal conditions...
           satellite | ...contacting mission control...
           satellite | ...reticulating splines...
           satellite | ...aligning solar panels...
           satellite | ...System online.
           satellite | Connecting to xxx...
           satellite | ...Done
           satellite | Version github.com/earthly/buildkit v0.7.23 086f60eecf5a2261bbd53299614f88b3def12746
           satellite | Platforms: linux/arm64 (native) linux/arm/v7 linux/arm/v6
           satellite | Utilization: 0 other builds, 0/48 op load
           satellite | GC stats: 68 GB cache, avg GC duration 0s, all-time GC duration 0s, last GC duration 0s, last cleared 0 B

Streaming logs to https://cloud.earthly.dev/builds/xxx

 Build 🔧
————————————————————————————————————————————————————————————————————————————————

================================== ❌ FAILURE ===================================

  ***xxx#r *failed* | Repeating the failure error...
  ***xxx#r *failed* | exec /bin/sh: exec format error
  ***xxx#r *failed* | ERROR
  ***xxx#r *failed* |       The internal command
  ***xxx#r *failed* |           GET GIT META github.com/xxx:refs/heads/dev
  ***xxx#r *failed* |       did not complete successfully. Exit code 1
View logs at https://cloud.earthly.dev/builds/xxx
Error: Process completed with exit code 1.

Since the git command is run in a pulled git image, it's possible that this exec format error is due to an incorrect image platform being used. In many cases, this error has been reported on ARM64-based satellites (where an amd64-based image may be getting pulled incorrectly).
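To illustrate the suspected mismatch, here is a minimal sketch (not part of Earthly) that reads the `e_machine` field of an ELF header and compares it with the host architecture. The kernel returns "exec format error" when asked to run a binary built for a different architecture and no emulator is registered for it. The helper names (`elf_arch`, `host_arch`, `needs_emulation`) and the alias table are assumptions made for this example, and the comparison is deliberately simplified: it treats any difference as foreign, even though some arm64 hosts can run 32-bit ARM binaries natively.

```python
import platform
import struct

# e_machine values from the ELF specification (small subset).
ELF_MACHINES = {3: "386", 40: "arm", 62: "amd64", 183: "arm64"}

# Common platform.machine() spellings mapped onto the names above.
# This table is an assumption for the example and is not exhaustive.
HOST_ALIASES = {
    "x86_64": "amd64",
    "AMD64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm",
    "armv6l": "arm",
    "i386": "386",
    "i686": "386",
}


def elf_arch(header: bytes) -> str:
    """Return the architecture named in an ELF header, or "unknown"."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "unknown"  # not an ELF file (for example, a shell script)
    # Byte 5 (EI_DATA) is 1 for little-endian, 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, after e_ident and e_type.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(e_machine, "unknown")


def host_arch() -> str:
    """Return the host architecture using the same naming as ELF_MACHINES."""
    machine = platform.machine()
    return HOST_ALIASES.get(machine, machine)


def needs_emulation(binary_path: str) -> bool:
    """True if the binary is ELF for a known architecture other than the host's."""
    with open(binary_path, "rb") as f:
        arch = elf_arch(f.read(20))
    return arch != "unknown" and arch != host_arch()
```

Calling `needs_emulation("/bin/sh")` inside the pulled git image's filesystem would show whether its shell was built for a different architecture than the satellite.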

Note that this internal command is used when resolving remote target references (for example, when running earthly github.com/org/repo+some-target).

More insights reported by users:

brandonSc commented 7 months ago

Since we suspect that the wrong image platform is being used, we have added the following to our error message: https://github.com/earthly/earthly/pull/3805

This will help determine if an incorrect platform is being set in the image pull parameters.

Note that running with --verbose may be required to see the full error string.

idelvall commented 7 months ago

BTW, this trace will be shipped in a new release that is coming today or tomorrow

alexcb commented 7 months ago

from https://github.com/earthly/earthly/pull/3805#issuecomment-1957765574

This error may happen when the git image is overridden in the Earthly config.

brandonSc commented 6 months ago

We figured out that this issue can occur when a build runs on a satellite before QEMU has started and the git image in use requires emulation. A backend fix will be implemented to ensure QEMU is running before allowing the build. We can leave this ticket open until then.
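For anyone who wants to check this condition by hand, the following is a rough sketch of a readiness check. It is not the planned backend fix. It assumes QEMU user-mode emulation is registered through the Linux `binfmt_misc` interface under entry names beginning with `qemu-` (for example `qemu-x86_64`), which is the usual convention but may differ on a given host. The function names are hypothetical.

```python
from pathlib import Path

# Standard mount point of the binfmt_misc filesystem on Linux.
BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")


def enabled_qemu_handlers(binfmt_dir: Path = BINFMT_DIR) -> list[str]:
    """List enabled binfmt_misc entries whose name starts with "qemu-"."""
    if not binfmt_dir.is_dir():
        return []  # binfmt_misc is not mounted (or this is not Linux)
    handlers = []
    for entry in sorted(binfmt_dir.glob("qemu-*")):
        # The first line of each entry file is "enabled" or "disabled".
        lines = entry.read_text().splitlines()
        if lines and lines[0].strip() == "enabled":
            handlers.append(entry.name)
    return handlers


def emulation_ready(arch: str, binfmt_dir: Path = BINFMT_DIR) -> bool:
    """True if an enabled handler named qemu-<arch> is registered."""
    return f"qemu-{arch}" in enabled_qemu_handlers(binfmt_dir)
```

On an ARM64 host, `emulation_ready("x86_64")` returning `False` would mean an amd64 git image cannot be executed yet, which matches the failure described above.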