bottlerocket-os / bottlerocket-core-kit

A kit with core software packaged for Bottlerocket

Flaky builds in AWS CodeBuild #138

Open · jpculp opened 1 week ago

jpculp commented 1 week ago

Platform I'm building on:

AWS CodeBuild, ARM_CONTAINER environment type.

What I expected to happen:

The core kit builds successfully.

What actually happened:

error: failed to run custom build command for `microcode v0.1.0 (/codebuild/output/src710128580/src/bottlerocket-core-kit/packages/microcode)`
Caused by:
  process didn't exit successfully: `/codebuild/output/src710128580/src/bottlerocket-core-kit/target/x86_64/debug/build/microcode-16b683b5bfc6eb34/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-env-changed=BUILDSYS_ARCH
  cargo:rerun-if-env-changed=BUILDSYS_EXTERNAL_KITS_DIR
  cargo:rerun-if-env-changed=BUILDSYS_OUTPUT_GENERATION_ID
  cargo:rerun-if-env-changed=BUILDSYS_PACKAGES_DIR
  cargo:rerun-if-env-changed=BUILDSYS_ROOT_DIR
  cargo:rerun-if-env-changed=BUILDSYS_STATE_DIR
  cargo:rerun-if-env-changed=TLPRIVATE_SDK_IMAGE
  cargo:rerun-if-changed=Cargo.toml
  cargo:rerun-if-changed=/codebuild/output/src710128580/src/bottlerocket-core-kit/build/external-kits/external-kit-metadata.json
  cargo:rerun-if-changed=microcode.spec
  Error response from daemon: No such image: buildsys-pkg-microcode-x86_64-acca8ab8e27a:latest
  Error response from daemon: No such container: buildsys-pkg-microcode-x86_64-acca8ab8e27a-bypass

...

  ERROR: failed to solve: frontend grpc server closed unexpectedly

How to reproduce the problem:

Using a container with all the necessary build prerequisites, build the core kit. The failure happens very frequently in every ARM_CONTAINER environment, but far less often on BUILD_GENERAL1_2XLARGE (x86_64). I'm not sure whether the difference comes from ARM itself or whether the 128 vCPUs of the x86_64 2XLARGE somehow mitigate the issue.

  1. git clone https://github.com/bottlerocket-os/bottlerocket-core-kit
  2. cd bottlerocket-core-kit
  3. make
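Because the failure is intermittent, one way to put numbers on "very frequently" versus "far less often" is to repeat the reproduction several times per compute type and count the failed attempts. The following is a minimal sketch, not part of the repository: `count_failures` is a hypothetical helper, and it assumes each attempt starts from a clean checkout, since rerunning `make` in an already-built tree would skip the packages that succeeded earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper for measuring flakiness; not part of bottlerocket-core-kit.

# count_failures RUNS CMD [ARGS...]
# Runs CMD (with ARGS) RUNS times, discarding its output, and prints how many
# of those runs exited with a non-zero status.
count_failures() {
  local runs=$1
  shift
  local failures=0
  local i
  for ((i = 1; i <= runs; i++)); do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures"
}
```

For example, `count_failures 10 sh -c 'rm -rf core-kit && git clone https://github.com/bottlerocket-os/bottlerocket-core-kit core-kit && cd core-kit && make'` would run ten fresh builds and print how many failed. Output is discarded to keep the count readable, so drop the redirection when the logs of a failing run are needed.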

jpculp commented 4 days ago

When running on XLARGE, the failure frequency was about the same on aarch64 and x86_64, so the issue may be related to the number of vCPUs rather than to the architecture.
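Since the working hypothesis is now the vCPU count rather than the architecture, it may help to record the host's basic resources next to each result. This is a minimal sketch assuming a Linux build host (it reads `/proc/meminfo`); `describe_build_host` is a hypothetical name. Memory is included only because it is another resource that differs between CodeBuild compute types, not because it is known to matter here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print architecture, vCPU count and total memory on one
# line so failure reports from different CodeBuild compute types can be compared.
# Assumes Linux, since it reads /proc/meminfo.
describe_build_host() {
  local arch vcpus mem_kib
  arch=$(uname -m)
  vcpus=$(nproc)
  # MemTotal is reported by the kernel in kB.
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "arch=${arch} vcpus=${vcpus} mem_kib=${mem_kib}"
}
```

Running `describe_build_host` at the start of each CodeBuild job and keeping its line next to the pass/fail result makes it easier to see whether failures track the vCPU count.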