EmbarkStudios / k8s-buildkite-plugin

Run any buildkite build step as a Kubernetes Job
https://embark.dev
Apache License 2.0

`k8s-buildkite-agent` built on CI has missing executables #56

Closed loeffel-io closed 1 year ago

loeffel-io commented 1 year ago

Hello,

I'm trying to configure the Buildkite Helm chart. It creates these two base64-encoded secrets (https://github.com/buildkite/charts/blob/master/stable/agent/templates/secret.yaml#L12):

kubectl describe secret buildkite-agent  -n buildkite

Name:         buildkite-agent
Namespace:    buildkite
Labels:       app=agent
              app.kubernetes.io/managed-by=Helm
              chart=agent-0.6.1
              heritage=Helm
              release=buildkite
Annotations:  meta.helm.sh/release-name: buildkite
              meta.helm.sh/release-namespace: buildkite

Type:  Opaque

Data
====
agent-ssh:    80 bytes
agent-token:  50 bytes

How do I configure the plugin now? I'm pretty confused by all the different options, like secret-name and default-secret-name.

This is my current configuration

      - plugins:
          - EmbarkStudios/k8s:
              image: "gcr.io/bazel-public/bazel:6.0.0"
              entrypoint: ""
              shell: [ "sh", "-e", "-c" ]
              service-account-name: "global-base-production"
              secret-name: "buildkite-agent"
              agent-token-secret-key: "agent-token"
              git-ssh-secret-key: "agent-ssh"

Would really like to get some quick help 🙏 ❤️

loeffel-io commented 1 year ago

My fault, everything works as expected 👍

tgolsson commented 1 year ago

Hey @loeffel-io! Glad you got it to work :-) Was there anything we could improve in the docs that would've helped? What was the issue?

loeffel-io commented 1 year ago

@tgolsson

It was just the wrong SSH key.

Right now I'm struggling with the Docker image. You require jsonnet and base32 as deps, but there is no working buildkite-agent Docker image that provides them. The k8s-buildkite-agent image fails with:

{
  "textPayload": "/entrypoint.sh: line 15: /usr/local/bin/buildkite-agent: No such file or directory",
  "insertId": "5zwu1zjp9oi4q4kz",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "buildkite",
      "location": "us-central1",
      "cluster_name": "buildkite-gke-production",
      "pod_name": "buildkite-agent-845ff66dbc-h9pmm",
      "container_name": "agent",
      "project_id": "buildkite-374309"
    }
  },
  "timestamp": "2023-01-31T10:20:17.669023321Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/release": "buildkite",
    "k8s-pod/pod-template-hash": "845ff66dbc",
    "k8s-pod/app": "agent",
    "compute.googleapis.com/resource_name": "gke-buildkite-gke-pr-buildkite-gke-no-de5b500c-qd9d"
  },
  "logName": "projects/buildkite-374309/logs/stderr",
  "receiveTimestamp": "2023-01-31T10:20:18.868420197Z"
}

So I think the initial hurdle is way too big - I don't want to maintain my own Buildkite agent Docker image.

loeffel-io commented 1 year ago

After some research, I still have no clue where this is running:

~~~ Preparing plugins
# Plugin "github.com/EmbarkStudios/k8s-buildkite-plugin" already checked out (0e13cac)
~~~ Preparing working directory
$ cd /buildkite/builds/buildkite-agent-54b457fdd7-7k7rz-1/mindful/global-base
# Host "github.com" already in list of known hosts at "/root/.ssh/known_hosts"
$ git remote set-url origin git@github.com:mindful-hq/global-base.git
$ git clean -ffxdq
$ git fetch -v --prune -- origin 56aba901dfe4973ddad928a3e4910a0df572c814
From github.com:mindful-hq/global-base

 * branch            56aba901dfe4973ddad928a3e4910a0df572c814 -> FETCH_HEAD

$ git checkout -f 56aba901dfe4973ddad928a3e4910a0df572c814
HEAD is now at 56aba90 test: buildkite

# Cleaning again to catch any post-checkout changes
$ git clean -ffxdq
# Checking to see if Git data needs to be sent to Buildkite
$ buildkite-agent meta-data exists buildkite:git:commit
~~~ Running plugin k8s command hook
$ /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/command
/buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/command: line 16: base32: command not found

--- :kubernetes: Starting Kubernetes Job

/buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/command: line 93: jsonnet: command not found

🚨 Error: The command exited with status 127
^^^ +++
^^^ +++
~~~ Running plugin k8s pre-exit hook
$ /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/pre-exit
--- :kubernetes: Cleanup

$ cd /buildkite/builds/buildkite-agent-54b457fdd7-7k7rz-1/mindful/global-base

Is it running in my buildkite-agent? Is it running in the init image (https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/master/lib/job.jsonnet#L31)? Is it running in my step image?

I can't find any information about that.

loeffel-io commented 1 year ago

Looks like this belongs to the issue https://github.com/EmbarkStudios/k8s-buildkite-plugin/issues/10, and since I am using the Buildkite chart, this is running in my buildkite-agent image.

tgolsson commented 1 year ago

So yeah, there's three phases of the job running.

As you found in the other issue; yeah; you need a modified docker image - the base Buildkite one doesn't have jsonnet or other tools we need. I'm not sure if anything has changed since that issue or if the base image we publish would work now. The ones we run internally have a lot more tools for things that run without the plugin, e.g. C++ compilers, etc. I'd try overriding the image in the chart with the one from here: https://hub.docker.com/r/embarkstudios/k8s-buildkite-agent.
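To confirm whether a given agent image has what the command hook needs, you can check the PATH before running a build. The sketch below is only an illustration and not part of the plugin: the tool list (jsonnet, base32, kubectl) is taken from the errors and comments in this thread and may be incomplete, and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Tools the plugin's command hook was seen to need in this thread.
# Assumption: illustrative list only, it may not be complete.
REQUIRED_TOOLS = ["jsonnet", "base32", "kubectl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def report(tools=REQUIRED_TOOLS):
    """Print a one-line summary of which executables are missing."""
    missing = missing_tools(tools)
    if missing:
        print("missing executables: " + ", ".join(missing))
    else:
        print("all required executables found")
    return missing
```

Calling `report()` inside the agent container returns an empty list when everything is installed.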

loeffel-io commented 1 year ago

Thank you @tgolsson 🙏

> The job then runs the init container (which is also buildkite agent) to set up the general build workspace as a regular Buildkite agent would. https://buildkite.com/docs/agent/v3/cli-bootstrap

I am pretty sure that this is run by the https://hub.docker.com/r/embarkstudios/k8s-buildkite-agent image here: https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/master/lib/job.jsonnet#L31, isn't it?

> I'd try overriding the image in the chart with the one from here: https://hub.docker.com/r/embarkstudios/k8s-buildkite-agent

I already tried that for your point 1, which generates the error mentioned above: "textPayload": "/entrypoint.sh: line 15: /usr/local/bin/buildkite-agent: No such file or directory"

I'm still wondering which of those steps produces the error message above, line 93: jsonnet: command not found?

Thank you very much 🙏

tgolsson commented 1 year ago

> I am pretty sure that this is run by the https://hub.docker.com/r/embarkstudios/k8s-buildkite-agent image here: https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/master/lib/job.jsonnet#L31, isn't it?

Yepp! It runs buildkite-agent inside that container.

> I already tried that for your point 1, which generates the error mentioned above: "textPayload": "/entrypoint.sh: line 15: /usr/local/bin/buildkite-agent: No such file or directory"

This seems like a build bug - the published image is incomplete 😱. If I build it locally it does have buildkite-agent in there. If you run the much older 1.2.0 image that one has the file as well (it might be broken by age now though - API versions etc). Will investigate!

https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/0e13cac380ab4a5eb0736bf1b67866f2588146c9/Dockerfile#L14

> I'm still wondering which of those steps produces the error message above, line 93: jsonnet: command not found?

The jsonnet library is used during the first step, when we generate the job.

loeffel-io commented 1 year ago

Amazing, thank you very much for that information 🙏

tgolsson commented 1 year ago

@loeffel-io I've pushed a new latest image, feel free to try that. I've validated that it has the correct binaries.

sha256:1d88791315ed6b0b49a64055bc71c5a9a0b1953e387f99d25299ed06ccea5dbd is the SHA for the fixed one.

loeffel-io commented 1 year ago

@tgolsson great, thanks!

I also bumped the k8s init image: https://github.com/EmbarkStudios/k8s-buildkite-plugin/pull/58

tgolsson commented 1 year ago

Thanks, and new release done!

loeffel-io commented 1 year ago

Great work @tgolsson!

One last thing: shouldn't we maybe bump the versions in the Dockerfile? https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/master/Dockerfile

The buildkite agent version itself is 2 years old: https://hub.docker.com/layers/buildkite/agent/3.29.0/images/sha256-5c7d788323b084affed6ee2d6a73e8cff9ff2714af327648ae7c8c99aba32487?context=explore

loeffel-io commented 1 year ago

⚠️

the image is not working:

{
  "textPayload": "Use \"buildkite-agent <command> --help\" for more information about a command.",
  "insertId": "7c7p1tfaixqptibl",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "pod_name": "buildkite-agent-845ff66dbc-d86rt",
      "container_name": "agent",
      "location": "us-central1",
      "namespace_name": "buildkite",
      "project_id": "buildkite-374309",
      "cluster_name": "buildkite-gke-production"
    }
  },
  "timestamp": "2023-02-01T09:52:29.997493522Z",
  "severity": "INFO",
  "labels": {
    "k8s-pod/pod-template-hash": "845ff66dbc",
    "compute.googleapis.com/resource_name": "gke-buildkite-gke-pr-buildkite-gke-no-de5b500c-o6ww",
    "k8s-pod/release": "buildkite",
    "k8s-pod/app": "agent"
  },
  "logName": "projects/buildkite-374309/logs/stdout",
  "receiveTimestamp": "2023-02-01T09:52:32.304224852Z"
}

tgolsson commented 1 year ago

I'm generally hesitant to bump for the sake of bumping - it leads to churn and potential disruption. But yeah, maybe 2 years old is a bit old... I'm just worried about breaking changes then. If you want to PR a bump (maybe for all tools?) we can see how much has changed.

loeffel-io commented 1 year ago

downloaded-logs-20230201-105600.json.zip

tgolsson commented 1 year ago

Hmm, odd. Weird that it doesn't say what it fails to do. Is this during setup, node-boot, ..?

loeffel-io commented 1 year ago

This happens when I start the Buildkite Helm chart with the new image. Before, it used the standard buildkite/agent image.

tgolsson commented 1 year ago

Right! So I think that happens because we override the entrypoint in the init-container image, and the helm chart relies on whatever is baked into the buildkite-agent image.

tgolsson commented 1 year ago

OK; it looks like there's a special entrypoint that needs to run too when bootstrapping the node. I think maybe it'd make sense for this project to publish a base image that could work for the node too. I've got quite a bunch of things to do today, but the current Dockerfile is quite close to what's needed... just need to not build it into an alpine base.

loeffel-io commented 1 year ago

Great. Shouldn't it be easy to just add the jsonnet binary to the original buildkite/agent:3.x-alpine-k8s image (which includes kubectl)? Or is there more to do?

tgolsson commented 1 year ago

That does sound about right. There's a bunch of installs in the base one, some of them may be needed for jsonnet, maybe.

loeffel-io commented 1 year ago

I'll give that a try

loeffel-io commented 1 year ago

Update: this is the current error message when running the self-made image:

RUNTIME ERROR: Field does not exist: BUILDKITE_BUILD_CREATOR_TEAMS
--
  | /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/lib/job.jsonnet:117:28-61  object <anonymous>
  | Field "build/creator-teams"
  | Field "annotations"
  | Field "metadata"
  | During manifestation

could belong to

/buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/command
--
  | /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin/hooks/command: line 16: base32: command not found

loeffel-io commented 1 year ago

Update: base32 is fixed.

I have no clue how to fix the BUILDKITE_BUILD_CREATOR_TEAMS error - would love to get some help with that.

tgolsson commented 1 year ago

That should be set when setting up a job.

https://buildkite.com/docs/pipelines/environment-variables#BUILDKITE_BUILD_CREATOR_TEAMS. I believe you should see that in the step environment information. I'm not sure if that might be missing if you have no teams - for my latest build on Buildkite it lists the three teams I'm in there. Can you see it as well? We might need to guard that setting in case the triggering user has no teams.
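The guard mentioned above could look roughly like this. It is a Python sketch of the idea only: the plugin itself builds the job in jsonnet, and `creator_teams_annotation` is a hypothetical helper, not the plugin's code. It assumes that an absent BUILDKITE_BUILD_CREATOR_TEAMS should be treated as an empty value rather than an error.

```python
import os


def creator_teams_annotation(env=None):
    """Return the value for the build/creator-teams annotation.

    Falls back to an empty string when the variable is not set,
    for example when the triggering user belongs to no teams.
    """
    if env is None:
        env = os.environ
    return env.get("BUILDKITE_BUILD_CREATOR_TEAMS", "")


# Demo with an environment that lacks the variable (hypothetical input).
annotations = {"build/creator-teams": creator_teams_annotation({"BUILDKITE": "true"})}
print(annotations)
```

The point of the default is that a missing field no longer aborts job generation; whether an empty annotation is the right fallback is a design choice for the maintainers.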

loeffel-io commented 1 year ago

Nope, it's not there:

CI="true"
BUILDKITE="true"
BUILDKITE_ORGANIZATION_SLUG="mindful"
BUILDKITE_PIPELINE_SLUG="global-base"
BUILDKITE_PIPELINE_NAME="global-base"
BUILDKITE_PIPELINE_ID="018606c8-d6d0-472c-a946-232e5160058f"
BUILDKITE_PIPELINE_PROVIDER="github"
BUILDKITE_PIPELINE_DEFAULT_BRANCH="master"
BUILDKITE_REPO="git@github.com:mindful-hq/global-base.git"
BUILDKITE_BUILD_ID="01860eae-c4c4-457a-a880-48edaae60705"
BUILDKITE_BUILD_NUMBER="27"
BUILDKITE_BUILD_URL="https://buildkite.com/mindful/global-base/builds/27"
BUILDKITE_BRANCH="main"
BUILDKITE_TAG=""
BUILDKITE_COMMIT="52db6228b519667d1185f581cc14b8e29e164fe9"
BUILDKITE_MESSAGE="test: buildkite"
BUILDKITE_SOURCE="webhook"
BUILDKITE_BUILD_AUTHOR="Lucas Löffel"
BUILDKITE_BUILD_AUTHOR_EMAIL="lucas@loeffel.io"
BUILDKITE_BUILD_CREATOR="Lucas Löffel"
BUILDKITE_BUILD_CREATOR_EMAIL="lucas@loeffel.io"
BUILDKITE_REBUILT_FROM_BUILD_ID=""
BUILDKITE_REBUILT_FROM_BUILD_NUMBER=""
BUILDKITE_PULL_REQUEST="false"
BUILDKITE_PULL_REQUEST_BASE_BRANCH=""
BUILDKITE_PULL_REQUEST_REPO=""
BUILDKITE_TRIGGERED_FROM_BUILD_ID=""
BUILDKITE_TRIGGERED_FROM_BUILD_NUMBER=""
BUILDKITE_TRIGGERED_FROM_BUILD_PIPELINE_SLUG=""
BUILDKITE_JOB_ID="01860eb0-6d03-4617-96bb-444d4a961f87"
BUILDKITE_LABEL="global"
BUILDKITE_COMMAND="bazel test --remote_cache= --google_credentials= //...
bazel build --remote_cache= --google_credentials= //..."
BUILDKITE_ARTIFACT_PATHS=""
BUILDKITE_RETRY_COUNT="0"
BUILDKITE_TIMEOUT="false"
BUILDKITE_STEP_KEY=""
BUILDKITE_STEP_ID="01860eb0-682f-4c03-8bcb-3a8e00bf880e"
BUILDKITE_PROJECT_SLUG="mindful/global-base"
BUILDKITE_PROJECT_PROVIDER="github"
BUILDKITE_SCRIPT_PATH="bazel test --remote_cache= --google_credentials= //...
bazel build --remote_cache= --google_credentials= //..."
BUILDKITE_AGENT_ID="01860ea8-eab9-42f4-9814-acfa820bbf69"
BUILDKITE_AGENT_NAME="buildkite-agent-5cd5ffd9cf-trbgl-1"
BUILDKITE_AGENT_META_DATA_QUEUE="default"
BUILDKITE_AGENT_META_DATA_ROLE="agent"
BUILDKITE_REPO_SSH_HOST="github.com"
BUILDKITE_PLUGINS="[{\"github.com/EmbarkStudios/k8s-buildkite-plugin#v1.2.15\":{\"image\":\"gcr.io/bazel-public/bazel:6.0.0\",\"shell\":[\"sh\",\"-e\",\"-c\"],\"entrypoint\":\"\",\"secret-name\":\"buildkite-agent\",\"git-ssh-secret-key\":\"agent-ssh\",\"service-account-name\":\"global-base-production\",\"agent-token-secret-key\":\"agent-token\"}}]"

There are no teams yet, btw.

I really need to get that done. Would it be possible for you to fix that soon? Thank you so much for the information, it helped me a lot to understand the issue.

For now I just created a team, and I'll create a bug ticket for this.

tgolsson commented 1 year ago

Interesting. It shouldn't be too hard to fix, will take a peek tomorrow.

loeffel-io commented 1 year ago

amazing @tgolsson! 🙏

loeffel-io commented 1 year ago

I think I've never had such a bad experience setting up a plugin.

I now get this error message:

/buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin-v1-3-0/hooks/command: line 169: BUILDKITE_PLUGIN_K8S_INIT_IMAGE: unbound variable

Do you have any idea, @tgolsson?

tgolsson commented 1 year ago

I'm sorry you feel that way - I noticed when I took over (as maintainer) that there hasn't been a full release of the actual plugin since 2021, so likely a bunch of code-rot has happened since then and some unpublished changes that likely break things, as with that error.

I've pushed a guard clause for that to the same branch you used before - new commit 44b05b2ef952c75809f7603e1b8607f57ac194ea.

loeffel-io commented 1 year ago

With 44b05b2 I get the following (I don't think that commit helped):

# Cleaning again to catch any post-checkout changes
$ git clean -ffxdq
# Checking to see if Git data needs to be sent to Buildkite
$ buildkite-agent meta-data exists buildkite:git:commit
~~~ Running plugin k8s command hook
$ /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin-44b05b2ef952c75809f7603e1b8607f57ac194ea/hooks/command
--- :kubernetes: Starting Kubernetes Job

job.batch/global-base-38-3kyz24ij created

Timeout: 36000s

--- :kubernetes: Running image: gcr.io/bazel-public/bazel:6.0.0

Pod is running: global-base-38-3kyz24ij-sgxvb

+++ :kubernetes: step container

--- :kubernetes: Job status: Failed

Warning: init container failed with exit code 1, this usually indicates plugin misconfiguration or infrastructure failure

🚨 Error: The command exited with status 1
^^^ +++
^^^ +++
user command error: The plugin k8s command hook exited with status 1
~~~ Running plugin k8s pre-exit hook
$ /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin-44b05b2ef952c75809f7603e1b8607f57ac194ea/hooks/pre-exit
--- :kubernetes: Cleanup

With init-image: "embarkstudios/k8s-buildkite-agent@sha256:3c010d09915f3b39c2f8324af5f0aaf910a643e7d63607ee8d49653931b8b167" I get this, and then it gets stuck indefinitely on the bootstrap container:

# Cleaning again to catch any post-checkout changes
$ git clean -ffxdq
# Checking to see if Git data needs to be sent to Buildkite
$ buildkite-agent meta-data exists buildkite:git:commit
~~~ Running plugin k8s command hook
$ /buildkite/plugins/github-com-EmbarkStudios-k8s-buildkite-plugin-v1-3-0/hooks/command
--- :kubernetes: Starting Kubernetes Job

job.batch/global-base-39-u2mdyyfq created

Timeout: 36000s

--- :kubernetes: Running image: gcr.io/bazel-public/bazel:6.0.0

Pod is running: global-base-39-u2mdyyfq-ncq86

--- :kubernetes: bootstrap container

So setting init-image looks promising right now, but why does it get stuck at kubernetes: bootstrap container? Maybe because my buildkite-agent image is not running your https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/1.2.15/entrypoint.sh file?

loeffel-io commented 1 year ago

Update:

I checked the GKE logs, and the container gets stuck with these logs:

downloaded-logs-20230203-115521.json.zip

tgolsson commented 1 year ago

So that's progress! I'm not sure why the init container would fail if not running an init image, that sounds like a bug and a good case for a self-test. I'll see if I can whip that up after lunch.

The ssh-key thing in the log sounds like a configuration error - I believe that can happen if you have newline issues at the end of the key. Either missing or one too many... (Edit: After some googling it looks like it's a missing newline at the end most commonly because a lot of tools trim that.)

loeffel-io commented 1 year ago

Important question, I think: does it require the private or the public key at this stage?

tgolsson commented 1 year ago

That should be the private key to match the public one you've given to GitHub.

loeffel-io commented 1 year ago

Because it's the private key that works great one step earlier:

[Screenshot, 2023-02-03 at 12:05:11]

loeffel-io commented 1 year ago

Tried it both trimmed and with a newline.

loeffel-io commented 1 year ago

Maybe important: the private key is a base64-encoded Kubernetes secret.

tgolsson commented 1 year ago

That should be fine. Can you decode the key and validate that the newline is actually there? I know some tools might strip whitespace while encoding the key, especially if it's passed on the command line.
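One way to do that check without relying on how a terminal renders the key is to decode the value and look at the raw bytes. This is a minimal sketch, assuming you have already copied the base64 string out of the secret's data field; `inspect_key` is a hypothetical helper and the sample value is a stand-in, not a real key.

```python
import base64


def inspect_key(encoded):
    """Decode a base64 secret value and report on its line endings."""
    raw = base64.b64decode(encoded)
    return {
        "ends_with_newline": raw.endswith(b"\n"),
        "has_crlf": b"\r\n" in raw,
        "last_bytes": raw[-8:],  # shown as bytes so "\n" or "\r\n" is visible
    }


# Stand-in value for demonstration; a real value would come from the
# secret (here assumed to be stored under the "agent-ssh" key).
sample = base64.b64encode(b"-----BEGIN KEY-----\nabc\n-----END KEY-----")
print(inspect_key(sample))
```

If `ends_with_newline` is false or `has_crlf` is true, the stored value is a likely suspect for the ssh-add failure.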

loeffel-io commented 1 year ago

The key is (value from gcloud secret)

-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZW
QyNTUxOQAAACAr6Vxxx...
-----END OPENSSH PRIVATE KEY-----

Just a thought: because the Buildkite agent version is so old, maybe it wants an RSA PRIVATE KEY or something?

update:

tested it with a new legacy-system key: https://docs.github.com/de/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent#einen-neuen-ssh-schl%C3%BCssel-erzeugen

same result

update:

tested it with $ ssh-keygen -t ed25519 -C "your_email@example.com"

same result

tgolsson commented 1 year ago

yeah; that looks good, but does it have \n at the back or not? :P Could also be wrong line endings, so having CRLF instead of LF for example. But yeah, there have been some deprecations in openssh, that seem to lead to this. So either an older openssh or a newer key might work.

Also - confusingly - it does work in the main container - right? So I'm going to guess this is related to ssh-agent, which is how it's set up in the entrypoint. Is the main buildkite-agent using the same ssh-key mount?

In the same vein (just to rule out copy-paste errors etc): are you looking at the right secret? The /secrets/ssh-key can be picked either from the default secrets or from the git-ssh-secret-key config etc. Might be worth checking the job spec to see what is actually being mounted. I notice you specced it as agent-ssh in the first post, for example - does that have the right ssh-key sub-item?

loeffel-io commented 1 year ago

> yeah; that looks good, but does it have \n at the back or not? :P Could also be wrong line endings, so having CRLF instead of LF for example. But yeah, there have been some deprecations in openssh, that seem to lead to this. So either an older openssh or a newer key might work.

How would you check that?

> Also - confusingly - it does work in the main container - right? So I'm going to guess this is related to ssh-agent, which is how it's set up in the entrypoint. Is the main buildkite-agent using the same ssh-key mount?

yes

> In the same vein (just to rule out copy-paste errors etc): are you looking at the right secret? The /secrets/ssh-key can be picked either from the default secrets or from the git-ssh-secret-key config etc. Might be worth checking the job spec to see what is actually being mounted. I notice you specced it as agent-ssh in the first post, for example - does that have the right ssh-key sub-item?

This is my pipeline.yml:

steps:
  - group: "Global"
    key: "global"
    steps:
      - plugins:
          - EmbarkStudios/k8s#v1.3.0:
              image: "gcr.io/bazel-public/bazel:6.0.0"
              entrypoint: ""
              shell: [ "sh", "-e", "-c" ]
              service-account-name: "global-base-production"
              secret-name: "buildkite-agent"
              agent-token-secret-key: "agent-token"
              git-ssh-secret-key: "agent-ssh"
              init-image: "embarkstudios/k8s-buildkite-agent@sha256:3c010d09915f3b39c2f8324af5f0aaf910a643e7d63607ee8d49653931b8b167"
        label: "global"
        command:
          - bazel test --remote_cache=$GOOGLE_BUCKET_PRODUCTION --google_credentials=$GOOGLE_CREDENTIALS_PRODUCTION //...
          - bazel build --remote_cache=$GOOGLE_BUCKET_PRODUCTION --google_credentials=$GOOGLE_CREDENTIALS_PRODUCTION //...

If I change git-ssh-secret-key to "agent-ssh-test", it fails with something like "secret not found".

I'm running out of energy for this...

loeffel-io commented 1 year ago

Update: I created my own init image to modify the versions and check the SSH key in the /secrets/ssh-key file. Everything looks good: the key is there in plain text, yet ssh-add -k /secrets/ssh-key still throws Error loading key "/secrets/ssh-key": invalid format

tgolsson commented 1 year ago

I'm looking at reproing on my branch, and it looks like our variant of this passes - I've fixed the bug with init-image config there, but can't repro the SSH. Can you try cat -e /secrets/ssh-key? And ensure each line including last has only $, not ^M$

loeffel-io commented 1 year ago

All lines of the logs have $ at the end - but this one looks weird!

  {
    "textPayload": "-----END OPENSSH PRIVATE KEY-----Agent pid 10",
    "insertId": "xasuvsciy5nzspe8",
    "resource": {
      "type": "k8s_container",
      "labels": {
        "pod_name": "global-base-55-zplhv5zh-nn5hb",
        "project_id": "buildkite-374309",
        "location": "us-central1",
        "container_name": "bootstrap",
        "namespace_name": "buildkite",
        "cluster_name": "buildkite-gke-production"
      }
    },

tgolsson commented 1 year ago

That looks like there's no trailing newline so two lines get merged when catting it.
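If the stored value really is missing its final newline, one option is to normalize the key text before writing it to the secret. The sketch below is an assumption about how you might do that and not something the plugin does for you; `normalize_key` is a hypothetical helper, and it does not address tools further down the chain that trim whitespace again.

```python
def normalize_key(text):
    """Convert CRLF/CR line endings to LF and end with exactly one newline."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.rstrip("\n") + "\n"


# Demo with stand-in content, not a real key.
print(repr(normalize_key("-----BEGIN KEY-----\r\nabc\r\n-----END KEY-----")))
```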

loeffel-io commented 1 year ago

[Screenshot, 2023-02-03 at 14:06:46]

I've now added a newline (?) to my Google Secret Manager secret. The thing is that this secret gets downloaded by my script through gcloud and is passed to Terraform as an input. Long story short: after adding the newline (?), Terraform does not recognize any changes, so I think it gets trimmed there.

I need to check that after lunch.

tgolsson commented 1 year ago

@loeffel-io FWIW I did a dig in and found a few bugs/edge-cases in how we create jobs. I get a passing run in our env with - EmbarkStudios/k8s#6b36fe4f6b770cdb97fd420b50cc94cc1c0bcbce: as the plugin spec.

This is the full config we have on that branch:

https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/7a578500069ad8a9aa494ecf57da4e457a90cfad/.buildkite/pipeline.yaml#L6-L13

loeffel-io commented 1 year ago

amazing! 🙏