Let's say that we launch a k8s stack with the following tags:
queue=cotopaxi
colour=chartreuse
lake=quilotoa
The way Buildkite normally works, and the way we would expect the k8s stack to work, is that the k8s stack would pick up any job whose agent query rules are a subset of these tags. That is, we would expect a job with agent query rules:
queue=cotopaxi
lake=quilotoa
or
queue=cotopaxi
would be picked up by the k8s stack.
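The expected semantics can be sketched as a subset check: an agent may take a job when every one of the job's agent query rules is present in the agent's tag set. This is a minimal illustration with hypothetical names (`agentCanRun` is not the stack's actual code):

```go
package main

import "fmt"

// agentCanRun reports whether every one of the job's agent query rules
// is satisfied by the agent's tags, i.e. the rules are a subset of the tags.
func agentCanRun(agentTags, queryRules map[string]string) bool {
	for k, v := range queryRules {
		if agentTags[k] != v {
			return false
		}
	}
	return true
}

func main() {
	agentTags := map[string]string{
		"queue":  "cotopaxi",
		"colour": "chartreuse",
		"lake":   "quilotoa",
	}

	// A subset of the agent's tags: should be picked up.
	fmt.Println(agentCanRun(agentTags, map[string]string{"queue": "cotopaxi", "lake": "quilotoa"}))
	fmt.Println(agentCanRun(agentTags, map[string]string{"queue": "cotopaxi"}))

	// A rule the agent doesn't carry: should be skipped.
	fmt.Println(agentCanRun(agentTags, map[string]string{"queue": "everest"}))
}
```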
Unfortunately, there's a bug in the way we query for jobs: essentially, agent query rules get ANDed together, and as a result the k8s stack will only pick up jobs whose agent query rules match the complete set of the agent's tags. That is, the agent above will only pick up jobs that specify all of queue=cotopaxi, colour=chartreuse, and lake=quilotoa. This is a bug.
To fix this, I've made the following changes:
Instead of explicitly querying for jobs that match all of our agent's tags, we query only for jobs that match the queue for our instance.
From there, we decide whether we're a valid instance to execute each job; e.g., if the job has an agent query rule that excludes us, we ignore it. Previously we relied on the backend to do this for us, but that is where the query is broken for our use case.
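A minimal sketch of that client-side filtering step, assuming jobs arrive with their agent query rules as "key=value" strings (the `Job` struct and helper names here are illustrative, not the stack's real API):

```go
package main

import (
	"fmt"
	"strings"
)

// Job is a stand-in for a job returned by a queue-scoped query.
type Job struct {
	ID              string
	AgentQueryRules []string // e.g. []string{"queue=cotopaxi", "lake=quilotoa"}
}

// tagsSatisfy reports whether every "key=value" rule is present in tags.
func tagsSatisfy(tags map[string]string, rules []string) bool {
	for _, r := range rules {
		k, v, ok := strings.Cut(r, "=")
		if !ok || tags[k] != v {
			return false
		}
	}
	return true
}

func main() {
	agentTags := map[string]string{"queue": "cotopaxi", "colour": "chartreuse", "lake": "quilotoa"}

	// Pretend these came back from a query scoped only to queue=cotopaxi.
	jobs := []Job{
		{ID: "a", AgentQueryRules: []string{"queue=cotopaxi", "lake=quilotoa"}},
		{ID: "b", AgentQueryRules: []string{"queue=cotopaxi", "mountain=chimborazo"}},
	}

	for _, j := range jobs {
		if tagsSatisfy(agentTags, j.AgentQueryRules) {
			fmt.Println("accept", j.ID)
		} else {
			fmt.Println("skip", j.ID) // a rule excludes this agent, so we ignore the job
		}
	}
}
```

This keeps the backend query broad (queue only) and pushes the exclusion decision into the stack itself, which is the behaviour the fix describes.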