Closed toothbrush closed 3 years ago
I've just double-checked using a small repro:
steps:
- name: us-east-1
command: env
env:
TEST_FOO: bar
AWS_DEFAULT_REGION: us-east-1
- name: us-west-2
command: env
env:
TEST_FOO: baz
AWS_DEFAULT_REGION: us-west-2
Concentrating on the us-west-2
step, if i look at the "Environment" tab of the step, it looks as expected:
But if i look at the output of the steps, env
disagrees:
[2021-08-18T02:56:04Z] $ trap 'kill -- $' INT TERM QUIT; env
[2021-08-18T02:56:04Z] BUILDKITE_CONFIG_PATH=/etc/buildkite-agent/buildkite-agent.cfg
[2021-08-18T02:56:04Z] AWS_REGION=us-east-1
[2021-08-18T02:56:04Z] AWS_DEFAULT_REGION=us-east-1
...
That doesn't seem right to me?
Hi @toothbrush!
The part you have linked to only sets environment variables for the bk-install-elastic-stack.sh
script run as part of the EC2 Instance’s UserData
. However, those environment variables do get saved and subsequently sourced by an Agent environment hook so actually will be set in your job environment by the hook and overwrite the one set by your job env 🤔
Does it seem reasonable to tack an AWS_DEFAULT_REGION=foo
on the end of your make
commands which will be set as an environment variable for make-spawned subcommands?
keithduncan@Keiths-Mac-mini make-test % make target | grep 'AWS_*'
keithduncan@Keiths-Mac-mini make-test % make target AWS_DEFAULT_REGION=us-east-1 | grep 'AWS_*'
declare -x AWS_DEFAULT_REGION="us-east-1"
declare -x MAKEFLAGS="AWS_DEFAULT_REGION=us-east-1"
Yeah – that's one way of working around it. I just made a patch to one of our repos to set DEPLOY_REGION=foo
and in the Makefile re-export export AWS_DEFAULT_REGION=${DEPLOY_REGION}
if set. This way the workaround is obvious.
My concern with an approach like you suggest is that the difference between the following isn't obvious at all:
steps:
- name: approach 1
command: make something
env:
AWS_DEFAULT_REGION: us-east-1
- name: approach 2
command: make something AWS_DEFAULT_REGION=us-west-2
..and someone is liable to come along later and not realise there's a big footgun lurking there.
Wouldn't it be preferable to patch this bit:
Into something more like (pseudo-code, but i mean to say, use whatever AWS_DEFAULT_REGION
is set, and only set it to the build agent's region if it's not set)?
export AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-${REGION}}
Actually, it's true that you probably want the agent to have a correct AWS_DEFAULT_REGION
corresponding to where they "live", whereas the job should probably respect what's in the pipeline.yml
? That makes my simplistic suggestion incorrect.
I still feel like this behaviour is rather unexpected.
I recall running back into this problem years back. I'd planned to solve it by scoping those exports in the environment hook to something like BUILDKITE_STACK_AWS_REGION
and then ensuring they are set correctly in the embedded plugins.
Alternately, perhaps the stack plugins should query the EC2 meta-data service directly.
My concern about changing or removing these now is how much work it will generate for teams who rely on the environment variable being set today 🤔
We could make the built-in environment hook smarter, keeping these by default but not clobbering them if already set by the job env.
Putting something like this in the cfn-env
instead:
# Escape, don't interpolate this when writing the cfn-env, check the environment process’ environment for this var
if [ -z "\${AWS_REGION:-}" ]
then
# Do interpolate, pull in the bash process' AWS_REGION when writing the cfn-env
export AWS_REGION=$AWS_REGION
fi
# Escape, don't interpolate when writing the cfn-env, check the environment process’ environment for this var
if [ -z "\${AWS_DEFAULT_REGION:-}" ]
then
# Do interpolate, pull in the bash process' AWS_REGION when writing the cfn-env
export AWS_DEFAULT_REGION="$AWS_REGION"
fi
Does it matter that these variables could shear? Should we instead check that neither are set by the time the environment
hook is running and if true, set both to the stack env to mimic the prior behaviour?
We could make the built-in environment hook smarter, keeping these by default but not clobbering them if already set by the job env.
Yeah, this was my awkwardly-worded suggestion above indeed. However on Slack @lox suggested some plugins may rely on AWS_DEFAULT_REGION
being equal to the region of the agent – i would humbly suggest that may be a bug if that's true?
Does it matter that these variables could shear? Should we instead check that neither are set by the time the
environment
hook is running and if true, set both to the stack env to mimic the prior behaviour?
I think i personally care less about the exact behaviour in edge-cases and much more about just being given big honking warnings in anything other than the happy path. For example, setting a variable in the pipeline but having it clobbered would ideally say something scary has happened.
Yeah, this was my awkwardly-worded suggestion above indeed. However on Slack @lox suggested some plugins may rely on
AWS_DEFAULT_REGION
being equal to the region of the agent – i would humbly suggest that may be a bug if that's true?
I think that’s okay from my perspective, any breakage will be confined to cases such as yours (sorry! 😅) where you are deliberately overriding these environment variables for the entire job rather than a piecewise aspect of it as with my make
suggestion above.
If we adopt this I’m prepared to follow through with support for an explicit region
parameter in any Buildkite plug-ins that need it to ensure robust multi-region support throughout. Though I can’t speak for other plug-in providers 😄
What do you think about preparing a branch with this change and if possible testing it on one of your deployed stacks (branches are only built for us-east-1, I hope that works in this case)? If it works for what you’re doing I’d be prepared to merge it :bow:
Cool, i appreciate the response. That all sounds very reasonable to me. I should be able to get to that within a reasonable timeframe (some number of weeks hopefully smaller than, say, 5).
We're in luck, all of our Buildkite fleet runs in us-east-1
!
Hey folks, we've just adopted this Elastic CI stack instead of our legacy/hand-built one, and it mostly is working well!
We ran into a surprising issue though, when attempting to do a cross-region deploy. The situation is that we rely on
AWS_DEFAULT_REGION
to point deploy commands at the right target (that's probably not the best, but bear with me, legacy decisions that are now in many repos all over the place 🤷).One of our pipelines looks roughly as follows – commands such as
aws
inside the Makefile rely onAWS_DEFAULT_REGION
pointing where we want to deploy things:However, when the steps run i see that in both cases,
AWS_DEFAULT_REGION == us-east-1
– that's where our Buildkite agents run.I suspect this code is running and somehow overriding the
env:
block of ourpipeline.yml
, but there must surely be a way to allow overriding some of these variables?https://github.com/buildkite/elastic-ci-stack-for-aws/blob/368a03a14b51a44a87af733a96cc193c7b10a3eb/templates/aws-stack.yml#L981-L985
I've currently made a workaround for one of our repositories, but this feels like quite a footgun – if i set a value in the pipeline's
env:
block, i expect it to be respected in the executed steps.cc: @novatan-rb @jonhiggs