This is done on purpose, for security reasons, and because we don't want to leak or overwrite unwanted/unexpected env vars (`SHELL`, `USER`, …) from the host into the VM. So we should definitely keep such filtering (as opposed to blindly exporting all existing env vars from the host).
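The following minimal Swift sketch makes this filtering concrete. It applies a prefix-based allowlist to a `[String: String]` environment, such as `ProcessInfo.processInfo.environment`. `shellQuoted(_:)` and `exportLines(from:prefix:)` are hypothetical helpers, not hostmgr's actual code. The sketch assumes `BUILDKITE_` is the prefix being kept.

```swift
import Foundation

/// Quotes a value for safe use in a POSIX shell `export` statement.
func shellQuoted(_ value: String) -> String {
    // Wrap in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
    "'" + value.replacingOccurrences(of: "'", with: "'\\''") + "'"
}

/// Hypothetical helper: keeps only the variables whose name starts with `prefix`
/// and renders them as `export NAME='value'` lines, sorted for stable output.
func exportLines(from environment: [String: String], prefix: String) -> [String] {
    environment
        .filter { $0.key.hasPrefix(prefix) }
        .sorted { $0.key < $1.key }
        .map { "export \($0.key)=\(shellQuoted($0.value))" }
}

let hostEnvironment = [
    "BUILDKITE_BRANCH": "trunk",
    "SHELL": "/bin/zsh",  // host-only: must not leak into the VM
    "USER": "builder",    // host-only: must not leak into the VM
]
for line in exportLines(from: hostEnvironment, prefix: "BUILDKITE_") {
    print(line)  // export BUILDKITE_BRANCH='trunk'
}
```

Every value is single-quoted, so the generated `export` lines stay valid even when a value contains spaces, `$` or quotes.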
The issue
That being said, it would sometimes be useful to have some specific env vars transferred to the VM, so that they can be resolved when they are used in the `.buildkite/commands/*.sh` scripts we call from the `command:` attribute of our pipelines.
One example is the env vars we use in our ReleaseV2 scenario to pass values like `RELEASE_VERSION` to the pipeline.
Currently, if those env vars are referenced directly by the `command:` attribute of the `.yml` pipeline, there's no issue. Those env vars are resolved when `buildkite-agent pipeline upload` parses the pipeline and interpolates their values. By the time the command to run is passed over to the VM via `hostmgr generate buildkite-job`, that value has already been resolved, so the VM receives the value of the env var, not a reference to it.
But if those env vars are referenced in a `.buildkite/commands/*.sh` script that is called by the `command:` attribute of the `.yml` (in other words, when there is an additional level of indirection), then the env var will only be evaluated when the VM runs the `BUILDKITE_COMMAND` that calls that script. At that point, the env var is not available in the VM itself.
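The timing difference can be illustrated with a deliberately simplified Swift model of variable expansion. `interpolate(_:env:)` is a hypothetical helper. It is not how `buildkite-agent pipeline upload` is implemented, and it ignores most of the syntax Buildkite supports. `release.sh` is a made-up script name and `1.2.3` is an illustrative value.

```swift
import Foundation

/// Hypothetical, simplified model of variable expansion: replaces `$NAME` and
/// `${NAME}` with values from `env`. As in a shell, names missing from `env`
/// expand to an empty string. This is NOT Buildkite's actual implementation.
func interpolate(_ text: String, env: [String: String]) -> String {
    let pattern = #"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)"#
    let regex = try! NSRegularExpression(pattern: pattern)
    let source = text as NSString
    let fullRange = NSRange(location: 0, length: source.length)
    var result = ""
    var cursor = 0
    for match in regex.matches(in: text, options: [], range: fullRange) {
        // Group 1 captures the `${NAME}` form, group 2 the bare `$NAME` form.
        let braced = match.range(at: 1)
        let nameRange = braced.location != NSNotFound ? braced : match.range(at: 2)
        let name = source.substring(with: nameRange)
        // Copy the literal text between the previous match and this one.
        result += source.substring(with: NSRange(location: cursor, length: match.range.location - cursor))
        result += env[name] ?? ""
        cursor = match.range.location + match.range.length
    }
    result += source.substring(from: cursor)
    return result
}

let stepEnv = ["RELEASE_VERSION": "1.2.3"]  // illustrative value

// 1. `command:` references the var directly: resolved on the host at upload time.
print(interpolate("echo Releasing $RELEASE_VERSION", env: stepEnv))  // echo Releasing 1.2.3

// 2. `command:` only names a script: there is nothing to resolve at upload time...
print(interpolate(".buildkite/commands/release.sh", env: stepEnv))   // .buildkite/commands/release.sh

// 3. ...and the script body is only expanded inside the VM, where the var is unset.
print(interpolate("echo Releasing $RELEASE_VERSION", env: [:]))      // "echo Releasing " (empty value)
```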
Proposed Solution
It could be worth checking whether there's a way to access the list of env vars declared in the `env:` attribute of the step being run by the YAML pipeline (plus the ones declared in `env:` at the root of the pipeline, which apply to all steps). If so, we could make `hostmgr` also export those env vars in the `hostmgr generate buildkite-job` script passed via SSH to the VM. That would allow us to reference them from within our `.buildkite/commands/*.sh` scripts.
The nice thing about that approach is that we'd keep the security benefit of not transferring all env vars blindly. Only the ones explicitly declared for that step would be transferred, using the `env:` attribute as an allowlist of env vars to pass to the VM for that job.
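Here is a minimal sketch of that allowlist idea. It assumes the root-level and step-level `env:` entries are available as dictionaries. Both function names are hypothetical. In this sketch, only the variable names are taken from the `env:` declarations, while the values are read from the job's environment. That is one possible design choice, not a settled one.

```swift
/// Hypothetical helper: the allowlist is the set of variable NAMES declared in the
/// root-level `env:` of the pipeline plus those declared in the step's own `env:`.
func allowlist(pipelineEnv: [String: String], stepEnv: [String: String]) -> Set<String> {
    Set(pipelineEnv.keys).union(stepEnv.keys)
}

/// Keeps only the allowlisted variables of the job environment; anything else
/// (e.g. `SHELL`, `USER`) is never forwarded to the VM.
func forwardedVariables(from jobEnvironment: [String: String], allowlist: Set<String>) -> [String: String] {
    jobEnvironment.filter { allowlist.contains($0.key) }
}

let names = allowlist(
    pipelineEnv: ["IMAGE_ID": "xcode-15"],   // root-level `env:` (illustrative)
    stepEnv: ["RELEASE_VERSION": "1.2.3"]    // step-level `env:` (illustrative)
)
let jobEnvironment = ["IMAGE_ID": "xcode-15", "RELEASE_VERSION": "1.2.3", "SHELL": "/bin/zsh"]
print(forwardedVariables(from: jobEnvironment, allowlist: names).keys.sorted())
// ["IMAGE_ID", "RELEASE_VERSION"]
```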
Potential technical solution
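I found multiple ways to get the list of env vars known to a job:

- Using `$BUILDKITE_ENV_FILE`
- ~Using `buildkite-agent env dump`~: never mind, this dumps the env of the `buildkite-agent` process itself (including `HOME`, `USER`, etc.), not the env of the job.

[^1]: Note that the `job-api-experiment` has been promoted to an official feature in agent version `3.64`, and we currently use `3.65`. So this should already be available and working on our macOS hosts.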
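The agent exposes the path of a file containing the job's environment in `$BUILDKITE_ENV_FILE`. The Swift sketch below parses such a file. It relies on an ASSUMPTION about the file's format: one `NAME="value"` entry per line, with Go-style escapes inside the quotes. That assumption should be checked against what our agent version actually writes. `parseEnvFile(_:)` and `unescape(_:)` are hypothetical helpers.

```swift
import Foundation

/// Undoes the escapes we ASSUME the env file uses inside quoted values.
/// Only `\\`, `\"`, `\n` and `\t` are handled; any other escape is kept verbatim.
func unescape(_ quoted: Substring) -> String {
    var output = ""
    var iterator = quoted.makeIterator()
    while let character = iterator.next() {
        guard character == "\\", let escaped = iterator.next() else {
            output.append(character)
            continue
        }
        switch escaped {
        case "n": output.append("\n")
        case "t": output.append("\t")
        case "\\", "\"": output.append(escaped)
        default: output.append("\\"); output.append(escaped)
        }
    }
    return output
}

/// Hypothetical parser: ASSUMES one `NAME="value"` entry per line.
func parseEnvFile(_ contents: String) -> [String: String] {
    var variables: [String: String] = [:]
    for line in contents.split(separator: "\n") {
        guard let equals = line.firstIndex(of: "=") else { continue }  // skip malformed lines
        let name = String(line[..<equals])
        let rawValue = line[line.index(after: equals)...]
        if rawValue.count >= 2, rawValue.first == "\"", rawValue.last == "\"" {
            variables[name] = unescape(rawValue.dropFirst().dropLast())
        } else {
            variables[name] = String(rawValue)  // unquoted: taken literally
        }
    }
    return variables
}

// Only does something when run inside a Buildkite job.
if let path = ProcessInfo.processInfo.environment["BUILDKITE_ENV_FILE"],
   let contents = try? String(contentsOfFile: path, encoding: .utf8) {
    print(parseEnvFile(contents).keys.sorted())
}
```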
All of those contain more env vars than just the ones declared in the `env:` attribute of the step. In particular, they also seem to contain:

- `env:` vars passed when calling the API to trigger a new build (in particular the `PIPELINE=` we pass when we want to trigger a pipeline other than the default one via an API call)
- `env:` vars provided at the pipeline root level (as we often do for `IMAGE_ID`, instead of repeating it on each `step`)
- `env:` vars provided at the step level
- `BUILDKITE_*` env vars declared by Buildkite itself

But it should be easy to make `GenerateBuildkiteJobScript` filter out the `BUILDKITE_*` ones from that list and only call `addEnvironmentVariable(name:value:)` for the remaining ones[^2].
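A sketch of that filtering step follows. `JobScriptBuilder` is a hypothetical stand-in for whatever type `GenerateBuildkiteJobScript` actually uses. Only the `addEnvironmentVariable(name:value:)` name comes from this issue, and the sample values are illustrative.

```swift
/// Stand-in for the script builder used by `GenerateBuildkiteJobScript` (hypothetical);
/// it just records the variables it is asked to export.
final class JobScriptBuilder {
    private(set) var variables: [(name: String, value: String)] = []

    func addEnvironmentVariable(name: String, value: String) {
        variables.append((name: name, value: value))
    }
}

/// Forwards every job variable except the `BUILDKITE_*` ones, which the generated
/// script already handles through its prefix-based copy. Sorted for stable output.
func addJobVariables(_ jobEnvironment: [String: String], to builder: JobScriptBuilder) {
    for (name, value) in jobEnvironment.sorted(by: { $0.key < $1.key }) {
        guard !name.hasPrefix("BUILDKITE_") else { continue }
        builder.addEnvironmentVariable(name: name, value: value)
    }
}

let jobEnvironment = [
    "BUILDKITE_BRANCH": "trunk",   // skipped: already covered by the prefix-based copy
    "IMAGE_ID": "xcode-15",        // root-level `env:`
    "PIPELINE": "release.yml",     // passed when triggering the build via the API
    "RELEASE_VERSION": "1.2.3",    // step-level `env:`
]
let builder = JobScriptBuilder()
addJobVariables(jobEnvironment, to: builder)
print(builder.variables.map { $0.name })  // ["IMAGE_ID", "PIPELINE", "RELEASE_VERSION"]
```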
[^2]: One might think that we could also keep the `BUILDKITE_*` ones from that list and remove the call to `copyEnvironmentVariables(prefixedBy:)` from our script instead. But that would not be equivalent. `copyEnvironmentVariables` is based on the list of env vars from `ProcessInfo.processInfo.environment`, which includes additional `BUILDKITE_*` env vars that are exposed to the agent itself (e.g. `BUILDKITE_AGENT_ACCESS_TOKEN`), not just the ones exposed to the job.
See also: D161538#3028347-code