Closed toast-gear closed 2 years ago
GitHub insists on running docker based actions as root, so this will affect any docker based action that writes to disk.
Can you confirm there is a permission problem with the .terraform directory? This should not be written to the workspace.
I can imagine this is an issue for the plan outputs of dflook/terraform-plan that were added in v1.16.0 though.
I have to admit, I don't understand the linked PR. As far as I can tell, the '--user' flag is for docker - but the args are passed to the entrypoint? Or does the entrypoint have to be updated to use the --user flag?
I can imagine this is an issue for the plan outputs of dflook/terraform-plan that were added in v1.16.0 though.
yes, it will impact an action that writes to disk on a mounted volume, not all terraform commands write to .terraform/.
Can you confirm there is a permission problem with the .terraform directory? This should not be written to the workspace.
Let me get screenshots tomorrow. We have peter-murray/reset-workspace-ownership-action in basically every terraform pipeline and so I'll need to make a custom one to demonstrate the issue.
I have to admit, I don't understand the linked PR. As far as I can tell, the '--user' flag is for docker - but the args are passed to the entrypoint? Or does the entrypoint have to be updated to use the --user flag?
In the linked PR example, the change makes it so that when the docker://bridgecrew/checkov:2.0.469 image is run, it is run under the user id specified, if one is provided. The id provided is the user id of the user running the runner service; it doesn't need to exist in the container. All it is doing is setting the docker --user flag so that the container is run under a user ID that matches the user running the runner service. As a result, when the checkout action goes to do a git clean on the subsequent run, there is no permission conflict.
If we were using ephemeral runners this entire problem would be circumvented; however, my company isn't, and it is going to take a while to migrate away from static runners.
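As a rough illustration of the --user idea described above, the sketch below only prints a docker command line that would run an image under the invoking user's numeric ids. It is not the change made in the linked PR; the function name and the /github/workspace mount target are assumptions chosen to mirror the paths seen elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not execute) a docker command that runs an image
# as the invoking user. The function name and mount target are hypothetical.

docker_cmd_as_current_user() {
  local image="$1" workspace="$2"
  # id -u and id -g give the numeric ids of whoever runs this script; on a
  # self-hosted runner that would be the runner service account. The ids do
  # not need to exist inside the container.
  printf 'docker run --rm --user %s:%s -v %s:/github/workspace %s\n' \
    "$(id -u)" "$(id -g)" "$workspace" "$image"
}

docker_cmd_as_current_user "bridgecrew/checkov:2.0.469" "$PWD"
```

Files the container writes to the mounted directory would then be owned by that uid rather than root, which is what lets a later git clean succeed.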
v1.17.1 was just released, which fixes the ownership of any files created in runner mounted directories like the workspace
Cheers pal, should hopefully get around to testing it on my end in the next few days, looks solid though so I'm sure the fix works a treat.
I think this is causing problems in our environment. dflook/terraform-validate is bombing out.
Failing jobs are using: danielflook/terraform-github-actions@sha256:7340e0fda478b550b89feaa389a4397946e29a841f86ac39397a771ba205e06e
Success jobs are using danielflook/terraform-github-actions@sha256:07cd443fbd4fc64bddf6901cfb1e6daff9f4b3935e68324bf40d395fb2ad6a7f
The errors we're seeing are:
The pipeline step that is failing:
- name: Terraform Validate
  uses: dflook/terraform-validate@v1
  with:
    path: ${{ matrix.submodule }}
    label: ${{ matrix.submodule }}
  env:
    TERRAFORM_SSH_KEY: ${{ secrets.SSH_KEY }}
and our strategy is:
strategy:
  fail-fast: false
  matrix:
    include:
      # tf contained under the child folders
      - submodule: folder-at-root-of-repo/folder
      - submodule: folder-at-root-of-repo/folder
Error: Failed to install provider from shared cache
Error while importing hashicorp/kubernetes v2.5.0 from the shared cache
directory: provider binary not found: could not find executable file starting
with terraform-provider-kubernetes.
...
6 problems:
- Failed to instantiate provider "registry.terraform.io/hashicorp/aws" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/aws"
- Failed to instantiate provider "registry.terraform.io/hashicorp/kubernetes"
to obtain schema: unknown provider
"registry.terraform.io/hashicorp/kubernetes"
- Failed to instantiate provider "registry.terraform.io/hashicorp/local" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/local"
- Failed to instantiate provider "registry.terraform.io/hashicorp/null" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/null"
- Failed to instantiate provider "registry.terraform.io/hashicorp/random" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/random"
- Failed to instantiate provider "registry.terraform.io/hashicorp/template" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/template"
Getting the teams to pin back to the previous release to see if that fixes it
I'd like to be able to reproduce this, what can you tell me about how your runners are setup?
Could you enable debug logging by setting the ACTIONS_STEP_DEBUG secret to true.
Do you use any other docker based actions that use terraform?
I'd like to be able to reproduce this, what can you tell me about how your runners are setup?
static runners unfortunately :(
Could you enable debug logging by setting the ACTIONS_STEP_DEBUG secret to true.
this produces quite a lot of output so I'll try picking out the bits that look useful:
terraform binary selection:
##[debug]ls -lad /root/.terraform.versions:lrwxrwxrwx 1 root root 63 Oct 7 15:57 /root/.terraform.versions -> /github/home/.dflook-terraform-github-actions/terraform-bin-dir
##[debug]ls -lad /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 3 51982 51982 4096 Oct 7 15:57 /github/home/.dflook-terraform-github-actions/terraform-bin-dir
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:total 80828
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 3 51982 51982 4096 Oct 7 15:57 .
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 3 51982 51982 4096 Oct 4 15:04 ..
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 2 51982 51982 4096 Oct 6 17:16 .terraform.versions.default
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:-rw-r--r-- 1 51982 51982 8 Oct 7 15:57 RECENT
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:-rwxr-xr-x 1 51982 51982 82749972 Oct 7 15:57 terraform_0.14.11
##[debug]tfswitch --version:
##[debug]tfswitch --version:Version: 0.8.832
Reading required version from terraform file, constraint: ~> 0.14.0
Switched terraform to version "0.14.11"
##[debug]ls -la /usr/local/bin/terraform:lrwxrwxrwx 1 root root 43 Oct 7 15:57 /usr/local/bin/terraform -> /root/.terraform.versions/terraform_0.14.11
##[debug] Terraform version major 0 minor 14 patch 11
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:total 80828
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 3 51982 51982 4096 Oct 7 15:57 .
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 3 51982 51982 4096 Oct 4 15:04 ..
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:drwxr-xr-x 2 51982 51982 4096 Oct 6 17:16 .terraform.versions.default
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:-rw-r--r-- 1 51982 51982 8 Oct 7 15:57 RECENT
##[debug]ls -la /github/home/.dflook-terraform-github-actions/terraform-bin-dir:-rwxr-xr-x 1 51982 51982 82749972 Oct 7 15:57 terraform_0.14.11
::endgroup::
/github/home permissions
##[debug]ls -la /github/home:total 1400
##[debug]ls -la /github/home:drwxr-xr-x 346 51982 51982 20480 Oct 7 11:42 .
##[debug]ls -la /github/home:drwxr-xr-x 6 root root 4096 Oct 7 15:02 ..
##[debug]ls -la /github/home:drwxr-xr-x 3 root root 4096 Sep 30 20:16 .cache
##[debug]ls -la /github/home:drwxr-xr-x 2 root root 4096 Sep 30 14:33 .dflook-terraform-bin-dir
##[debug]ls -la /github/home:drwxr-xr-x 4 root root 4096 Oct 4 08:59 .dflook-terraform-data-dir
##[debug]ls -la /github/home:drwxr-xr-x 3 51982 51982 4096 Oct 4 15:04 .dflook-terraform-github-actions
##[debug]ls -la /github/home:drwxr-xr-x 3 51982 51982 4096 Sep 30 14:33 .terraform.d
##[debug]ls -la /github/home:drwxr-xr-x 2 root root 4096 Oct 6 19:48 .yor_plugins
##[debug]ls -la /github/home:drwxr-xr-x 2 root root 4096 Sep 30 14:33 1291442970-krygefxy
...
the above is in contrast to the workspace where the user running the runner service owns everything:
/github/workspace permissions
##[debug]pwd:/github/workspace
##[debug]ls -la:drwxr-xr-x 17 51982 51982 4096 Oct 7 15:02 .
##[debug]ls -la:drwxr-xr-x 6 root root 4096 Oct 7 15:02 ..
##[debug]ls -la:drwxr-xr-x 8 51982 51982 4096 Oct 7 15:02 .git
##[debug]ls -la:drwxr-xr-x 3 51982 51982 4096 Sep 30 20:10 .github
##[debug]ls -la:-rw-r--r-- 1 51982 51982 30 Sep 30 20:10 .gitignore
##[debug]ls -la:-rw-r--r-- 1 51982 51982 342 Sep 30 20:10 README.md
...
terraform init output below
Downloading cloudposse/label/null 0.24.1 for web-node-group.label...
- web-node-group.label in /tmp/terraform-data-dir/modules/web-node-group.label
Downloading cloudposse/label/null 0.24.1 for web-node-group.this...
- web-node-group.this in /tmp/terraform-data-dir/modules/web-node-group.this
Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Finding hashicorp/aws versions matching ">= 2.0.0, >= 3.0.0"...
- Finding hashicorp/kubernetes versions matching ">= 1.0.0"...
- Finding hashicorp/tls versions matching ">= 2.2.0"...
- Finding hashicorp/template versions matching ">= 2.0.0"...
- Finding hashicorp/null versions matching ">= 2.0.0"...
- Finding hashicorp/local versions matching ">= 1.3.0"...
- Finding hashicorp/random versions matching ">= 2.0.0"...
- Using hashicorp/kubernetes v2.5.0 from the shared cache directory
- Using hashicorp/tls v3.1.0 from the shared cache directory
- Using hashicorp/template v2.2.0 from the shared cache directory
- Using hashicorp/null v3.1.0 from the shared cache directory
- Using hashicorp/local v2.1.0 from the shared cache directory
Error: Failed to install provider from shared cache
Error while importing hashicorp/kubernetes v2.5.0 from the shared cache
directory: provider binary not found: could not find executable file starting
with terraform-provider-kubernetes.
...
schema problems?
7 problems:
- Failed to instantiate provider "registry.terraform.io/hashicorp/aws" to
obtain schema: unknown provider "registry.terraform.io/hashicorp/aws"
- Failed to instantiate provider "registry.terraform.io/hashicorp/kubernetes"
to obtain schema: unknown provider
...
could it be that the ownership fix needs to be done as a post-entrypoint script process?
sanitised pipeline:
on: [pull_request]
jobs:
  Terraform-plan:
    runs-on: self-hosted
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    strategy:
      fail-fast: false
      matrix:
        include:
          # There is terraform under the child folders, backend.tf, providers.tf, locals.tf, main.tf (which sources a terraform module from git)
          - submodule: parentFolder/child-1
          - submodule: parentFolder/child-2
    steps:
      # I am aiming to have this action removed through this issue, isn't removed yet as a double chown shouldn't matter
      - name: Get Actions user id
        id: get_uid
        run: |
          actions_user_id=`id -u $USER`
          echo $actions_user_id
          echo ::set-output name=uid::$actions_user_id
      - name: Correct Ownership in GITHUB_WORKSPACE directory
        uses: peter-murray/reset-workspace-ownership-action@v1
        with:
          user_id: ${{ steps.get_uid.outputs.uid }}
      - name: Checkout from GitHub
        uses: actions/checkout@v2
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: arn:aws:iam::***************:role/my-amazing-role
          aws-region: ***************
          role-duration-seconds: 900
      - name: Terraform Linting (base)
        uses: dflook/terraform-fmt@v1
        with:
          path: ${{ matrix.submodule }}
          label: ${{ matrix.submodule }}
        env:
          TERRAFORM_SSH_KEY: ${{ secrets.SSH_KEY }}
      # atm it blows up here before we get to the plan
      - name: Terraform Validate
        uses: dflook/terraform-validate@v1
        with:
          path: ${{ matrix.submodule }}
          label: ${{ matrix.submodule }}
        env:
          TERRAFORM_SSH_KEY: ${{ secrets.SSH_KEY }}
      - name: Terraform Plan
        uses: dflook/terraform-plan@v1
        with:
          path: ${{ matrix.submodule }}
          label: ${{ matrix.submodule }}
        env:
          TERRAFORM_SSH_KEY: ${{ secrets.SSH_KEY }}
I don't think I can provide much more detail because this is all in the container; there isn't a way for me to get any output about the state of the files on disk outside of the mounted volumes. Where should the terraform init artefacts end up on disk?
.terraform is a temporary directory in the container, so every step starts with it empty.
Do you have multiple runner processes running on the same host, do they share the runner.temp directory?
How many times did it fail this way?
I refreshed our runner instances and it solved that weird init issue. However, I then put our actions back to using the v1 tag and we got the below errors from the checkout action trying to clean the repo:
Cleaning the repository:
/usr/bin/git clean -ffdx
warning: failed to remove .dflook-terraform-github-actions/hlgwsdhe/plan.txt: Permission denied
warning: failed to remove .dflook-terraform-github-actions/hlgwsdhe/plan.json: Permission denied
warning: failed to remove .dflook-terraform/token-cache/as78d568sf568ds5f6d7s5f67ds5fd7s5fd67s5f67ds5f7ds5f7ds65f7d6s5f67ds: Permission denied
Removing parentFolder/child-1/.terraform.lock.hcl
Warning: Unable to clean or reset the repository. The repository will be recreated instead.
Deleting the contents of '/actions-runner/_work/repo/repo'
Error: Command failed: rm -rf "/actions-runner/_work/repo/repo/.dflook-terraform"
I think it's just more files that need the ownership fix applied to them
Most of them, from the looks of it, are generated by this action in all cases. In the case of .terraform.lock.hcl, we should be checking that into source but we don't on all repos, so the action should probably assume it may be generating it and apply the ownership fix if it does.
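To see which leftover files would trip up git clean, something like the following could be run on the runner host as the runner user. This is only a sketch: the function name is hypothetical, and it simply lists every path under a directory that is not owned by the current user.

```shell
#!/usr/bin/env bash
# Sketch only: list everything under a directory that the current user does
# not own. The function name is hypothetical.

list_foreign_files() {
  local dir="$1"
  # find's -user test accepts a numeric id; "!" negates the test, so this
  # prints every path whose owner is someone else (for example root).
  find "$dir" ! -user "$(id -u)" -print
}

# Example on a self-hosted runner, run as the runner service account:
#   list_foreign_files "$GITHUB_WORKSPACE"
```

An empty result means the checkout action should be able to clean the directory without permission errors.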
It looks like those files were left behind by an old version of these actions, before the ownership fix. Can you manually delete the workspace and try again?
sure, let me refresh now, will respond within 10 mins
kicked it off, I thought however looking at the diff https://github.com/dflook/terraform-github-actions/compare/v1.17.0...v1.17.1 some of those files would still have the issue e.g. the plan.* and .dflook-terraform/?
Everything in 1.17.1 is now inside .dflook-terraform-github-actions, which gets the ownership changed recursively
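A recursive ownership change of this kind could look roughly like the sketch below. It is not taken from the action's source: the function name is hypothetical, it assumes GNU coreutils (stat -c), and the demo runs on throwaway directories rather than a real workspace.

```shell
#!/usr/bin/env bash
# Sketch only: give a directory tree the same owner as a reference path.
# Assumes GNU coreutils (stat -c); the function name is hypothetical.

match_owner() {
  local target="$1" reference="$2" owner
  # Numeric uid:gid of the reference path, e.g. the checked-out workspace.
  owner="$(stat -c '%u:%g' "$reference")"
  # Recursively hand the tree to that owner. When the files belong to root
  # this must itself run as root, as it would inside a docker based action.
  chown -R "$owner" "$target"
}

# Demo on throwaway directories so nothing real is touched.
demo="$(mktemp -d)"
mkdir -p "$demo/workspace/.dflook-terraform-github-actions"
touch "$demo/workspace/.dflook-terraform-github-actions/plan.txt"
match_owner "$demo/workspace/.dflook-terraform-github-actions" "$demo/workspace"
stat -c '%u:%g %n' "$demo/workspace/.dflook-terraform-github-actions/plan.txt"
```

Reading the owner from the workspace directory itself avoids having to pass the runner's user id into the container.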
Yeh you're right. After a refresh, I monitored the workflow run on disk and it worked as expected. Running the workflow twice didn't result in any clean errors. Issue is resolved from my perspective and can be closed.
Thanks for looking into this as quickly as you did, much appreciated pal.
Great, glad it's working now!
If docker changes any files / folders on disk it changes the owner of the file / folder to be root root. This causes problems for runners that are not ephemeral, as a subsequent run of a workflow will fail because the checkout action is unable to clean the folder due to permission errors on the .terraform folder.
This can be worked around via peter-murray/reset-workspace-ownership-action.
This is a faff however and it would be nicer if this issue could be resolved natively without the need for yet another action. I've raised a PR in another terraform action which has the same problem https://github.com/bridgecrewio/checkov-action/pull/59. Would this solution be workable here too?