Open maxbrunet opened 7 months ago
I reckon @MaxymVlasov spent more than decent amount of time trying to implement locking for the TF init with plugin cache enabled and, since it was not quite implementable straightforwardly, the outcome was this code snippet: https://github.com/antonbabenko/pre-commit-terraform/pull/620/files#diff-a9db869be8201236cd6abaed19087fc571d29c10d679fc66bf945eba1a4ccd46R461-R483 (long story short: TF locks destination file and, if parallel TF init encounters error writing plugin, it will simply re-try)
If you're hitting any issues with this implementation, please add details so that we can investigate.
Not sure if handling the case where
plugin_cache_dir
is set in the config file would be possible.
That's a good shout by the way. We missed to check TF config for this parameter and relied upon env var only as far as I can see form the code snippet I mentioned above.
Something simple like grep ^plugin_cache_dir path/to/config/file
might work w/o a fuss, though the config file location would need some if/else
scaffold: https://developer.hashicorp.com/terraform/cli/config/config-file#locations β standard non-windows location of ~/.terraformrc
can be overridden with TF_CLI_CONFIG_FILE
env var.
@MaxymVlasov What do you think of adding logic for the plugin_cache_dir
rc-file parameter handling (we only need to detect whether it is in config file) or you did not relied upon it in #620 on purpose?
Hey @yermulnik, thanks for the quick reply
long story short: TF locks destination file and, if parallel TF init encounters error writing plugin, it will simply re-try
terraform init
does not fail writing to it, it writes corrupted providers and the lock files is updated with the hashes of these corrupted providers
Terraform validate.......................................................Failed
- hook id: terraform_validate
- exit code: 1
- files were modified by this hook
Command 'terraform init' successfully done: somedirectory
Validation failed: somedirectory
β·
β Error: registry.terraform.io/hashicorp/google: the cached package for registry.terraform.io/hashicorp/google 4.84.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
β
β
β΅
Error: Terraform exited with code 1.
Command 'terraform init' successfully done: someotherdirectory
Lock terraform provider versions.........................................Passed
pre-commit hook(s) made changes.
If you are seeing this message in CI, reproduce locally with: `pre-commit run --all-files`.
To run `pre-commit` as part of git workflow, use `pre-commit install`.
All changes made by hooks:
diff --git a/somedirectory/.terraform.lock.hcl b/somedirectory/.terraform.lock.hcl
index bcd3d4ff..09cb469e 100644
--- a/somedirectory/.terraform.lock.hcl
+++ b/somedirectory/.terraform.lock.hcl
@@ -10,6 +10,7 @@ provider "registry.terraform.io/hashicorp/google" {
"h1:3qXFXgT2RAseuYW20OUqBLMeSuKt5tbpNI15GU5J7KE=",
"h1:FtmujB+frf40ztU300FEd/cnah5o/ZVItCTYk+Qj1Wo=",
"h1:PhC05RRIJi2iPLf9ny2JsYxVlhdC4LvXsuPfg4U/Pbg=",
+ "h1:RhnrI2jZIB6BmqGeP+lNXY5UqRKZp8QcuTz5CG3zPOo=",
"h1:W9PU6CJQSb0p6pL+OonuWrCW2HJITiIykQkrwCAgYnw=",
"h1:c4ksHVh0kRBFDvP3peYx99oZgh/afzOGxPJwdApcKBU=",
"h1:fybaK74buTd4Ys2CUZm6jw7NXtSqtcLoW2jeNB4Ff2E=",
Btw if the automation is too complicated to implement, that's totally ok with me to close as wontfix. I mainly wanted to call out the problem, I think the minimum would be to document in the README that if the plugin cache is enabled then parallelism should be disabled for terraform_validate
, it could take a while for users to figure out the issue
That's odd =( I might mixed up the bits about TF locking that I tried to recall from when Max was working on #620. When Max has a chance, he will add insights from his experience in #620 I guess. Thanks for calling this out though. That's valuable.
I think the minimum would be to document in the README that if the plugin cache is enabled then parallelism should be disabled for
terraform_validate
, it could take a while for users to figure out the issue
Meanwhile would you be keen to contribute by outlining this in README? π€π»
Okay, let's assume that terraform_validate
failed during t init
, wrote corrupted SHAs and then terraform_providers_lock
passed
terraform_validate
)terraform_validate
will as long as I understand that you need working provider binaies to validate tf code
terraform_validate
will be happy => till terraform_providers_lock
generates right lock file.At the same time if you check README for terraform_validate
p.3, you'll find that recommended solution is
- id: terraform_validate
args:
- --hook-config=--retry-once-with-cleanup=true # Boolean. true or false
Also, README for terraform_providers_lock
in p.1 describe supported modes, and each example includes - --hook-config=--retry-once-with-cleanup=true
If you think that such docs are confusing or not enough and you have ideas in which places there should be some extra mentions of that - please let us know. Ideally, via PR.
Thank you @MaxymVlasov for the detailed reply
At the same time if you check README for
terraform_validate
p.3, you'll find that recommended solution is
Since --retry-once-with-cleanup
only retries once, the races would still happen and be visible, so I feel like the safest is to recommend disabling parallelism when cache is enabled. Here is my suggestion #642
Since --retry-once-with-cleanup only retries once, the races would still happen and be visible,
I tested it a few times on a repo with more than 200 lockfiles with paralelizm=64, and there were 0 problems. But whatever
btw, init retries 10 times
Oh, okay, it possible to commit broken checksum in some corner cases
There should be a retry and it is specified in the configuration, but it never happened, and then to 1 broken h1
, added 4 valid h1
:
- id: terraform_validate
args:
- --hook-config=--parallelism-ci-cpu-cores=4 # DinD, can't calculate CPU cores
- --hook-config=--parallelism-limit=CPU*2
- --hook-config=--retry-once-with-cleanup=true
- --tf-init-args=-upgrade
I'm consistently seeing this error using the hooks version 1.96.1
but it works fine with 1.88.0
Terraform validate.......................................................Failed
- hook id: terraform_validate
- exit code: 1
Command 'terraform init' successfully done: terragrunt/launchdarkly/eu-west-1/feature-flags
Validation failed. Removing cached providers and modules from "terragrunt/launchdarkly/eu-west-1/feature-flags/.terraform" directory
Re-validating: terragrunt/launchdarkly/eu-west-1/feature-flags
Command 'terraform init' successfully done: terragrunt/launchdarkly/atmta032/eu-west-1/feature-flags
Validation failed: terragrunt/launchdarkly/eu-west-1/feature-flags
β·
β Error: missing or corrupted provider plugins:
β - registry.terraform.io/launchdarkly/launchdarkly: there is no package for registry.terraform.io/launchdarkly/launchdarkly 2.20.2 cached in .terraform/providers
β - registry.terraform.io/hashicorp/aws: there is no package for registry.terraform.io/hashicorp/aws 5.67.0 cached in .terraform/providers
β
β
pre-commits config
default_install_hook_types: [pre-commit, commit-msg]
repos:
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v3.1.0
hooks:
- id: conventional-pre-commit
stages: [commit-msg]
- repo: https://github.com/gruntwork-io/pre-commit
rev: v0.1.23
hooks:
- id: terragrunt-hclfmt
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.88.0
hooks:
# Does not work with our TG structure. TODO: Investigate
# - id: terragrunt_providers_lock
# - id: terragrunt_validate_inputs
- id: terraform_fmt
- id: terraform_validate
args:
- --hook-config=--retry-once-with-cleanup=true # Boolean. true or false
- id: terraform_tflint
args:
- '--args=--call-module-type=local'
- '--args=--only=terraform_deprecated_interpolation'
- '--args=--only=terraform_deprecated_index'
- '--args=--only=terraform_unused_declarations'
- '--args=--only=terraform_comment_syntax'
- '--args=--only=terraform_documented_outputs'
- '--args=--only=terraform_documented_variables'
- '--args=--only=terraform_typed_variables'
- '--args=--only=terraform_module_pinned_source'
- '--args=--only=terraform_naming_convention'
- '--args=--only=terraform_required_version'
- '--args=--only=terraform_required_providers'
- '--args=--only=terraform_standard_module_structure'
- '--args=--only=terraform_workspace_remote'
- id: terraform_docs
args:
- '--args=--config=.terraform-docs.yml'
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: check-json
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- id: no-commit-to-branch
args: ['--branch', 'main']
What problem are you facing?
620 introduces parallelism, this can lead to data races when the plugin cache is enabled, with
terraform init
writing in parallel to the cache, and creating corrupted providers (especially visible with the lock file where random provider hashes get appended).This can be worked around by setting the
--hook-config=--parallelism-limit=1
arg onterraform_validate
hook.How could pre-commit-terraform help solve your problem?
It would be nice if the hooks using
terraform init
(mostlyterraform_validate
, butterragrunt_providers_lock
has it deprecated too) could disable parallelism automatically when theTF_PLUGIN_CACHE_DIR
environment variable is set.Not sure if handling the case where
plugin_cache_dir
is set in the config file would be possible.