Hi @Analect,
I think I may have an idea of why you ran into this situation, but I need to confirm first. Let me try to reproduce this issue and get back to you.
Thanks.
@Analect Found the issue. A fix is pending via https://github.com/digitalocean/container-blueprints/issues/18.
Thanks @mtiutiu-heits .
I was probably wrong when I said the flux-system components were not created.
On checking kubectl get pods -n flux-system, I see:
NAME READY STATUS RESTARTS AGE
helm-controller-55896d6ccf-scnnd 1/1 Running 0 18h
kustomize-controller-76795877c9-nqwrn 1/1 Running 0 18h
notification-controller-7ccfbfbb98-slsxx 1/1 Running 0 18h
source-controller-6b8d9cb5cc-cjcs5 1/1 Running 0 18h
However, the other flux get all errors above persist.
Also, I tried running flux bootstrap github --owner=<my-github-org> --repository=<my-repo>, thinking that might rectify any misconfiguration. The response suggests all is OK, but those errors on flux get all persist.
Please enter your GitHub personal access token (PAT):
► connecting to github.com
► cloning branch "main" from Git repository "https://github.com/xxx/xxx.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed sync manifests to "main" ("xxxed2b84")
► pushing component manifests to "https://github.com/xxx/xxx.git"
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
✔ source secret up to date
✗ sync path configuration ("") would overwrite path ("./clusters/dev") of existing Kustomization
Thanks @mtiutiu-heits ... are there any manual steps I can take to fix and reload/apply to the cluster?
@Analect I don't know if it's OK to use both methods, meaning bootstrapping Flux CD via Terraform and then again via the flux CLI.
Did you uninstall Flux CD first via flux uninstall before bootstrapping again? Some things, like CRDs, are otherwise left behind.
For fixing it manually, please follow the steps below:
1. Edit the flux-system secret: kubectl edit secret flux-system -n flux-system
2. Locate the known_hosts field and change the value to: Z2l0aHViLmNvbSBlY2RzYS1zaGEyLW5pc3RwMjU2IEFBQUFFMlZqWkhOaExYTm9ZVEl0Ym1semRIQXlOVFlBQUFBSWJtbHpkSEF5TlRZQUFBQkJCRW1LU0VOalFFZXpPbXhrWk15N29wS2d3RkI5bmt0NVlScllNak51RzVOODd1UmdnNkNMcmJvNXdBZFQveTZ2MG1LVjBVMncwV1oyWUIvKytUcG9ja2c9Cg==
3. Reconcile the flux-system git repository resource: flux reconcile source git flux-system
The above value for known_hosts is the base64 encoded form of:
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
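For reference, a minimal sketch of how that value can be produced or verified locally (assuming GNU coreutils base64; -w 0 only disables line wrapping):
# Encode the known_hosts entry (the output should match the value above):
echo 'github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=' | base64 -w 0
# Decode the stored value to double-check it:
echo 'Z2l0aHViLmNvbSBlY2RzYS1zaGEyLW5pc3RwMjU2IEFBQUFFMlZqWkhOaExYTm9ZVEl0Ym1semRIQXlOVFlBQUFBSWJtbHpkSEF5TlRZQUFBQkJCRW1LU0VOalFFZXpPbXhrWk15N29wS2d3RkI5bmt0NVlScllNak51RzVOODd1UmdnNkNMcmJvNXdBZFQveTZ2MG1LVjBVMncwV1oyWUIvKytUcG9ja2c9Cg==' | base64 -d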
Let me know if it helps.
P.S.: Sorry for replying with both GitHub accounts (I'm a contractor, and one account is associated with the company that I work for). I forgot to switch accounts, chrome profiles, etc. Too many things to do sometimes 😄 .
Thanks @v-ctiutiu
That seems to have partially worked. Having followed the steps above, I then ran flux get all. Any idea what the error on the kustomization/flux-system might be related to?
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/97ba7b0970fcbeadbfe2fe44fb639d2254ed2b84 main/97ba7b0970fcbeadbfe2fe44fb639d2254ed2b84 False
NAME READY MESSAGE REVISION SUSPENDED
kustomization/flux-system False CustomResourceDefinition/kustomizations.kustomize.toolkit.fluxcd.io dry-run failed, reason: Invalid, error: CustomResourceDefinition.apiextensions.k8s.io "kustomizations.kustomize.toolkit.fluxcd.io" is invalid: status.storedVersions[1]: Invalid value: "v1beta2": must appear in spec.versions False
@Analect You can also override the Terraform module value for that variable in your main.tf file, like this (notice the last line):
module "doks_flux_cd" {
source = "github.com/digitalocean/container-blueprints/create-doks-with-terraform-flux"
# DOKS
do_api_token = "<YOUR_DO_API_TOKEN_HERE>" # DO API TOKEN (string value)
doks_cluster_name = "<YOUR_DOKS_CLUSTER_NAME_HERE>" # Name of this `DOKS` cluster ? (string value)
doks_cluster_region = "<YOUR_DOKS_CLUSTER_REGION_HERE>" # What region should this `DOKS` cluster be provisioned in ? (string value)
doks_cluster_version = "1.21.3-do.0" # What Kubernetes version should this `DOKS` cluster use ? (string value)
doks_cluster_pool_size = "<YOUR_DOKS_CLUSTER_POOL_SIZE_HERE>" # What machine type to use for this `DOKS` cluster ? (string value)
doks_cluster_pool_node_count = <YOUR_DOKS_CLUSTER_POOL_NODE_COUNT_HERE> # How many worker nodes this `DOKS` cluster should have ? (integer value)
# GitHub
# Important notes:
# - This module expects your Git `repository` and `branch` to be created beforehand
# - Currently, the `github_token` doesn't work with SSO
github_user = "<YOUR_GITHUB_USER_HERE>" # Your `GitHub` username
github_token = "<YOUR_GITHUB_TOKEN_HERE>" # Your `GitHub` personal access token
git_repository_name = "<YOUR_GIT_REPOSITORY_NAME_HERE>" # Git repository where `Flux CD` manifests should be stored
git_repository_branch = "<YOUR_GIT_REPOSITORY_BRANCH_HERE>" # Branch name to use for this `Git` repository (e.g.: `main`)
git_repository_sync_path = "<YOUR_GIT_REPOSITORY_SYNC_PATH_HERE>" # Git repository path where the manifests to sync are committed (e.g.: `clusters/dev`)
github_ssh_pub_key = "ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg="
}
I suspect that you now have a mixed Flux CD environment, with old CRDs left over from the previous installation, and it's complaining about the CRD versions. If it's not a big issue for you, what works best is to uninstall Flux CD completely via flux uninstall and then bootstrap it again.
Let me do this first in my current setup, and see if overriding the github_ssh_pub_key parameter fixes all the issues. Then, I will try to find a manual fix for your environment as well, if possible.
Thanks.
@Analect Can you try this and let me know if it works:
flux reconcile kustomization flux-system -n flux-system --with-source
Tried flux reconcile kustomization flux-system -n flux-system --with-source ... and got this:
► annotating GitRepository flux-system in flux-system namespace
✔ GitRepository annotated
◎ waiting for GitRepository reconciliation
✔ fetched revision main/97ba7b0970fcbeadbfe2fe44fb639d2254ed2b84
► annotating Kustomization flux-system in flux-system namespace
✔ Kustomization annotated
◎ waiting for Kustomization reconciliation
✗ Kustomization reconciliation failed: CustomResourceDefinition/kustomizations.kustomize.toolkit.fluxcd.io dry-run failed, reason: Invalid, error: CustomResourceDefinition.apiextensions.k8s.io "kustomizations.kustomize.toolkit.fluxcd.io" is invalid: status.storedVersions[1]: Invalid value: "v1beta2": must appear in spec.versions
So you suggest running flux uninstall.
By bootstrapping again, do you mean rerunning these:
terraform plan -out starter_kit_flux_cluster.out
terraform apply "starter_kit_flux_cluster.out"
Does that require me to tear down the existing cluster?
OK. Ran:
flux uninstall
terraform plan -out starter_kit_flux_cluster.out
terraform apply "starter_kit_flux_cluster.out"
It recreated the flux-system pods, but on running flux get all, I don't see any reference to kustomization/flux-system.
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/97ba7b0970fcbeadbfe2fe44fb639d2254ed2b84 main/97ba7b0970fcbeadbfe2fe44fb639d2254ed2b84 False
I ran flux reconcile kustomization flux-system -n flux-system --with-source
... but got:
✗ no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta2"
@Analect
To start fresh, and without deleting the whole cluster, you need to (a consolidated sketch follows these steps):
1. Uninstall Flux CD via: flux uninstall
2. Make sure the flux-system namespace gets deleted. Check with: kubectl get ns. If it's still there, with finalizers that keep it from being deleted, then you need to use this script to delete it forcibly.
3. Override the github_ssh_pub_key TF variable as explained previously.
4. Run plan and apply, as you already mentioned (before applying, please inspect the plan carefully and notice the changes - they should be mostly Flux CD related):
terraform plan -out starter_kit_flux_cluster.out
terraform apply "starter_kit_flux_cluster.out"
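A consolidated sketch of that sequence (the forced-deletion script is only needed if the namespace hangs in Terminating):
flux uninstall
kubectl get ns flux-system        # should eventually report NotFound
# after overriding github_ssh_pub_key in main.tf:
terraform plan -out starter_kit_flux_cluster.out
terraform apply "starter_kit_flux_cluster.out"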
Terraform should see the differences and re-create the missing parts only, meaning Flux CD components (if you still have the state file in your working directory, or on the S3 bucket).
Let me know how it goes and if it fixes your issue. Thanks.
@v-ctiutiu followed your instructions:
flux uninstall
Are you sure you want to delete Flux and its custom resource definitions: y
► deleting components in flux-system namespace
✔ Deployment/flux-system/helm-controller deleted
✔ Deployment/flux-system/kustomize-controller deleted
✔ Deployment/flux-system/notification-controller deleted
✔ Deployment/flux-system/source-controller deleted
✔ Service/flux-system/notification-controller deleted
✔ Service/flux-system/source-controller deleted
✔ Service/flux-system/webhook-receiver deleted
✔ NetworkPolicy/flux-system/allow-egress deleted
✔ NetworkPolicy/flux-system/allow-scraping deleted
✔ NetworkPolicy/flux-system/allow-webhooks deleted
✔ ServiceAccount/flux-system/helm-controller deleted
✔ ServiceAccount/flux-system/kustomize-controller deleted
✔ ServiceAccount/flux-system/notification-controller deleted
✔ ServiceAccount/flux-system/source-controller deleted
✔ ClusterRole/crd-controller-flux-system deleted
✔ ClusterRoleBinding/cluster-reconciler-flux-system deleted
✔ ClusterRoleBinding/crd-controller-flux-system deleted
► deleting toolkit.fluxcd.io finalizers in all namespaces
✔ GitRepository/flux-system/flux-system finalizers deleted
► deleting toolkit.fluxcd.io custom resource definitions
✔ CustomResourceDefinition/alerts.notification.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/buckets.source.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/gitrepositories.source.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/helmcharts.source.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/helmreleases.helm.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/helmrepositories.source.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/kustomizations.kustomize.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/providers.notification.toolkit.fluxcd.io deleted
✔ CustomResourceDefinition/receivers.notification.toolkit.fluxcd.io deleted
✔ Namespace/flux-system deleted
✔ uninstall finished
The namespace flux-system was stuck in a Terminating state for some time, so I went ahead and ran:
(
NAMESPACE=flux-system
kubectl proxy &   # expose the API server locally on 127.0.0.1:8001
kubectl get namespace $NAMESPACE -o json | jq '.spec = {"finalizers":[]}' > temp.json   # strip the namespace finalizers
curl -k -H "Content-Type: application/json" -X PUT --data-binary @temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize   # apply via the finalize subresource
)
This appeared to work. I notice this at the end of the output on running that script.
{
"type": "NamespaceContentRemaining",
"status": "True",
"lastTransitionTime": "2021-12-10T18:01:39Z",
"reason": "SomeResourcesRemain",
"message": "Some resources are remaining: kustomizations.kustomize.toolkit.fluxcd.io has 1 resource instances"
},
{
"type": "NamespaceFinalizersRemaining",
"status": "True",
"lastTransitionTime": "2021-12-10T18:01:39Z",
"reason": "SomeFinalizersRemain",
"message": "Some content in the namespace has finalizers remaining: finalizers.fluxcd.io in 1 resource instances"
}
]
}
I reran:
terraform plan -out starter_kit_flux_cluster.out
terraform apply "starter_kit_flux_cluster.out"
... but running flux get all, I'm still not seeing this kustomization/flux-system. I tried flux reconcile kustomization flux-system -n flux-system --with-source again, but just get back:
✗ no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta2"
@Analect
OK, I reproduced your issue, and it seems that the main TF module from the container-blueprints repo is a little bit outdated with regard to the Flux CD provider. So, I went and updated the Flux CD Terraform provider in my GitHub fork of the container-blueprints repo to use the latest version. (The container-blueprints repo holds the main Terraform module code, btw, which is then used in the Starter Kit.)
I assume that you have the latest version of the flux CLI installed locally, right? (Or at least a very recent one.)
If so, the Flux CD provider from the TF module in the container-blueprints repo needs an update as well, because it's old. I suspect the Kustomization Controller issue is caused by this. The TF provider for Flux and its CLI counterpart should not be too far apart in version.
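As a quick sanity check (assuming a reasonably recent flux CLI), the client version can be compared against what's running in the cluster:
flux --version   # version of the local CLI
flux check       # verifies prerequisites and reports on the in-cluster controllers and CRDs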
So, I uninstalled Flux CD again via flux uninstall, and then used my updated version for the module source like this (in the main.tf file):
module "doks_flux_cd" {
source = "github.com/v-ctiutiu/container-blueprints/create-doks-with-terraform-flux"
...
}
After planning again and then applying, I got both resources. Running flux get all gives the following output:
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/95ae1bd47e4ce8cefd5e0bd409e3fe520ff748e1 main/95ae1bd47e4ce8cefd5e0bd409e3fe520ff748e1 False
NAME READY MESSAGE REVISION SUSPENDED
kustomization/flux-system True Applied revision: main/95ae1bd47e4ce8cefd5e0bd409e3fe520ff748e1 main/95ae1bd47e4ce8cefd5e0bd409e3fe520ff748e1 False
Please test and let me know if it works for you as well. If it does, then I will create another PR for the container-blueprints repo to address this issue as well.
Thanks.
@v-ctiutiu . Thanks for your efforts with this. Having gone through all your steps above, unfortunately I still can't get this kustomization/flux-system to 'show up'.
$ kubectl get pods -n flux-system
NAME READY STATUS RESTARTS AGE
helm-controller-779b58df6b-f4lmj 1/1 Running 0 98s
kustomize-controller-5db6bfc56d-cqwzh 1/1 Running 0 98s
notification-controller-7ccfbfbb98-lrqb4 1/1 Running 0 98s
source-controller-565f8fbbff-g6ptc 1/1 Running 0 98s
$ flux get all
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/3fe239b88ce7725d7867215884940adf77dde94a main/3fe239b88ce7725d7867215884940adf77dde94a False
$ flux reconcile kustomization flux-system -n flux-system --with-source
✗ no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta2"
Back in my GitHub repo, flux-system/gotk-sync.yaml was updated as follows. This would suggest that the v1beta2 kustomize API that keeps being complained about was updated, but maybe that hasn't been properly enforced on the cluster.
The upgrade of the fluxcd provider to 0.8.1 required me to run terraform init -upgrade, which it appears has also upgraded various other components in the flux-system/gotk-components.yaml file.
Sorry, I realise I'm fumbling around a bit blindly here, but it would be good to get this running as per the starter-kit demo. Tks.
I think this might be relevant to what is going wrong. "When using the Terraform provider for Flux, you have to manually remove the v1beta1 Kustomization from the TF state" with:
terraform state rm 'kubectl_manifest.sync["kustomize.toolkit.fluxcd.io/v1beta1/kustomization/flux-system/flux-system"]'
I got:
Error: Invalid target address
│
│ No matching objects found. To view the available instances, use "terraform state list". Please modify the address to reference a specific instance.
When I run terraform state list, I get:
module.doks_flux_cd.data.digitalocean_kubernetes_cluster.primary
module.doks_flux_cd.data.flux_install.main
module.doks_flux_cd.data.flux_sync.main
module.doks_flux_cd.data.github_repository.main
module.doks_flux_cd.data.kubectl_file_documents.install
module.doks_flux_cd.data.kubectl_file_documents.sync
module.doks_flux_cd.digitalocean_kubernetes_cluster.primary
module.doks_flux_cd.github_repository_deploy_key.main
module.doks_flux_cd.github_repository_file.install
module.doks_flux_cd.github_repository_file.kustomize
module.doks_flux_cd.github_repository_file.sync
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/alerts.notification.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/buckets.source.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/gitrepositories.source.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/helmcharts.source.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/helmreleases.helm.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/helmrepositories.source.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/kustomizations.kustomize.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/providers.notification.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apiextensions.k8s.io/v1/customresourcedefinition/receivers.notification.toolkit.fluxcd.io"]
module.doks_flux_cd.kubectl_manifest.install["apps/v1/deployment/flux-system/helm-controller"]
module.doks_flux_cd.kubectl_manifest.install["apps/v1/deployment/flux-system/kustomize-controller"]
module.doks_flux_cd.kubectl_manifest.install["apps/v1/deployment/flux-system/notification-controller"]
module.doks_flux_cd.kubectl_manifest.install["apps/v1/deployment/flux-system/source-controller"]
module.doks_flux_cd.kubectl_manifest.install["networking.k8s.io/v1/networkpolicy/flux-system/allow-egress"]
module.doks_flux_cd.kubectl_manifest.install["networking.k8s.io/v1/networkpolicy/flux-system/allow-scraping"]
module.doks_flux_cd.kubectl_manifest.install["networking.k8s.io/v1/networkpolicy/flux-system/allow-webhooks"]
module.doks_flux_cd.kubectl_manifest.install["rbac.authorization.k8s.io/v1/clusterrole/crd-controller-flux-system"]
module.doks_flux_cd.kubectl_manifest.install["rbac.authorization.k8s.io/v1/clusterrolebinding/cluster-reconciler-flux-system"]
module.doks_flux_cd.kubectl_manifest.install["rbac.authorization.k8s.io/v1/clusterrolebinding/crd-controller-flux-system"]
module.doks_flux_cd.kubectl_manifest.install["v1/namespace/flux-system"]
module.doks_flux_cd.kubectl_manifest.install["v1/service/flux-system/notification-controller"]
module.doks_flux_cd.kubectl_manifest.install["v1/service/flux-system/source-controller"]
module.doks_flux_cd.kubectl_manifest.install["v1/service/flux-system/webhook-receiver"]
module.doks_flux_cd.kubectl_manifest.install["v1/serviceaccount/flux-system/helm-controller"]
module.doks_flux_cd.kubectl_manifest.install["v1/serviceaccount/flux-system/kustomize-controller"]
module.doks_flux_cd.kubectl_manifest.install["v1/serviceaccount/flux-system/notification-controller"]
module.doks_flux_cd.kubectl_manifest.install["v1/serviceaccount/flux-system/source-controller"]
module.doks_flux_cd.kubectl_manifest.sync["kustomize.toolkit.fluxcd.io/v1beta2/kustomization/flux-system/flux-system"]
module.doks_flux_cd.kubectl_manifest.sync["source.toolkit.fluxcd.io/v1beta1/gitrepository/flux-system/flux-system"]
module.doks_flux_cd.kubernetes_namespace.flux_system
module.doks_flux_cd.kubernetes_secret.main
module.doks_flux_cd.tls_private_key.main
I can see module.doks_flux_cd.kubectl_manifest.sync["kustomize.toolkit.fluxcd.io/v1beta2/kustomization/flux-system/flux-system"] in there. But this still doesn't explain why I'm getting this no matches for kind "Kustomization" in version "kustomize.toolkit.fluxcd.io/v1beta2" error.
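Side note: resources created inside a module carry the module prefix in their state address, which is likely why the bare address earlier returned "No matching objects found". For reference, that entry can be inspected with its full address:
terraform state show 'module.doks_flux_cd.kubectl_manifest.sync["kustomize.toolkit.fluxcd.io/v1beta2/kustomization/flux-system/flux-system"]'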
@Analect
First of all - Great job!
These are my latest notes and findings, after doing some more debugging and re-reading your replies. Before moving on with other explanations, let me emphasize two important things:
- Terraform uses its own private state machine to keep track of changes, and stores state in a file on your local machine (or remotely, via S3).
- Kubernetes has its own private state machine, and stores the current system state in the etcd database.
So far so good, but not quite. Sometimes I hate state machines, especially when more than one is present and they need to be synchronized. The problem is that if you act externally with some other tool and alter one of the two state machines, then the other one is not aware of the changes. In your case, Terraform is not aware of the fact that you bootstrapped Flux CD again via the CLI (flux bootstrap github --owner=<gh_owner> --repository=<flux_repo>). If you run flux bootstrap, existing Flux API definitions may be updated or new ones will be added in your Kubernetes cluster.
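A minimal sketch of how to compare what each state machine currently thinks, using commands that appear elsewhere in this thread:
# What Terraform's state file thinks was applied for the Flux sync resources:
terraform state list | grep kubectl_manifest.sync
# What the Kubernetes API server actually serves for Flux:
kubectl api-versions | grep flux
kubectl get kustomizations -A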
Before moving further, what I did was to re-create the initial scenario that you ran into:
I ran flux get all, and got everything except the kustomization resource:
flux get all
# The actual result:
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/1b43... main/1b43... False
I reproduced your issue - great!
Now, what I did was to list the supported API versions for Flux CD:
kubectl api-versions | grep flux
And I got:
helm.toolkit.fluxcd.io/v2beta1
kustomize.toolkit.fluxcd.io/v1beta1
notification.toolkit.fluxcd.io/v1beta1
source.toolkit.fluxcd.io/v1beta1
Looking at the above, you can see that kustomize.toolkit.fluxcd.io is present at version v1beta1. If I list the kustomization objects via kubectl directly, it's there:
kubectl get kustomizations -A
# The actual result:
NAMESPACE NAME READY STATUS AGE
flux-system flux-system True Applied revision: main/1b43faf02da567e415aae57a7ecda865fd5b8063 4m46s
But then the question remains: why doesn't flux get all see it? The next steps should give you some hints, at least.
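One hint can come from inspecting the Kustomization CRD itself. A small sketch, using standard kubectl JSONPath queries, showing which versions the CRD serves and which it records as stored (the latter is what the storedVersions error earlier refers to):
# Versions the API server can serve for this CRD:
kubectl get crd kustomizations.kustomize.toolkit.fluxcd.io -o jsonpath='{.spec.versions[*].name}'
# Versions recorded as stored in etcd:
kubectl get crd kustomizations.kustomize.toolkit.fluxcd.io -o jsonpath='{.status.storedVersions}'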
What I did next was to run flux bootstrap on an existing Flux CD installation. So, what's different here? I have the latest flux CLI installed on my local machine (or at least a newer version than the one used when the Starter Kit automation chapter was written). On the Flux CD side, I have the old version deployed in the cluster, via the Starter Kit Terraform module (the provider is at version 0.2.x).
Before I move on, let me quote the command and the output that you pasted in a previous reply:
Also, I tried running flux bootstrap github --owner=<my-github-org> --repository=<my-repo>, thinking that might rectify any misconfiguration. The response suggests all is OK, but those errors on flux get all persist.
Please enter your GitHub personal access token (PAT):
► connecting to github.com
► cloning branch "main" from Git repository "https://github.com/xxx/xxx.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed sync manifests to "main" ("xxxed2b84")
► pushing component manifests to "https://github.com/xxx/xxx.git"
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
✔ source secret up to date
✗ sync path configuration ("") would overwrite path ("./clusters/dev") of existing Kustomization
What happens after you run the above is that the flux client (the CLI counterpart of Flux CD) creates new API definitions for the Flux components in your cluster, besides the existing ones - in your case, for kustomize.toolkit.fluxcd.io. Why? Because you have a new flux client version installed on your machine, and it wants you to have the latest version of the Flux CD components deployed in your cluster, meaning v1beta2. And this makes sense after all. But, and this is very important - it will not update your Git repository manifests from the sync path. You can see that in the last line of the above output: ✗ sync path configuration ("") would overwrite path ("./clusters/dev") of existing Kustomization.
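For reference, that last error usually shows up when flux bootstrap is run without a --path that matches the existing Kustomization; a hypothetical invocation matching the Starter Kit sync path would look like this (placeholder owner/repository values):
flux bootstrap github --owner=<my-github-org> --repository=<my-repo> --path=clusters/dev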
On the other hand, Terraform is not aware of this change, and it thinks that Kustomization is still at v1beta1. I think the two can be synchronized, by changing the logic in the main TF module or in the Flux CD provider. Terraform can also import state via a separate command. But that's out of scope for the current discussion.
So, after running flux bootstrap, I was hit by the same issue as yours:
flux get all
# The actual result:
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/685e... main/685ea... False
NAME READY MESSAGE REVISION SUSPENDED
kustomization/flux-system False apply failed: The CustomResourceDefinition "kustomizations.kustomize.toolkit.fluxcd.io" is invalid: status.storedVersions[1]: Invalid value: "v1beta2": must appear in spec.versions main/1b43faf02da567e415aae57a7ecda865fd5b8063 False
Nothing new so far, but if I run kubectl api-versions | grep flux, something is revealed:
helm.toolkit.fluxcd.io/v2beta1
kustomize.toolkit.fluxcd.io/v1beta1
kustomize.toolkit.fluxcd.io/v1beta2
notification.toolkit.fluxcd.io/v1beta1
source.toolkit.fluxcd.io/v1beta1
Looking at the above output, you can see that now I have two versions for kustomize.toolkit.fluxcd.io. So what I think is that the flux CLI expects a v1beta2 version of the kustomize.toolkit.fluxcd.io CRD type to be created, but it's not! Why? Because when you bootstrapped Flux again via the CLI, the new API version got defined in the cluster, but the new object or resource is not there - and Flux expects a resource at the new v1beta2 version to be available, or instantiated. Going further, you would also need to manually update the YAML manifests in the sync path of your Git repo to use the latest API version and specs. By using the updated TF module from my Git repository, this part was handled automatically for you by TF.
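A quick way to confirm which API version the manifests in the sync path actually use (a sketch, assuming the Git repository is cloned locally and the sync path is clusters/dev, as in this thread):
grep -n "apiVersion" clusters/dev/flux-system/gotk-sync.yaml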
Before moving further, I'm curious what the output of the below command is in your environment:
kubectl get kustomizations -A
To stay consistent with the Starter Kit, you need to downgrade the flux client (the CLI counterpart). As you already pointed out in your last reply, mentioning the "Upgrade Flux to the v1beta2 API" discussion from the Flux CD official repo, this is an upgrade scenario issue. Currently, the Starter Kit doesn't deal with upgrade scenarios, because we wanted to keep things simple (it's a "starter" after all).
Getting back to the main issue, the only viable solution that I see now is to downgrade the Flux client. I still don't know why it doesn't create the new Kustomization resource, after the manifests in your Git repository were updated to the newest version as well.
On our end, I should add a note about this in the prerequisites section for the affected chapter (meaning to use an older flux client version). On the other hand, we plan to upgrade all the Starter Kit components very soon, so an upgrade section for each chapter is necessary after all.
To fix your current installation this time (hopefully), please follow the steps below:
1. Uninstall Flux CD via: flux uninstall
2. Revert the module source in the main.tf file to point to the original:
module "doks_flux_cd" {
source = "github.com/digitalocean/container-blueprints/create-doks-with-terraform-flux"
...
}
3. Run TF init (when asked, run the upgrade as well). Then, plan and apply.
4. Uninstall the current version of the flux CLI (or just make a backup of it, although you can download it anytime).
5. Install an old version of the flux CLI which is compatible with the Starter Kit (e.g. 0.17.0, or any release dating from July or August); a quick verification sketch follows these steps:
curl -s https://fluxcd.io/install.sh | sudo FLUX_VERSION=0.17.0 bash
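After the downgrade, it may be worth verifying the client and re-running the built-in checks:
flux --version   # should report 0.17.0
flux check       # re-checks the in-cluster components with the downgraded CLI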
After I ran the above steps, it started to work immediately. Let me know if it does the same for you.
Although I don't have a final answer to your last question, I hope I was able to give you some hints about why it behaves the way it does now.
Thanks a lot for your patience and time.
@v-ctiutiu . I greatly appreciate your efforts to explain what might be going on. I can see the power of the TF/flux combination, in terms of managing complexities in a Kubernetes cluster, but using these tools can also introduce a whole new set of complications!
Ahead of running your steps above, I ran these commands, as per your explanation. It seems the kustomization resources were absent from the cluster ... and maybe that was down to me overriding things with my flux CLI bootstrap.
$ kubectl api-versions | grep flux
helm.toolkit.fluxcd.io/v2beta1
notification.toolkit.fluxcd.io/v1beta1
source.toolkit.fluxcd.io/v1beta1
$ kubectl get kustomizations -A
error: the server doesn't have a resource type "kustomizations"
On running those steps above and downgrading the Flux CLI to version 0.17, I now get this on calling flux get all:
NAME READY MESSAGE REVISION SUSPENDED
gitrepository/flux-system True Fetched revision: main/3fe239b88ce7725d7867215884940adf77dde94a main/3fe239b88ce7725d7867215884940adf77dde94a False
NAME READY MESSAGE REVISION SUSPENDED
kustomization/flux-system True Applied revision: main/3fe239b88ce7725d7867215884940adf77dde94a main/3fe239b88ce7725d7867215884940adf77dde94a False
And now I see:
$ kubectl api-versions | grep flux
helm.toolkit.fluxcd.io/v2beta1
kustomize.toolkit.fluxcd.io/v1beta1
kustomize.toolkit.fluxcd.io/v1beta2
notification.toolkit.fluxcd.io/v1beta1
source.toolkit.fluxcd.io/v1beta1
$ kubectl get kustomizations -A
NAMESPACE NAME READY STATUS AGE
flux-system flux-system True Applied revision: main/3fe239b88ce7725d7867215884940adf77dde94a 4m5s
I suppose what intrigues me a bit is that this latest uninstall/init/plan & apply does result in an update to the terraform.state file in my DigitalOcean Spaces. However, it doesn't make any new commits/alterations to my GitHub repo that is capturing my flux state. It's not clear to me how different versions of flux manifest themselves in this GitHub repo.
If one gets into a tangle like this in future, is it ever a solution to delete either the flux state in the github repo or the TF state in the digitalocean spaces, as a means of resetting?
@Analect To be honest, I don't have a real answer for it.
What has happened here, in the end, is more or less a migration issue, I assume. What I don't have an answer for yet (because we have not verified these scenarios - it was a little bit out of scope for the Starter Kit) is:
- migrating Flux CD from an older version to a newer version (not a major release, though).
What I found on the Flux CD documentation site about migration is this: https://fluxcd.io/docs/migration.
Maybe @stefanprodan, who is one of the main contributors of Flux CD, can give us some hints about what has happened, or how to prevent it from happening in the future?
To summarize:
1. Flux CD was deployed via a custom Terraform module, using an older provider version (0.2.x). In the Starter Kit (meaning this repo), we lock down versions (no major releases) for each component that we use (be it Helm releases, Terraform providers, etc.). This was an internal decision, to have consistent and predictable results.
2. The gitrepository source Flux CD component refused to work.
3. The container-blueprints repo was updated, so that the Flux CD gitrepository source component was functional again.
4. Bootstrapping again via a newer flux CLI brought a newer Kustomization component to our Kubernetes cluster.
5. The Kustomization component refused to appear in the list when invoking: flux get all.
6. Downgrading the flux CLI brought back the missing Kustomization component (@Analect, correct me if I'm wrong here).
should clearly specify to use a flux CLI version
that is compatible
with the TF provider
being used for deploying Flux CD
(meaning 0.2.x
). My bad here, for forgetting or not thinking of it at that time of writing.
As a side note, we constrain all TF provider versions in a more strict (or pessimistic) way, like this (patch version upgrades only):
flux = {
source = "fluxcd/flux"
version = "~> 0.2.0"
}
@stefanprodan - If someone runs into this situation in the future, is it possible to "reset" states in a clean way, as @Analect already mentioned in the previous post, for both the Terraform state file (we already read this discussion) and the Flux CD deployment?
Thanks a lot.
I have been following the automation tutorial. https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers/tree/main/15-automate-with-terraform-flux
I've re-run it a few times (recreating clusters), but it seems to get stuck on creating all the necessary flux-system components
When I run flux get all, then I get:
And flux logs gives:
It seems the various git credentials added in the main.tf file are right, since files got added to the git_repository_sync_path that I supplied. However, these logs above suggest a related problem, where it can't access the GitRepository for other purposes.
In the GitHub PAT, I granted these permissions in scope. Maybe that's not sufficient?
If I look in .terraform/modules/create-doks-with-terraform-flux/provider.tf I see:
There is no base64 encoding/decoding suggested here.
Googling here suggests that maybe if a GitHub user is a person rather than an org, then the --personal flag should be passed. I'm not sure if that's relevant here and if that is handled in this starter kit. It also suggests checking the content of the flux-system secret on the cluster, which should equate to an encoded GitHub PAT supplied in the main.tf. It's not clear to me how that is best done.
Any thoughts on how I might get over this stumbling block? Tks
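For what it's worth, a minimal sketch for inspecting that secret (the data keys depend on whether Flux was bootstrapped with SSH or with a PAT, so treat the field name below as a placeholder):
# List the data keys stored in the flux-system secret:
kubectl get secret flux-system -n flux-system -o jsonpath='{.data}'
# Decode one field (replace <field> with a key from the output above, e.g. identity or password):
kubectl get secret flux-system -n flux-system -o jsonpath="{.data.<field>}" | base64 -d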