Hard to say what's going wrong. Does it fail instantly?
No, TF goes through its usual flow and errors out while applying.
Maybe you can try setting `validate.skip` to true.
Can Terraform access the API server?
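For context, `validate.skip` lives on the updater resource. Here's a minimal sketch of where it would go, assuming the `kops_cluster_updater` resource exposes a `validate` block with a `skip` flag as the mention above suggests; the resource references are illustrative, not taken from the thread:

```hcl
# Sketch only: assumes kops_cluster_updater exposes a validate block
# with a skip flag, as discussed above. References are illustrative.
resource "kops_cluster_updater" "updater" {
  cluster_name = kops_cluster.cluster.name

  validate {
    # Skip cluster validation after the update, e.g. when the API
    # server is not reachable from where Terraform runs.
    skip = true
  }

  depends_on = [
    kops_cluster.cluster,
  ]
}
```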
Setting `validate.skip` to true has no apparent effect.
API server? I was not aware that there was a server involved.
If you'd like to question me a bit more directly/actively I had posted this on Slack, but I thought it would be rude to flag you directly. https://kubernetes.slack.com/archives/C3QUFP0QM/p1671487804695779
The API server is at the heart of a Kubernetes cluster.
The provider definitely needs to access it. Depending on your network topology, that will involve a VPN, Direct Connect, or public access.
This is not a limitation of this provider; it applies to the kOps CLI too. Please check your network setup.
Ok but there's no cluster. Am I mistaken in thinking that this provider's resources stand up and bootstrap a k8s cluster from scratch? Should these resources not first spin up the instances/instance groups via AWS APIs before trying to connect?
I'm double-checking with `terraform show`, and both the 'master' and 'node' groups are configured to spin up 3 and 4 instances respectively, but I see no instances created. The config is created in the S3 bucket, but seeing as the updater never starts up, it's never applied.
This provider won't create the network (VPC, subnets, gateways, etc.).
It will create auto scaling groups and possibly a load balancer in front of your masters. You can check whether any of those resources have been created in AWS.
Now, depending on your topology, the LB could have a private IP; if that's the case you will need some kind of VPN to communicate with it.
If no cloud resources have been created, it could be because something is wrong with the subnets or you didn't provide an IAM role with enough permissions.
I guess this should show up in the logs.
If the cluster spec is created in S3, you can try to apply with the kOps CLI to see if it works.
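To make the prerequisite above concrete: the VPC and subnets have to exist already and be referenced from the cluster resource. A rough sketch, assuming `kops_cluster` takes a `network_id` and `subnet` blocks as in the provider's documentation; the IDs and zone names are placeholders:

```hcl
# Sketch only: the cluster points at a pre-existing VPC and subnets;
# the provider does not create them. IDs and zone names are placeholders.
resource "kops_cluster" "cluster" {
  name       = "k8s.test.company.aws"
  network_id = "vpc-0123456789abcdef0" # existing VPC

  # One block per existing subnet the cluster should use.
  subnet {
    name        = "private-us-east-1a"
    provider_id = "subnet-0123456789abcdef0"
    type        = "Private"
    zone        = "us-east-1a"
  }

  # ... remaining cluster spec (etcd clusters, networking, IAM, etc.) ...
}
```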
Once I finagled the credentials for the CLI [which doesn't seem to support role assumption at all?], the result of `kops --name k8s.test.company.aws --state s3://company-kops-state/ update cluster` was:
```
F1221 13:16:33.819378 1295836 task.go:73] found duplicate tasks with name "ManagedFile/manifests-etcdmanager-main-master":
  *fitasks.ManagedFile {"Name":"manifests-etcdmanager-main-master","Lifecycle":"Sync","Base":null,"Location":"manifests/etcd/main-master.yaml","Contents":"...","Public":null}
and
  *fitasks.ManagedFile {"Name":"manifests-etcdmanager-main-master","Lifecycle":"Sync","Base":null,"Location":"manifests/etcd/main-master.yaml","Contents":"...","Public":null}
```
Which I've reformatted for clarity. The contents I've excerpted are:
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    k8s-app: etcd-manager-main
  name: etcd-manager-main
  namespace: kube-system
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - mkfifo /tmp/pipe; (tee -a /var/log/etcd.log < /tmp/pipe & ) ; exec /etcd-manager
      --backup-store=s3://company-kops-state/k8s.test.company.aws/backups/etcd/main
      --client-urls=https://__name__:4001 --cluster-name=etcd --containerized=true
      --dns-suffix=.internal.k8s.test.company.aws --grpc-port=3996 --peer-urls=https://__name__:2380
      --quarantine-client-urls=https://__name__:3994 --v=6 --volume-name-tag=k8s.io/etcd/main
      --volume-provider=aws --volume-tag=k8s.io/etcd/main --volume-tag=k8s.io/role/master=1
      --volume-tag=kubernetes.io/cluster/k8s.test.company.aws=owned > /tmp/pipe 2>&1
    image: registry.k8s.io/etcdadm/etcd-manager:v3.0.20220831@sha256:a91fdaf9b988943a9c73d422348c2383c08dfc2566d4124a39a1b3d785018720
    name: etcd-manager
    resources:
      requests:
        cpu: 200m
        memory: 100Mi
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /rootfs
      name: rootfs
    - mountPath: /run
      name: run
    - mountPath: /etc/kubernetes/pki/etcd-manager
      name: pki
    - mountPath: /var/log/etcd.log
      name: varlogetcd
  hostNetwork: true
  hostPID: true
  priorityClassName: system-cluster-critical
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: rootfs
  - hostPath:
      path: /run
      type: DirectoryOrCreate
    name: run
  - hostPath:
      path: /etc/kubernetes/pki/etcd-manager-main
      type: DirectoryOrCreate
    name: pki
  - hostPath:
      path: /var/log/etcd.log
      type: FileOrCreate
    name: varlogetcd
status: {}
```
This seems to have been the result of my misunderstanding of kOps "instance groups": the seemingly redundant definition of 3 different 1-member master groups is the valid config, rather than the single 3-member group in the config I posted, which is not valid.
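For anyone hitting the same duplicate-task error, the shape that works is one instance group per control-plane member, each referenced by its own etcd member. A sketch of that layout, assuming the `kops_instance_group` attribute names from the provider's examples; the machine type and subnet names are placeholders:

```hcl
# Sketch only: three single-member master groups (the valid layout),
# instead of one group with min_size = max_size = 3.
resource "kops_instance_group" "master-0" {
  cluster_name = kops_cluster.cluster.name
  name         = "master-0"
  role         = "Master"
  min_size     = 1
  max_size     = 1
  machine_type = "t3.medium"            # placeholder
  subnets      = ["private-us-east-1a"] # placeholder
}

# Repeat for master-1 and master-2 in the other zones, and point each
# etcd_cluster member block at its own group, e.g.:
#
#   etcd_cluster {
#     name = "main"
#     member {
#       name           = "master-0"
#       instance_group = "master-0"
#     }
#     # ... one member per master group ...
#   }
```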
Ultimately I think that the only real issue here is that the provider did not [or could not?] emit the fatal error that the kops CLI does.
Thank you so much for your help and time.
Yes, unfortunately logs are not well supported in Terraform providers. Glad that you sorted it out in the end.
I am trying to integrate kOps with some existing infrastructure, but the provider keeps giving me the following error when I try to apply:
And what seems to be the relevant output with `TF_LOG=debug`:
Below is my kOps config: