equinix / terraform-equinix-metal-anthos-on-baremetal

Terraform module for quick deployment of baremetal Anthos on Equinix Metal
https://registry.terraform.io/modules/equinix/anthos-on-baremetal
Apache License 2.0

gcloud application-default login required on GCE vm or terraform cluster create hangs #41

Open joshpadilla opened 3 years ago

joshpadilla commented 3 years ago
null_resource.deploy_anthos_cluster (remote-exec): Creating Anthos Cluster. This will take about 20 minutes...
null_resource.deploy_anthos_cluster (remote-exec): Cluster Created!
null_resource.deploy_anthos_cluster: Creation complete after 5s [id=2553823925035852511]
null_resource.download_kube_config: Creating...
null_resource.download_kube_config: Provisioning with 'local-exec'...
null_resource.download_kube_config (local-exec): Executing: ["/bin/sh" "-c" "scp -i ~/.ssh/anthos-abm-blue-p17o3 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  root@139.178.86.49:/root/baremetal/bmctl-workspace/abm-blue-p17o3/abm-blue-p17o3-kubeconfig ."]
null_resource.download_kube_config (local-exec): Warning: Permanently added '139.178.86.49' (ECDSA) to the list of known hosts.
null_resource.download_kube_config (local-exec): scp: /root/baremetal/bmctl-workspace/abm-blue-p17o3/abm-blue-p17o3-kubeconfig: No such file or directory

null_resource.kube_vip_install_first_cp: Still creating... [10s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [20s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...

joshpadilla commented 3 years ago

https://github.com/equinix/terraform-metal-anthos-on-baremetal/blob/cbcc4542acce2240d57b4de10cd5498acf60d768/main.tf#L237

When I log in to the host:

root@abm-blue-p17o3-cp-01:~# ll /root/baremetal/bmctl-workspace/abm-blue-p17o3/
total 24
drwxr-xr-x 3 root root 4096 Feb  5 18:49 ./
drwxr-xr-x 3 root root 4096 Feb  5 18:49 ../
-rw-r--r-- 1 root root 9683 Feb  5 18:49 abm-blue-p17o3.yaml
drwxr-xr-x 3 root root 4096 Feb  5 18:49 log/

There's no kubeconfig in that directory, just abm-blue-p17o3.yaml.

joshpadilla commented 3 years ago

Changing the scp command to

command = "scp -i ~/.ssh/${local.ssh_key_name} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  root@${metal_device.control_plane.0.access_public_ipv4}:/root/baremetal/bmctl-workspace/${local.cluster_name}/${local.cluster_name}.yaml ."

got rid of the error, but did not fix the hanging cluster creation process:

null_resource.kube_vip_install_first_cp: Still creating... [32m10s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m20s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m30s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...
null_resource.kube_vip_install_first_cp: Still creating... [32m40s elapsed]
null_resource.kube_vip_install_first_cp (remote-exec): Waiting for '/etc/kubernetes/manifests' to be created...

joshpadilla commented 3 years ago

The kubeconfig is never created. Terraform should not try to download it until cluster creation has succeeded. I'm checking the cluster creation log file, /root/baremetal/cluster_create.log.

cluster_create.log contains a single-line error about GKE Hub and gcloud auth login. But the GCE VM I'm using is already authenticated with gcloud, so I'm still looking into that.
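
A rough sketch of the ordering described above: poll until a condition holds (or a timeout expires) and only then copy the kubeconfig. The `wait_until` helper and the `HOST`, `KEY`, and `KUBECONFIG_PATH` placeholders in the usage comments are hypothetical and not part of the module; this only illustrates the idea and is not how the module currently works.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  timeout="$1"; interval="$2"; shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example use (placeholders, assumed values):
#   wait_until 1800 10 ssh -i "$KEY" "root@$HOST" test -f "$KUBECONFIG_PATH" \
#     && scp -i "$KEY" "root@$HOST:$KUBECONFIG_PATH" .
```

With a guard like this, a missing kubeconfig would surface as a timeout error rather than an scp failure followed by an indefinite wait.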

joshpadilla commented 3 years ago

gcloud auth application-default login

You are running on a Google Compute Engine virtual machine. The service credentials associated with this virtual machine will automatically be used by Application Default Credentials, so it is not necessary to use this command.

If you decide to proceed anyway, your user credentials may be visible to others with access to this virtual machine. Are you sure you want to authenticate with your personal account?

Do you want to continue (Y/n)?
Go to the following link in your browser:
Enter verification code:

Credentials saved to file: [~/.config/gcloud/application_default_credentials.json]

joshpadilla commented 3 years ago

Looks like this is a requirement: if you don't have the file ~/.config/gcloud/application_default_credentials.json, Terraform hangs without an error.
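
One way to fail fast instead of hanging would be a preflight check run before `terraform apply`. The sketch below assumes the credentials path shown in the gcloud output above; the `check_adc` function name is hypothetical and does not exist in the module.

```shell
#!/bin/sh
# Hypothetical preflight helper: report an error when the Application Default
# Credentials file is missing or empty, instead of letting Terraform hang.
check_adc() {
  # Default path taken from the gcloud output above; pass another path as $1.
  adc_file="${1:-$HOME/.config/gcloud/application_default_credentials.json}"
  if [ ! -s "$adc_file" ]; then
    echo "Missing $adc_file; run: gcloud auth application-default login" >&2
    return 1
  fi
  echo "Found $adc_file"
}
```

It could be chained in front of the deployment, for example `check_adc && terraform apply`.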

displague commented 3 years ago

https://github.com/equinix/terraform-metal-anthos-on-baremetal/issues/28#issuecomment-747605670

The README is overdue for some updates.

displague commented 3 years ago

Actually, we do have some text supporting this:

https://github.com/equinix/terraform-metal-anthos-on-baremetal#install-gcloud

displague commented 3 years ago

Do you have ideas on how we can improve this, @joshpadilla?