Closed — liveaverage closed this 2 years ago
@displague just let me know if I need to provide any extra detail around changes... figured I'd update while native IPI support progresses :)
I've made a change to keep Cloudflare as an optional DNS provider. The latest Cloudflare provider release added `api_key` validation, which was blocking other providers from being used.
@liveaverage I ran into the following:
│ Error: local-exec provisioner error
│
│ with null_resource.get_kubeconfig,
│ on main.tf line 162, in resource "null_resource" "get_kubeconfig":
│ 162: provisioner "local-exec" {
│
│ Error running command 'mkdir -p ./auth; scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /Users/marques/.ssh/id_rsa_mos-uqjlv root@145.40.102.129:/tmp/artifacts/install/auth/* ./auth/':
│ exit status 1. Output: Warning: Permanently added '145.40.102.129' (ECDSA) to the list of known hosts.
│ scp: /tmp/artifacts/install/auth/*: No such file or directory
│
╵
╷
│ Error: remote-exec provisioner error
│
│ with module.openshift_install.null_resource.check_port,
│ on modules/install/main.tf line 95, in resource "null_resource" "check_port":
│ 95: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_523679698.sh": Process exited with status 1
Are you getting a different result?
Looks like my problem may be with the pull secret, which is outdated:
less /tmp/artifacts/install/.openshift_install.log
time="2021-11-10T00:36:09-05:00" level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: pullSecret: Invalid value: \"{ \\\"kind\\\": \\\"Error\\\", \\\"id\\\": \\\"401\\\", \\\"href\\\": \\\"/api/accounts_mgmt/v1/errors/401\\\", \\\"code\\\": \\\"ACCOUNTS-MGMT-401\\\", \\\"reason\\\": \\\"Bearer token is malformed\\\" }\": auths required"
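A quick way to catch this before re-running the installer is to check that the downloaded pull secret is actually valid JSON with the top-level `auths` map the installer demands (a 401 error body, like the one above, won't have it). This is just a sketch: the `./pull-secret.json` path is hypothetical, and it assumes `jq` is installed.

```shell
# Sanity-check the pull secret before feeding it to openshift-install.
# A stale/failed download often contains an API error body instead of
# the expected {"auths": {...}} document (see the 401 in the log above).
PULL_SECRET="${PULL_SECRET:-./pull-secret.json}"  # hypothetical path

if jq -e 'has("auths")' "$PULL_SECRET" >/dev/null 2>&1; then
  echo "pull secret OK"
else
  echo "pull secret invalid or missing auths -- fetch a fresh one" >&2
fi
```

If the check fails, grab a fresh copy of the pull secret from the Red Hat console before retrying.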
Good catch on the `api_key` validation. WRT the pull secret: yes, the bearer token may need to be refreshed, though it's not a frequent requirement.
I had some trouble accessing these nodes after the coreos reboots.
Are you referring to accessing nodes via SSH or OOB console (SOS)? The latter is a problem that existed in the previous automation, too. Kernel args don't persist, so dropping into SOS requires intercepting boot and adding the appropriate console param back in.
Let me retest with the day 1 kernel arg updates documented here: https://github.com/openshift/installer/blob/master/docs/user/customization.md#nodes-with-custom-kernel-arguments -- technically the initial kargs are supposed to be made "sticky", but that's still not happening, so I'll need to tweak MachineConfigs post-config-generation and pre-install. It can be modified as a day 2 activity as well, but for troubleshooting it's easier to do on day 1!
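A minimal sketch of that day 1 tweak, following the linked installer doc: after `openshift-install create manifests`, drop a MachineConfig with `kernelArguments` into the generated `openshift/` directory so the console param persists across reboots. The `INSTALL_DIR` path, the manifest names, and the `console=ttyS1,115200n8` value are all assumptions here -- adjust the console arg to whatever the SOS console on these machines actually uses.

```shell
# Assumed install dir; run this after `openshift-install create manifests`
# and before `openshift-install create cluster`.
INSTALL_DIR="${INSTALL_DIR:-./install}"
mkdir -p "$INSTALL_DIR/openshift"

# Generate one MachineConfig per role so both control-plane and worker
# nodes keep the serial console kernel arg after CoreOS reboots.
for role in master worker; do
  cat > "$INSTALL_DIR/openshift/99-${role}-kargs-console.yaml" <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: 99-${role}-kargs-console
spec:
  kernelArguments:
    - console=ttyS1,115200n8
EOF
done
```

The `99-` prefix just keeps these ordered after the installer's own generated MachineConfigs; the Machine Config Operator will also apply them if added as a day 2 change instead.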