Closed: njayakrishna closed this issue 3 years ago.
@njayakrishna Are you able to run firewall-cmd --reload on nyctea manually? Just checking that the HV wasn't already in an inconsistent state before running kubeinit.
Yes, here is the output:
Error: COMMAND_FAILED: 'python-nftables' failed: internal:0:0-0: Error: Could not process rule: Operation not supported
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: Operation not supported
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: Operation not supported
internal:0:0-0: Error: Could not process rule: No such file or directory
internal:0:0-0: Error: Could not process rule: No such file or directory
1) The earlier code (without the NAT fix) did not ask for firewalld to be restarted. I first got the error below:
TASK [../../roles/kubeinit_libvirt : Refresh firewalld services list to pick up ovn services] ***
fatal: [localhost -> nyctea]: FAILED! => {"changed": true, "cmd": "firewall-cmd --reload\n", "delta": "0:00:00.138766", "end": "2021-07-22 08:37:27.172175", "msg": "non-zero return code", "rc": 252, "start": "2021-07-22 08:37:27.033409", "stderr": "FirewallD is not running", "stderr_lines": ["FirewallD is not running"], "stdout": "", "stdout_lines": []}
It is after this error that I enabled firewalld. The earlier version wasn't giving this error.
2) Is there a way to clean up the state left by the earlier run? I had updated my code after your NAT fix and re-ran it without any cleanup. I searched but couldn't find a cleanup command.
If there is any workaround, please let me know. Thanks.
Thanks, that helps. Yes, the tasks running the firewall-cmd were previously commented out and firewalld had been disabled. This was a security risk we were incurring to make the install easier and it was considered a bad practice for kubeinit to be disabling the firewall on CentOS HVs. The CentOS HVs I did my testing on all ran firewalld by default on first bringup after a default install. Had you intentionally disabled firewalld before using kubeinit or is it possible it was still disabled from when an older kubeinit was run on that system? I just want to make sure we are running on an HV that is in a consistent state and it sounds like we may have left your system in an inconsistent one.
We have been working on the cleanup to make it more robust and predictable. In the latest version the cleanup tasks are located in https://github.com/Kubeinit/kubeinit/tree/main/kubeinit/roles/kubeinit_libvirt/tasks (the files with cleanup in their name).
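For example, from a checkout of the repo you can find them with something like this (a sketch; the path is the one in the link above and may move between releases):

```bash
# Hedged sketch: list the cleanup task files mentioned above from a fresh checkout.
git clone https://github.com/Kubeinit/kubeinit.git
ls kubeinit/kubeinit/roles/kubeinit_libvirt/tasks/ | grep -i cleanup
```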
Thanks for your reply. No, I had not disabled firewalld intentionally; it was still disabled from when the older kubeinit code was run. It looks like my HV is now in an inconsistent state. I am guessing this could happen to anyone who upgrades to a new version of KubeInit after running an older one. I will look at the cleanup code, but any other workaround to reset the HV's state would be helpful.
I would probably try systemctl restart firewalld first, then use dnf to reinstall firewalld (or remove it completely and install it again) to see if that clears things up.
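Roughly along these lines on the HV (an untested sketch of the steps above):

```bash
# Try a restart first, then re-test the reload that was failing.
sudo systemctl restart firewalld
sudo firewall-cmd --reload

# If that does not help, reinstall firewalld, or remove and install it again.
sudo dnf reinstall -y firewalld
# sudo dnf remove -y firewalld && sudo dnf install -y firewalld
sudo systemctl enable --now firewalld
```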
Reinstalling firewalld did not help; I was still getting the same error. In /etc/firewalld/firewalld.conf I changed FirewallBackend from nftables to iptables, and the error went away.
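In case it helps anyone else, the change amounts to something like this on the HV (a sketch of what is described above):

```bash
# Switch firewalld from the nftables backend to iptables, then restart and re-test.
sudo sed -i 's/^FirewallBackend=nftables/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf
sudo systemctl restart firewalld
sudo firewall-cmd --reload
```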
After that I ran the playbook again and got a failure:
TASK [../../roles/kubeinit_libvirt : Remove repo before adding it] ***** ok: [localhost -> 10.0.0.100] => {"changed": false, "path": "/etc/yum.repos.d/kubernetes.repo", "state": "absent"}
TASK [../../roles/kubeinit_libvirt : Creating a repository file for Kubernetes] **** changed: [localhost -> 10.0.0.100] => {"changed": true, "dest": "/etc/yum.repos.d/kubernetes.repo", "gid": 0, "group": "root", "mode": "0644", "owner": "root", "size": 0, "state": "file", "uid": 0}
TASK [../../roles/kubeinit_libvirt : Adding repository details in Kubernetes repo file.] *** changed: [localhost -> 10.0.0.100] => {"changed": true, "msg": "Block inserted"}
TASK [../../roles/kubeinit_libvirt : Update packages] ** fatal: [localhost -> 10.0.0.100]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'devel_kubic_libcontainers_stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
PLAY RECAP *****
hypervisor-01 : ok=16 changed=5 unreachable=0 failed=0 skipped=6 rescued=0 ignored=1
localhost : ok=147 changed=45 unreachable=0 failed=1 skipped=33 rescued=0 ignored=0
This error was the reason why I updated the code and I am back again at the same point. I logged in to the service node:
[jayakrin@jayakrin kubeinit]$ ssh root@10.0.0.100
Activate the web console with: systemctl enable --now cockpit.socket
[root@okd-service-01 ~]# ip route
default via 10.0.0.254 dev eth0 proto static metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.100 metric 100
[root@okd-service-01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:47:94:58 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.100/24 brd 10.0.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe47:9458/64 scope link
       valid_lft forever preferred_lft forever
[root@okd-service-01 ~]# ping 10.0.0.254
PING 10.0.0.254 (10.0.0.254) 56(84) bytes of data.
64 bytes from 10.0.0.254: icmp_seq=1 ttl=254 time=0.249 ms
64 bytes from 10.0.0.254: icmp_seq=2 ttl=254 time=0.236 ms
^C
--- 10.0.0.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1044ms
rtt min/avg/max/mdev = 0.236/0.242/0.249/0.016 ms
But this VM cannot download anything from the internet.
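(For what it's worth, a quick way to tell a routing/NAT problem apart from a name-resolution problem from the service node is something like the following; the hostname is only an example.)

```bash
# If the plain-IP ping works but the lookup fails, the problem is DNS, not the NAT path.
ping -c 2 8.8.8.8
getent hosts mirror.centos.org
```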
I read this article: https://www.anstack.com/blog/2020/08/25/KubeInit-External-access-for-OpenShift-OKD-deployments-with-Libvirt.html
It states: "The 10.0.0.0/24 network is defined as a Virtual Network Switch implementing both NAT and DHCP for any interface connected to the kimgtnet0 network."
On my HV, I don't see kimgtnet0 configured. Could this be an issue? Is there any additional step to perform before running the playbook?
Thanks
jayakrin, on the hypervisor, does firewall-cmd --list-all include this?
rich rules:
rule family="ipv4" source address="10.0.0.0/24" masquerade
and virsh net-list should print:
Name State Autostart Persistent
----------------------------------------------
default active yes yes
kimgtnet0 active yes yes
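If the masquerade rule were missing, it could be re-added by hand with something like this (a sketch; kubeinit normally creates it itself):

```bash
# Re-add the NAT masquerade rich rule for the cluster network and verify the libvirt networks.
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" masquerade'
sudo firewall-cmd --reload
sudo virsh net-list --all
```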
This is the output after the script ended.
[jayakrin@jayakrin kubeinit]$ sudo firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp3s0
  sources:
  services: cockpit dhcpv6-client ovn-central-firewall-service ovn-host-firewall-service ssh
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
        rule family="ipv4" source address="10.0.0.0/24" masquerade
[jayakrin@jayakrin kubeinit]$ virsh net-list
 Name   State   Autostart   Persistent
Sorry, please ignore my earlier message; I made a mistake while pasting. Here is the output you asked for:
[jayakrin@nyctea ~]$ sudo virsh net-list --all
[sudo] password for jayakrin:
 Name        State    Autostart   Persistent
--------------------------------------------
 default     active   no          no
 kimgtnet0   active   yes         yes

[jayakrin@nyctea ~]$ firewall-cmd --list-all
Authorization failed. Make sure polkit agent is running or run the application as superuser.
[jayakrin@nyctea ~]$ sudo firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp3s0
  sources:
  services: cockpit dhcpv6-client ovn-central-firewall-service ovn-host-firewall-service ssh
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
        rule family="ipv4" source address="10.0.0.0/24" masquerade
On further debugging I found the root cause of the problem: the /etc/resolv.conf contents are not right on the service node.
[jayakrin@nyctea ~]$ ssh root@10.0.0.100
[root@okd-service-01 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search kubeinit.local okdcluster.kubeinit.local
nameserver 1.1.1.1
But the /etc/resolv.conf of the HV shows:
[jayakrin@nyctea ~]$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 128.251.10.125
nameserver 128.251.10.145
When I added the following:
nameserver 128.251.10.125
nameserver 128.251.10.145
to the /etc/resolv.conf of the service node manually, it started working fine. The node was able to download the yum repo and the packages.
Can you please let me know a fix for this?
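For reference, the manual workaround I applied on the service node was essentially this (note that NetworkManager may overwrite /etc/resolv.conf again later, so it is not a persistent fix):

```bash
# Append the HV's nameservers to the service node's resolv.conf (temporary workaround).
cat >> /etc/resolv.conf <<'EOF'
nameserver 128.251.10.125
nameserver 128.251.10.145
EOF
```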
The script proceeded further and I got a different error this time:
TASK [../../roles/kubeinit_okd : Assign a default pullsecret when we use a local registry and deploying OKD] ***********************************************************************
ok: [localhost -> 10.0.0.100] => {"ansible_facts": {"kubeinit_registry_pullsecret": " { \"auths\": {} }"}, "changed": false}
TASK [../../roles/kubeinit_okd : Debug kubeinit_registry_pullsecret after overriding it] *******************************************************************************************
ok: [localhost -> 10.0.0.100] => {
"kubeinit_registry_pullsecret": " { \"auths\": {} }"
}
TASK [Prepare podman] **************************************************************************************************************************************************************
TASK [../../roles/kubeinit_prepare : Install common requirements] ******************************************************************************************************************
changed: [localhost -> 10.0.0.100] => {"changed": true, "msg": "", "rc": 0, "results": ["Installed: skopeo-1:1.2.2-4.module_el8.5.0+733+9bb5dffa.x86_64"]}
TASK [../../roles/kubeinit_prepare : Check if kubeinit_common_docker_password path exists] *****************************************************************************************
skipping: [localhost] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
TASK [../../roles/kubeinit_prepare : Read docker password from file when the variable has the path] ********************************************************************************
skipping: [localhost] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
TASK [../../roles/kubeinit_prepare : Podman login to docker.io] ********************************************************************************************************************
skipping: [localhost] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
TASK [../../roles/kubeinit_prepare : Clear any reference to docker password] *******************************************************************************************************
skipping: [localhost] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
TASK [../../roles/kubeinit_services : Create a minimal podman pod for the service containers] **************************************************************************************
changed: [localhost -> 10.0.0.100] => {"actions": ["created okd-service-01-pod"], "attempts": 1, "changed": true, "pod": {"CgroupParent": "machine.slice", "CgroupPath": "machine.slice/machine-libpod_pod_90a13de5db623b46bdd6b3322eb32d5f50fa4a0c16d88f97b18b22cb90527c35.slice", "Containers": [{"Id": "abe4d00caec16d7f9491e05361e362ebe36a3bfe7e4a013abf962467c73c90a6", "Name": "90a13de5db62-infra", "State": "configured"}], "CreateCgroup": true, "CreateCommand": ["podman", "pod", "create", "--name", "okd-service-01-pod", "--network", "host", "--dns", "10.0.0.100", "--dns", "1.1.1.1", "--dns-search", "okdcluster.kubeinit.local"], "CreateInfra": true, "Created": "2021-07-24T15:58:21.061481093Z", "Hostname": "okd-service-01-pod", "Id": "90a13de5db623b46bdd6b3322eb32d5f50fa4a0c16d88f97b18b22cb90527c35", "InfraConfig": {"DNSOption": null, "DNSSearch": ["okdcluster.kubeinit.local"], "DNSServer": ["10.0.0.100", "1.1.1.1"], "HostAdd": null, "HostNetwork": true, "NetworkOptions": null, "Networks": null, "NoManageHosts": false, "NoManageResolvConf": false, "PortBindings": {}, "StaticIP": "", "StaticMAC": ""}, "InfraContainerID": "abe4d00caec16d7f9491e05361e362ebe36a3bfe7e4a013abf962467c73c90a6", "Name": "okd-service-01-pod", "NumContainers": 1, "SharedNamespaces": ["ipc", "net", "uts"], "State": "Created"}, "podman_actions": ["podman pod create --name okd-service-01-pod --network host --dns 10.0.0.100 --dns 1.1.1.1 --dns-search okdcluster.kubeinit.local"], "stderr": "", "stderr_lines": [], "stdout": "90a13de5db623b46bdd6b3322eb32d5f50fa4a0c16d88f97b18b22cb90527c35\n", "stdout_lines": ["90a13de5db623b46bdd6b3322eb32d5f50fa4a0c16d88f97b18b22cb90527c35"]}
TASK [../../roles/kubeinit_services : Prepare credentials for services] ************************************************************************************************************
included: /home/jayakrin/kubeinit/kubeinit/roles/kubeinit_services/tasks/prepare_credentials.yml for localhost
TASK [../../roles/kubeinit_services : Gather the package facts] ********************************************************************************************************************
ok: [localhost -> localhost] => {"ansible_facts": {"packages": {"NetworkManager": [{"arch": "x86_64", "epoch": 1, "name": "NetworkManager", "release": "0.z.2.20e3975fd2.el8", "source": "rpm", "version": "1.32.3"}], "NetworkManager-libnm": [{"arch": "x86_64", "epoch": 1, "name": "NetworkManager-libnm", "release": "0.z.2.20e3975fd2.el8", "source": "rpm", "version": "1.32.3"}], "NetworkManager-team": [{"arch": "x86_64", "epoch": 1, "name": "NetworkManager-team", "release": "0.z.2.20e3975fd2.el8", "source": "rpm", "version": "1.32.3"}], "NetworkManager-tui": [{"arch": "x86_64", "epoch": 1, "name": "NetworkManager-tui", "release": "0.z.2.20e3975fd2.el8", "source": "rpm", "version": "1.32.3"}], "PackageKit": [{"arch": "x86_64", "epoch": null, "name": "PackageKit", "release": "6.el8", "source": "rpm", "version": "1.1.12"}], "PackageKit-glib": [{"arch": "x86_64", "epoch": null, "name": "PackageKit-glib", "release": "6.el8", "source": "rpm", "version": "1.1.12"}], "abattis-cantarell-fonts": [{"arch": "noarch", "epoch": null, "name": "abattis-cantarell-fonts", "release": "6.el8", "source": "rpm", "version": "0.0.25"}], "acl": [{"arch": "x86_64", "epoch": null, "name": "acl", "release": "1.el8", "source": "rpm", "version": "2.2.53"}], "adcli": [{"arch": "x86_64", "epoch": null, "name": "adcli", "release": "12.el8", "source": "rpm", "version": "0.8.2"}], "adwaita-cursor-theme": [{"arch": "noarch", "epoch": null, "name": "adwaita-cursor-theme", "release": "2.el8", "source": "rpm", "version": "3.28.0"}], "adwaita-icon-theme": [{"arch": "noarch", "epoch": null, "name": "adwaita-icon-theme", "release": "2.el8", "source": "rpm", "version": "3.28.0"}], "alsa-lib": [{"arch": "x86_64", "epoch": null, "name": "alsa-lib", "release": "4.el8", "source": "rpm", "version": "1.2.5"}], "annobin": [{"arch": "x86_64", "epoch": null, "name": "annobin", "release": "1.el8.0.1", "source": "rpm", "version": "9.65"}], "at": [{"arch": "x86_64", "epoch": null, "name": "at", "release": "11.el8", "source": "rpm", "version": "3.1.20"}], "at-spi2-atk": [{"arch": "x86_64", "epoch": null, "name": "at-spi2-atk", "release": "1.el8", "source": "rpm", "version": "2.26.2"}], "at-spi2-core": [{"arch": "x86_64", "epoch": null, "name": "at-spi2-core", "release": "1.el8", "source": "rpm", "version": "2.28.0"}], "atk": [{"arch": "x86_64", "epoch": null, "name": "atk", "release": "1.el8", "source": "rpm", "version": "2.28.1"}], "attr": [{"arch":
<snip>
TASK [../../roles/kubeinit_services : Install podman if required] ******************************************************************************************************************
fatal: [localhost -> localhost]: FAILED! => {"changed": false, "msg": "This command has to be run under the root user.", "results": []}
PLAY RECAP *************************************************************************************************************************************************************************
hypervisor-01 : ok=16 changed=3 unreachable=0 failed=0 skipped=6 rescued=0 ignored=1
localhost : ok=166 changed=51 unreachable=0 failed=1 skipped=39 rescued=0 ignored=0
Can you please let me know what the issue could be here as well?
Thanks
It indicates that you don't have podman installed on the host you are running the ansible playbook on, and that the local user doesn't have permissions to install it.
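e.g. something like this on the machine where you run ansible-playbook (assuming a CentOS/RHEL-family controller with sudo available):

```bash
# Install podman on the Ansible controller; this needs root (or sudo) privileges.
sudo dnf install -y podman
```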
@njayakrishna can you also check this for the DNS issue? https://github.com/Kubeinit/kubeinit/issues/198
Good News, I finally got it working. Thanks a lot @gmarcy and @ccamacho for giving me pointers to fix the issues, very much appreciated !!
For the DNS issue, I set the env variable KUBEINIT_COMMON_DNS_PUBLIC to the required value found in /etc/resolv.conf of the HV. I wasn't sure why podman had failed to install, so I did a yum install of podman. The script then completed successfully.
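Concretely, the fix looked roughly like this (a sketch; the nameserver value comes from the HV's /etc/resolv.conf and will differ in other environments):

```bash
# Point kubeinit at a reachable DNS server instead of the public default seen above,
# then re-run the deployment (command from the original report).
export KUBEINIT_COMMON_DNS_PUBLIC=128.251.10.125
ansible-playbook --user root -v -i ./hosts/okd/inventory --become --become-user root ./playbooks/okd.yml
```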
[jayakrin@nyctea ~]$ ssh root@10.0.0.100
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sun Jul 25 07:44:03 2021 from 172.16.0.254
[root@okd-service-01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master0 Ready master 3h40m v1.20.0+87cc9a4-1079
worker0 Ready worker 3h31m v1.20.0+87cc9a4-1079
worker1 Ready worker 3h31m v1.20.0+87cc9a4-1079
@njayakrishna it's nice that it worked for you, but the deployment shouldn't be that bumpy :)
Please feel free to ask any questions you have, and if you find any issue, also feel free to raise it in the project's repo (https://www.github.com/kubeinit/kubeinit). You can jump in and ask in the Slack channel https://kubernetes.slack.com/archives/C01FKK19T0B. Also, it would be awesome if you could star the project to keep up with updates and new features.
KubeInit fails to install for OKD installation:
Infrastructure
Deployment command issued:
ansible-playbook --user root -v -i ./hosts/okd/inventory --become --become-user root ./playbooks/okd.yml
Error log: