openvswitch and routes not working after restart on Rocky-Linux 8.5

Tokix commented 2 years ago

**Description**
With a few modifications to the kubeinit files I am able to deploy OKD on Rocky Linux 8.5. The modifications are in my fork (https://github.com/Tokix/kubeinit). I could make a pull request, but one thing is not working as expected: restarting the server. After the restart the routes have vanished and I can no longer reach the frontend.

**To Reproduce**
Steps to reproduce the behavior:

1. Install a Red Hat 8.5 machine and set up the SSH connection as nyctea, as described in the manual
2. In my case I additionally had to install Python on the hypervisor_host machine before the playbook ran successfully

yum install python3

3. Clone the changes for Rocky 8.5

git clone https://github.com/Tokix/kubeinit.git

4. Run the playbook


ansible-playbook \
-v --user root \
-e kubeinit_spec=okd-libvirt-3-1-1 \
-i ./kubeinit/inventory \
./kubeinit/playbook.yml

5. Enable the frontend

ssh root@nyctea
chmod +x create-external-ingress.sh
./create-external-ingress.sh


6. Set up the DNS entries for your system
7. Check that the URL is working (it works at this point):

https://console-openshift-console.apps.okdcluster.kubeinit.local/

8. Reboot the server

`init 6` 

9. The URL is not working any longer:

https://console-openshift-console.apps.okdcluster.kubeinit.local/

**Expected behavior**
After a restart, the external URL of the cluster should be reachable and the routes should be set.

**Screenshots**
Working route-configuration before the restart: 

![image](https://user-images.githubusercontent.com/3341617/154397492-2a2bdf4e-8594-4fec-8bb4-76da570d86f4.png)

Route configuration after restart:

![image](https://user-images.githubusercontent.com/3341617/154397419-95f5c81c-e15d-45ce-a3c4-95afb9e86706.png)

**Infrastructure**
 - Hypervisors OS: Rocky-Linux
 - Version 8.5

**Deployment command**

ansible-playbook \
-v --user root \
-e kubeinit_spec=okd-libvirt-3-1-1 \
-i ./kubeinit/inventory \
./kubeinit/playbook.yml



**Inventory file diff**

I made no changes to the inventory file.

**Additional context**

Because SELinux is active on Rocky Linux 8.5, my first thought was that some changes could not be persisted, so I disabled SELinux for testing. However, the setup still does not work after a restart.

I checked this old issue, https://forums.opensuse.org/showthread.php/530879-openvswitch-loses-configuration-on-reboot , but the boot order of openvswitch and network.service seems to be fine.

Furthermore, I re-ran the steps under "Attach our cluster network to the logical router" from the file kubeinit/roles/kubeinit_libvirt/tasks/create_network.yml. This restored the correct routing table, but I am still not able to reach the guest systems via 10.0.0.1-x.
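
One way to see exactly which routes go missing is to save the routing table before the reboot and compare it with the table afterwards. The following is only a minimal sketch and not part of kubeinit: the function name `missing_routes` and the snapshot paths in the usage note are hypothetical, and it assumes both snapshots were written with `ip route show`.

```shell
# Hypothetical helper (not part of kubeinit): print every line that is
# present in the "before" snapshot but absent from the "after" snapshot.
# $1: routing table saved before the reboot
# $2: routing table saved after the reboot
missing_routes() {
    # -F fixed strings, -x whole-line match, -v invert the match:
    # keep only the lines of $1 that have no identical line in $2.
    grep -F -x -v -f "$2" "$1"
}
```

For example, run `ip route show > /root/routes.before` while the console URL still works, run `ip route show > /root/routes.after` after the reboot, and then call `missing_routes /root/routes.before /root/routes.after`. Comparing whole lines keeps the check simple, at the cost of also reporting a route whose metric or device merely changed.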

Is there a script or service that needs to be (or can be) re-run to enable the networking after a reboot?
In any case, I'm thankful for any hints. Let me know if you need more information.

Thank you in any case for the great project :)

jeffabailey commented 2 years ago

I'm also running into a problem with Rocky Linux.

Any help is welcome, this is a cool project, I hope we can get it working on Rocky.

TASK [kubeinit.kubeinit.kubeinit_prepare : Create ssh config file from template] *******************************************************************************
task path: /home/jeff/.ansible/collections/ansible_collections/kubeinit/kubeinit/roles/kubeinit_prepare/tasks/create_host_ssh_config.yml:52
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: jeff
<127.0.0.1> EXEC /bin/sh -c 'echo ~jeff && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/jeff/.ansible/tmp `"&& mkdir "` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" && echo ansible-tmp-1650765476.136161-198434-213715344205742="` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" ) && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1117, in do_template
    res = j2_concat(rf)
  File "<template>", line 47, in root
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/jinja2/runtime.py", line 903, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1154, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'"
}

PLAY RECAP *****************************************************************************************************************************************************
localhost                  : ok=48   changed=7    unreachable=0    failed=1    skipped=25   rescued=0    ignored=0

jeffabailey commented 2 years ago

My issue isn't specific to Rocky, so I'll add a new issue.

I ran into the same error using Debian.

Edit (Issue added): https://github.com/Kubeinit/kubeinit/issues/647

ccamacho commented 1 year ago

Maybe some iptables rules are not persisted after rebooting, and I don't have a way to test this on Rocky.
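
To check this hypothesis, the output of `iptables-save` could be captured before and after a reboot and compared. A rough sketch follows, assuming the relevant rules are visible to `iptables-save` on the hypervisor; the function name `normalize_rules` and the file names in the usage note are made up for illustration. It drops the timestamped comment lines and resets the packet and byte counters that `iptables-save` prints, because those differ between any two runs even when the rules themselves are identical.

```shell
# Hypothetical helper: make two iptables-save dumps comparable.
# Reads a dump on stdin and writes the normalized dump to stdout.
normalize_rules() {
    # 1) delete comment lines such as "# Generated by iptables-save ..."
    # 2) reset counters, e.g. ":INPUT ACCEPT [12:3456]" becomes
    #    ":INPUT ACCEPT [0:0]"
    sed -e '/^#/d' -e 's/\[[0-9]*:[0-9]*]/[0:0]/'
}
```

A possible workflow: run `iptables-save | normalize_rules > /root/rules.before` on the working system, run the same command with `/root/rules.after` after the reboot, then `diff /root/rules.before /root/rules.after`. Any rule that shows up only in the first file was not restored.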

logeshwaris commented 1 year ago

Hi @ccamacho,

Thanks for the awesome project. 👍

I am also running into the same issue. After a reboot, I am not able to reach 10.0.0.x. Is there a way to re-enable the networking after a reboot?

tschuyebuhl commented 1 year ago

I've got two servers, one with AlmaLinux 8.x (which also seems to lose connectivity after reboot) and one with CentOS Stream. I can help by providing some debug data, and I can sacrifice my currently running clusters if need be.

tschuyebuhl commented 1 year ago

Okay, so the one with CentOS 8 and vanilla k8s didn't persist after restart. The VMs launched fine, but there was no networking. Also, the service pod only had one IP address, from the 10.89.x.x subnet.

Kubeinit / kubeinit

openvswitch and routes not working after restart on Rocky-Linux 8.5 #599