RedHat-EMEA-SSA-Team / hetzner-ocp4

Installing OCP 4 on single bare metal server.
Apache License 2.0

Air-gapped installation failed on RHEL 9 #294

Closed: rbo closed this 10 months ago

rbo commented 1 year ago

Followed: https://github.com/RedHat-EMEA-SSA-Team/hetzner-ocp4/blob/master/docs/air-gapped.md

The installation gets stuck; the control plane nodes are not able to fetch their Ignition config:

[  *** ] A start job is running for Ignition (fetch) (45min 14s / no limit)
[ 2716.704281] ignition[761]: GET https://api-int.air-gapped.openshift.pub:22623/config/master: attempt #547
[ 2716.705490] ignition[761]: GET error: Get "https://api-int.air-gapped.openshift.pub:22623/config/master": dial tcp 192.168.55.1:22623: connect: no route to host

Looks like firewalld is not properly configured with a routed network:

$ virsh net-dumpxml air-gapped | grep -E '(forward|bridge)'
  <forward mode='route'/>
  <bridge name='virbr2' stp='on' delay='0'/>
$ firewall-cmd --get-zone-of-interface=virbr2
libvirt-routed
$ firewall-cmd --info-zone=libvirt-routed
libvirt-routed (active)
  target: default
  icmp-block-inversion: no
  interfaces: virbr2
  sources: 
  services: 
  ports: 
  protocols: 
  forward: no
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

For comparison, the default libvirt zone, configured via Ansible:

$ firewall-cmd --info-zone=libvirt
libvirt (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces: virbr0 virbr1
  sources: 
  services: dhcp dhcpv6 dns mountd nfs rpc-bind ssh tftp
  ports: 80/tcp 443/tcp 6443/tcp 22623/tcp 5000/tcp
  protocols: icmp ipv6-icmp tcp
  forward: no
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
    rule priority="32767" reject
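
The difference between the two zones can also be checked mechanically. Below is a minimal sketch, assuming bash and awk are available on the host; the helper names `zone_field` and `zone_has_port` are hypothetical and not part of firewalld. They only parse text in the `firewall-cmd --info-zone` format shown above and do not change any configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of firewalld): parse the text printed by
# `firewall-cmd --info-zone=<zone>`, read from stdin.

# Print the value of one field, e.g. "ports", "services" or "target".
zone_field() {
    local field="$1"
    awk -v f="$field" '
        # Strip the leading indentation of lines such as "  ports: 80/tcp".
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Match only lines that start with "<field>:", so that "ports"
        # does not also match "source-ports" or "forward-ports".
        index(line, f ":") == 1 { sub(/^[^:]*:[ \t]*/, "", line); print line }
    '
}

# Succeed if the zone description on stdin lists the given port, e.g. 22623/tcp.
zone_has_port() {
    local port="$1"
    zone_field ports | tr ' ' '\n' | grep -qxF "$port"
}
```

A possible use on the host would be `firewall-cmd --info-zone=libvirt-routed | zone_has_port 22623/tcp || echo "22623/tcp is not open in libvirt-routed"`.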
rbo commented 1 year ago

Tried a workaround:

firewall-cmd --zone=libvirt-routed --permanent \
    --add-service dhcp \
    --add-service dhcpv6 \
    --add-service dns \
    --add-service mountd \
    --add-service nfs \
    --add-service rpc-bind \
    --add-service ssh \
    --add-service tftp

firewall-cmd --zone=libvirt-routed --permanent \
    --add-port 80/tcp \
    --add-port 443/tcp \
    --add-port 6443/tcp \
    --add-port 22623/tcp \
    --add-port 5000/tcp

firewall-cmd --zone=libvirt-routed --permanent \
    --add-protocol icmp \
    --add-protocol ipv6-icmp \
    --add-protocol tcp

firewall-cmd --reload

curl from the bootstrap node to itself works, but not from the bootstrap node to the load balancer (lb):

[core@bootstrap ~]$ curl -k -I https://192.168.55.2:22623/config/master
HTTP/1.1 200 OK
Content-Length: 274114
Content-Type: application/json
Date: Fri, 21 Jul 2023 07:17:17 GMT

[core@bootstrap ~]$ curl -k -I https://192.168.55.1:22623/config/master
curl: (7) Failed to connect to 192.168.55.1 port 22623: No route to host
[core@bootstrap ~]$ 

curl from the host works both to the bootstrap node and to the lb:

# curl -k -I https://192.168.55.2:22623/config/master
HTTP/1.1 200 OK
Content-Length: 274114
Content-Type: application/json
Date: Fri, 21 Jul 2023 07:17:59 GMT

# curl -k -I https://192.168.55.1:22623/config/master
HTTP/1.1 200 OK
Content-Length: 274114
Content-Type: application/json
Date: Fri, 21 Jul 2023 07:18:02 GMT
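
The manual curl checks above can be repeated from each machine with a small loop. This is only a sketch: the function name `probe_mcs` and the `CURL` override variable are hypothetical, and the port and path are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe the machine config endpoint (port 22623) on each
# address given as an argument and report whether the request succeeds.
# The CURL variable is an assumption added so the curl binary can be swapped out.
probe_mcs() {
    local curl_bin="${CURL:-curl}" addr
    for addr in "$@"; do
        # -k: skip certificate verification, -I: HEAD request,
        # -s / -o /dev/null: suppress output, only the exit status matters.
        if "$curl_bin" -k -s -o /dev/null -I --connect-timeout 5 \
               "https://${addr}:22623/config/master"; then
            echo "${addr}: reachable"
        else
            echo "${addr}: unreachable"
        fi
    done
}
```

Running `probe_mcs 192.168.55.2 192.168.55.1` on the host and on the bootstrap node gives one line per address, which makes the asymmetry easy to compare side by side.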
rbo commented 1 year ago

The libvirt-routed zone was introduced with libvirt v8.10.0: https://libvirt.org/news.html#v8-10-0-2022-12-01
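
To tell whether a given host is affected, the installed libvirt version can be compared against 8.10.0. A minimal sketch, assuming GNU `sort` with the `-V` (version sort) option; the function names `version_ge` and `has_libvirt_routed_zone` are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort -V so that 8.10.0 sorts after 8.9.0.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the given libvirt version is at least 8.10.0, the release
# whose news entry announces the libvirt-routed zone.
has_libvirt_routed_zone() {
    version_ge "$1" 8.10.0
}
```

For example, `has_libvirt_routed_zone "$(virsh --version)" && echo "expect routed networks in libvirt-routed"` could be run on the host before starting the installation.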

rbo commented 1 year ago

Disabled firewalld :-( and the installation is moving forward.

I don't know how to configure firewalld with a routed network... :-(

rbo commented 1 year ago

Master nodes are not able to access the mirror registry:

Jul 21 08:14:29 master-0 bash[1617]: Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc6bb297405184369fc55480189ef447e1337fff0194abf3e7e1ebb9488aaa00: (Mirrors also failed: [host.compute.local:5000/update-test/openshift/release@sha256:cc6bb297405184369fc55480189ef447e1337fff0194abf3e7e1ebb9488aaa00: error pinging docker registry host.compute.local:5000: Get "https://host.compute.local:5000/v2/": net/http: TLS handshake timeout]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc6bb297405184369fc55480189ef447e1337fff0194abf3e7e1ebb9488aaa00: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp 18.214.152.215:443: i/o timeout
[root@master-0 ~]# curl -vv https://host.compute.local:5000/v2/
*   Trying 192.168.55.1...
* TCP_NODELAY set
* Connected to host.compute.local (192.168.55.1) port 5000 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):

The bootstrap node is affected as well. It looks like this happened because firewalld was disabled. From the host:

# curl -vv https://host.compute.local:5000/v2/
*   Trying 192.168.55.1:5000...
* Connected to host.compute.local (192.168.55.1) port 5000 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/pki/tls/certs/ca-bundle.crt
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
^C

🤬 Reinstalling with RHEL 8... I don't have time to debug this in detail...

rbo commented 1 year ago

Tried without the <forward mode='...'/> option. It did not work because the nodes don't get a default gateway, and that is needed. And I have to disable firewalld as well.

rbo commented 10 months ago

Fixed in devel; will be merged into master with PR #304.