IBM / Ansible-OpenShift-Provisioning

Automate the deployment of Red Hat OpenShift Container Platform on IBM zSystems (s390x). Automated User-Provisioned Infrastructure (UPI) setup using Kernel-based Virtual Machine (KVM).
https://ibm.github.io/Ansible-OpenShift-Provisioning/
MIT License

Named Service is failing during Bastion setup on X86 server. #268

Closed (amrutp-redhat closed this issue 5 months ago)

amrutp-redhat commented 5 months ago

Trying to install an OCP cluster on an x86 host.

Cloned the main branch, added the all.yaml and host_vars parameters, and ran the playbooks:

0_setup.yaml --> successful
4_create_bastion.yaml --> successful
5_setup_bastion.yaml --> failed

The 5_setup_bastion.yaml playbook fails with the error below.

TASK [dns : Restart named to update changes made to DNS] ***********************************************************************************
task path: /root/amrut_test/Ansible-OpenShift-Provisioning/roles/dns/tasks/main.yaml:121
fatal: [bastion]: FAILED! => {"changed": false, "msg": "Unable to restart service named: Job for named.service failed because the control process exited with error code.\nSee \"systemctl status named.service\" and \"journalctl -xe\" for details.\n"}

PLAY RECAP *********************************************************************************************************************************
127.0.0.1                  : ok=8    changed=4    unreachable=0    failed=0    skipped=18   rescued=0    ignored=0   
bastion                    : ok=27   changed=21   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0  

Checked the named service status on the bastion:

[root@bastion etc]# systemctl status named
* named.service - Berkeley Internet Name Domain (DNS)
   Loaded: loaded (/usr/lib/systemd/system/named.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2024-04-12 06:15:49 EDT; 6min ago
  Process: 19378 ExecStop=/bin/sh -c /usr/sbin/rndc stop > /dev/null 2>&1 || /bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 17616 ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 19390 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else>
 Main PID: 17617 (code=exited, status=0/SUCCESS)

Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN: loaded s>
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone 1.0.0.127.in-addr.arpa/IN: loaded serial 0
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone 0.in-addr.arpa/IN: loaded serial 0
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone lnxero1.boe/IN: NS 'bastion.multiarch.lnxero1.boe' has no address records (A or AAAA)
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone lnxero1.boe/IN: not loaded due to errors.
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: _default/lnxero1.boe/IN: bad zone
Apr 12 06:15:49 bastion.lnxero1.boe bash[19392]: zone 235.23.172.in-addr.arpa/IN: loaded serial 2020011800
Apr 12 06:15:49 bastion.lnxero1.boe systemd[1]: named.service: Control process exited, code=exited status=1
Apr 12 06:15:49 bastion.lnxero1.boe systemd[1]: named.service: Failed with result 'exit-code'.
Apr 12 06:15:49 bastion.lnxero1.boe systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
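
The journal excerpt mixes successful zone loads with the one failure. The ExecStartPre line above shows the unit running named-checkconf -z before starting named, so the same check can be repeated by hand. A minimal sketch: `zone_problems` is a hypothetical helper name, /etc/named.conf is assumed to be the value of $NAMEDCONF, and pairing zone lnxero1.boe with /var/named/multiarch.db is inferred from the log and the file shown in this issue, not confirmed by named.conf.

```shell
# zone_problems: read named-checkconf -z style output on stdin and
# drop the "loaded serial" success lines, leaving only the problem reports.
zone_problems() {
    grep -v 'loaded serial'
}

# Usage on the bastion (paths and zone/file pairing are assumptions, see above):
#   named-checkconf -z /etc/named.conf 2>&1 | zone_problems
#   named-checkzone lnxero1.boe /var/named/multiarch.db
```

Applied to the log above, the filter would leave the "has no address records", "not loaded due to errors" and "bad zone" lines, which is where the restart actually fails.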

[root@bastion ~]# cat /var/named/multiarch.db

$TTL 86400
@ IN SOA bastion.multiarch.lnxero1.boe. admin.multiarch.lnxero1.boe.(
                                                2020021821 ;Serial
                                                3600 ;Refresh
                                                1800 ;Retry
                                                604800 ;Expire
                                                86400 ;Minimum TTL
)

;Name Server / Bastion Information
@ IN NS bastion.multiarch.lnxero1.boe.

;IP Address for Name Server
bastion IN A 172.23.235.230

;entry for bootstrap host.
bootstrap.multiarch.lnxero1.boe. IN A 172.23.235.231

;entries for the control nodes
master3.multiarch.lnxero1.boe. IN A 172.23.235.234
master2.multiarch.lnxero1.boe. IN A 172.23.235.233
master1.multiarch.lnxero1.boe. IN A 172.23.235.232

;entries for the compute nodes
worker2.multiarch.lnxero1.boe. IN A 172.23.235.236
worker1.multiarch.lnxero1.boe. IN A 172.23.235.235

;The api identifies the IP of your load balancer.
api.multiarch     IN    CNAME bastion.lnxero1.boe.
api-int.multiarch IN    CNAME bastion.lnxero1.boe.

;The wildcard also identifies the load balancer.
apps.multiarch  IN    CNAME bastion.lnxero1.boe.
*.apps.multiarch  IN    CNAME bastion.lnxero1.boe.

;EOF
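
Why named rejects this file: owner names without a trailing dot are relative to the zone origin. If the origin is lnxero1.boe (inferred from the "zone lnxero1.boe/IN" log lines, since named.conf is not shown here), then "bastion IN A" defines bastion.lnxero1.boe, while the NS record points at bastion.multiarch.lnxero1.boe, which has no A record. A rough sketch of that check follows. `ns_missing_a` is a hypothetical helper, and it is deliberately simplified: it handles only single-line "name [IN] TYPE value" records and ignores $ORIGIN directives and per-record TTL fields.

```shell
# ns_missing_a ORIGIN ZONEFILE
# Print every NS target in ZONEFILE that has no A/AAAA record in the same file.
# Relative names are expanded with ORIGIN, the way BIND expands them.
ns_missing_a() {
    awk -v origin="$1" '
        BEGIN { origin = tolower(origin) }
        # Turn a zone-file name into a fully qualified, lower-case name.
        function fqdn(name) {
            name = tolower(name)
            if (name == "@") return origin "."
            if (name ~ /\.$/) return name
            return name "." origin "."
        }
        # Skip comment lines, blank lines and $TTL-style directives.
        /^[ \t]*;/ || /^[ \t]*$/ || /^\$/ { next }
        {
            i = 2
            if ($i == "IN") i++          # the class field is optional
            type = $i
            value = $(i + 1)
            if (type == "A" || type == "AAAA") has_addr[fqdn($1)] = 1
            else if (type == "NS") ns[fqdn(value)] = 1
        }
        END {
            for (n in ns) if (!(n in has_addr)) print n
        }
    ' "$2"
}

# Usage (zone name and path taken from this issue):
#   ns_missing_a lnxero1.boe /var/named/multiarch.db
```

For the file above and origin lnxero1.boe, this should report bastion.multiarch.lnxero1.boe. as the NS target without an address record, matching the named error.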

Zones info

[root@bastion ~]# cd /etc/
[root@bastion etc]# cat named.rfc1912.zones 
// named.rfc1912.zones:
//
// Provided by Red Hat caching-nameserver package 
//
// ISC BIND named zone configuration for zones recommended by
// RFC 1912 section 4.1 : localhost TLDs and address zones
// and https://tools.ietf.org/html/rfc6303
// (c)2007 R W Franks
// 
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// Note: empty-zones-enable yes; option is default.
// If private ranges should be forwarded, add 
// disable-empty-zone "."; into options
// 

zone "localhost.localdomain" IN {
        type master;
        file "named.localhost";
        allow-update { none; };
};

zone "localhost" IN {
        type master;
        file "named.localhost";
        allow-update { none; };
};

zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" IN {
        type master;
        file "named.loopback";
        allow-update { none; };
};

zone "1.0.0.127.in-addr.arpa" IN {
        type master;
        file "named.loopback";
        allow-update { none; };
};

zone "0.in-addr.arpa" IN {
        type master;
        file "named.empty";
        allow-update { none; };
};
[root@bastion etc]# 

Bastion and cluster info in all.yaml

# Section 5 - Bastion
  bastion:
    create: True
    vm_name: solntest-bastion
    resources:
      disk_size: 100
      ram: 4096
      swap: 4096
      vcpu: 4
      vcpu_model_option: "--cpu host"
    networking:
      ip: 172.23.235.230
      #ipv6: #X
      #mac: #X
      hostname: bastion
      base_domain: multiarch.lnxero1.boe
      subnetmask: 255.255.0.0
      gateway: 172.23.0.1
      #ipv6_gateway: #X
      #ipv6_prefix: #X
      nameserver1: 172.23.0.1
#      nameserver2:
      forwarder: 1.1.1.1
      interface: #X
    access:
      user: root
      pass: #X
      root_pass: #X
    options:
      dns: True
      loadbalancer:
        on_bastion: True
#        public_ip:
#        private_ip:

# Section 6 - Cluster Networking
  cluster:
    networking:
      metadata_name: multiarch
      base_domain: lnxero1.boe
      subnetmask: 255.255.0.0
      gateway: 172.23.0.1
      #ipv6_gateway: #X
      #ipv6_prefix: #X
      nameserver1: 172.23.0.1
#      nameserver2:
      forwarder: 1.1.1.1
      interface: #X

amrutp-redhat commented 5 months ago

This was resolved by changing the nameserver to the bastion IP.
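
For anyone hitting the same error, a small sketch for listing the IP and nameserver values all.yaml currently carries before editing them. The helper name `show_dns_settings` and the file path in the usage comment are assumptions, not part of the playbooks.

```shell
# show_dns_settings FILE
# Print every uncommented ip / nameserver1 / nameserver2 line with its line
# number, so the nameserver values can be compared against the bastion ip.
show_dns_settings() {
    grep -nE '^[[:space:]]*(ip|nameserver[12]):' "$1"
}

# Usage (the path is an assumption about where all.yaml lives in your checkout):
#   show_dns_settings inventories/default/group_vars/all.yaml
```

In the all.yaml excerpt above, both nameserver1 entries point at 172.23.0.1, while the bastion ip is 172.23.235.230.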