RedHat-EMEA-SSA-Team / hetzner-ocp4

Installing OCP 4 on single bare metal server.
Apache License 2.0

Installation cannot be finished #283

Closed mstrahinic01 closed 1 year ago

mstrahinic01 commented 1 year ago

Hi,

My installation got to this part:

TASK [openshift-4-cluster : Info message] **************************************
ok: [host] => {
    "msg": [
        "If you like to follow the installation run 'tail -f  /home/masinst/hetzner-ocp4/ansible/../ocp4/.openshift_install.log' in a second terminal.",
        "For more details, connect to the bootstrap node: ssh -l core 192.168.50.2"
    ]
}

TASK [openshift-4-cluster : Waiting for bootstrap to complete] *****************

On my server, I see this message:

[root@CentOS-80-stream-amd64-base ~]# tail -f  /home/masinst/hetzner-ocp4/ansible/../ocp4/.openshift_install.log
time="2023-06-07T12:43:04+02:00" level=debug msg="  Loading Pull Secret..."
time="2023-06-07T12:43:04+02:00" level=debug msg="  Loading Platform..."
time="2023-06-07T12:43:04+02:00" level=debug msg="Using Install Config loaded from state file"
time="2023-06-07T12:43:04+02:00" level=info msg="Waiting up to 30m0s (until 1:13PM) for bootstrapping to complete..."
time="2023-06-07T12:53:47+02:00" level=debug msg="Bootstrap status: complete"
time="2023-06-07T12:53:47+02:00" level=info msg="It is now safe to remove the bootstrap resources"
time="2023-06-07T12:53:47+02:00" level=debug msg="Time elapsed per stage:"
time="2023-06-07T12:53:47+02:00" level=debug msg="Bootstrap Complete: 12m33s"
time="2023-06-07T12:53:47+02:00" level=debug msg="               API: 1m50s"
time="2023-06-07T12:53:47+02:00" level=info msg="Time elapsed: 12m33s"

This is as far as I can get with this. Is this considered to be a good installation or not?

rbo commented 1 year ago

Hmm, strange. What is the output of:

export KUBECONFIG=/home/masinst/hetzner-ocp4/ocp4/auth/kubeconfig
oc get co,clusterversion,nodes

mstrahinic01 commented 1 year ago

This is the output of the second command. The first command doesn't produce any output.

NAME                                                                           VERSION                          AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
clusteroperator.config.openshift.io/authentication                             4.10.0-0.okd-2022-07-09-073606   True        False         False      102m
clusteroperator.config.openshift.io/baremetal                                  4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/cloud-controller-manager                   4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/cloud-credential                           4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/cluster-autoscaler                         4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/config-operator                            4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/console                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      3h58m
clusteroperator.config.openshift.io/csi-snapshot-controller                    4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/dns                                        4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/etcd                                       4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/image-registry                             4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/ingress                                    4.10.0-0.okd-2022-07-09-073606   True        False         True       22h     The "default" ingress controller reports Degraded=True: DegradedConditions: One or$
clusteroperator.config.openshift.io/insights                                   4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/kube-apiserver                             4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/kube-controller-manager                    4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/kube-scheduler                             4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/kube-storage-version-migrator              4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/machine-api                                4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/machine-approver                           4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/machine-config                             4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/marketplace                                4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/monitoring                                 4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/network                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/node-tuning                                4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/openshift-apiserver                        4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/openshift-controller-manager               4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/openshift-samples                          4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/operator-lifecycle-manager                 4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog         4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver   4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/service-ca                                 4.10.0-0.okd-2022-07-09-073606   True        False         False      22h
clusteroperator.config.openshift.io/storage                                    4.10.0-0.okd-2022-07-09-073606   True        False         False      22h

NAME                                         VERSION                          AVAILABLE   PROGRESSING   SINCE   STATUS
clusterversion.config.openshift.io/version   4.10.0-0.okd-2022-07-09-073606   True        False         22h     Error while reconciling 4.10.0-0.okd-2022-07-09-073606: the cluster operator ingress is degraded

NAME            STATUS   ROLES           AGE   VERSION
node/master-0   Ready    master,worker   22h   v1.23.5+3afdacb

-- Best regards,

Milan Strahinić

rbo commented 1 year ago

I don't have any experience with OKD, but it looks like the ingress operator has a problem.
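
To narrow this down, it can help to list only the operators that report Degraded=True. The following is a minimal sketch, assuming the default column layout of oc get co (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE) as in the output above. The function name degraded_operators is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of cluster operators whose DEGRADED
# column (5th field of the default `oc get co` table) is "True".
# Reads the table from standard input, so it can be tried without a cluster.
degraded_operators() {
    awk '$1 != "NAME" && $5 == "True" { print $1 }'
}

# Usage on a live cluster (assumes KUBECONFIG is already exported):
#   oc get co | degraded_operators
#   oc describe co ingress   # shows the full, untruncated Degraded message
```

Because the filter looks only at the fifth whitespace-separated field, spaces inside the MESSAGE column do not affect it.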

mstrahinic01 commented 1 year ago

After running ansible-navigator run -m stdout ./ansible/setup.yml, I can see that the problem is in post-install.yml.

This is the last task that was executed:

- name: Waiting for bootstrap to complete
  ansible.builtin.command: "/opt/openshift-install-{{ openshift_version }}/openshift-install wait-for bootstrap-complete --dir {{ openshift_install_dir }} --log-level debug"
  register: bootstrap_status
  retries: 60
  delay: 60
  until: bootstrap_status.rc == 0

After this task, nothing else is executed.
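
For readers unfamiliar with the retries, delay and until keywords: the task re-runs the command until it exits with status 0 or the attempts are used up. The shell function below is a rough sketch of that general pattern, not Ansible's implementation. The name retry_until_success and its arguments are invented for illustration, and the exact number of attempts Ansible derives from retries may differ.

```shell
#!/bin/sh
# Hypothetical sketch of a bounded retry loop.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry_until_success() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): poll for a hypothetical marker file once a minute.
#   retry_until_success 60 60 test -e /tmp/bootstrap-done
```

Note that in this sketch the loop only reacts to exit codes: a command that never returns is never retried.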

rbo commented 1 year ago

You can check the logs on the bootstrap node. The playbook should print instructions on how to access them:

For example:

TASK [openshift-4-cluster : Info message] **************************************
ok: [host] => {
    "msg": [
        "If you like to follow the installation run 'tail -f  /root/hetzner-ocp4/ansible/../tester/.openshift_install.log' in a second terminal.",
        "For more details, connect to the bootstrap node: ssh -l core 192.168.52.2"
    ]
}

TASK [openshift-4-cluster : Waiting for bootstrap to complete] *****************

Note that the IP addresses in my example may differ from yours.
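
Once logged in to the bootstrap node, the bootstrap progress is usually visible in the journal of the release-image and bootkube services. Below is a small sketch that prints the combined command for a given address. bootstrap_log_cmd is a made-up helper name, and the IP address must be taken from your own playbook output.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh command that follows the journal of the
# bootstrap services on the bootstrap node at the given IP address.
# It only prints the command; nothing is executed against the node.
bootstrap_log_cmd() {
    printf 'ssh -l core %s journalctl -b -f -u release-image.service -u bootkube.service\n' "$1"
}

# Example: bootstrap_log_cmd 192.168.50.2
```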

mstrahinic01 commented 1 year ago

I managed to solve this by manually shutting down the bootstrap node. After that, I disabled almost all of the code in create.yml and a few lines in post-install.yml. When I started the installation again, the bootstrap node was shut down and the rest of post-install.yml executed without problems.

I used tail -f /root/hetzner-ocp4/ansible/../tester/.openshift_install.log to follow the logs and left it running for a few hours to see whether the bootstrap would pass on its own. When it did not, I solved the problem manually as described above.

rbo commented 1 year ago

Strange, I don't know. Just for information, you can also use the playbook tags:

ansible-navigator run ./ansible/02-create-cluster.yml --list-tags
/usr/local/lib/python3.8/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
/usr/local/lib/python3.8/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
playbook: /root/hetzner-ocp4/ansible/02-create-cluster.yml
  play #1 (host): host  TAGS: []
      TASK TAGS: [add-ons, always, download-openshift-artifacts, ignition, lb, letsencrypt, network, post-install, post-install-add-ons, public_dns]

So the following should run only the post-install steps:

ansible-navigator run ./ansible/02-create-cluster.yml --tags post-install