anderotxoa closed this issue 4 years ago.
Did you perform the one-time setup from https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md#one-time-setup ? That might be the issue, since libvirt is not accepting the TCP connections.
Also make sure the default libvirt network has been created. When running snc again, also remove any previously created crc- libvirt networks.
Out of curiosity, what OS are you using? @anderotxoa
Hi, yes, I followed the one-time setup instructions carefully.
[root@snc snc]# cat /etc/libvirt/libvirtd.conf |grep -v "#"
listen_tls = 0
listen_tcp = 1
tcp_port = "16509"
auth_tcp = "none"
[root@snc snc]#
The right port also appears to be listening:
[root@snc snc]# netstat -tna|grep LISTEN
tcp 0 0 0.0.0.0:16509 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp6 0 0 :::16509 :::* LISTEN
tcp6 0 0 :::111 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 ::1:25 :::* LISTEN
[root@snc snc]#
BTW, I have also disabled both SELinux and firewalld.
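netstat only shows that libvirtd listens on all addresses. The installer error in this issue is a timeout dialing 192.168.122.1:16509. That address is normally the virbr0 bridge created by libvirt's default network, so the port can be listening even though that address does not exist on the host. The bash sketch below tests both conditions. It assumes bash's /dev/tcp redirection, coreutils `timeout` and iproute2 `ip` are available, and the function names `check_tcp` and `has_local_addr` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a TCP connection to HOST:PORT opens within SECS seconds.
# Uses bash's /dev/tcp pseudo-device; the inner shell opens fd 3 and exits, closing it.
check_tcp() {
    local host="$1" port="$2" secs="${3:-5}"
    timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: succeed only if some local interface owns the given IPv4 address
# (on a healthy libvirt host, 192.168.122.1 belongs to virbr0).
has_local_addr() {
    # `ip -o -4 addr show` prints one line per address; field 4 is ADDR/PREFIX.
    ip -o -4 addr show | awk '{ split($4, a, "/"); print a[1] }' | grep -qxF "$1"
}
```

Example: `check_tcp 192.168.122.1 16509 || echo "libvirtd not reachable on 192.168.122.1:16509"`, then `has_local_addr 192.168.122.1 || echo "no interface has 192.168.122.1 (is the default network up?)"`.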
Which network should the default libvirt network be? When I used the x86 box I did not have to take care of anything; the script created it all.
Regarding the OS, it is RHEL 7.8:
[root@snc snc]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
I had a problem with an already-installed yq. That has been fixed, but the same error keeps appearing.
I am also including the whole log in case anyone has time to check it and provide any insight: snc.log
I can also see that the virtual interfaces have been created on my x86 box. My guess is that they are created in later stages, so it makes sense that they are not yet present on the Power8 box. I am currently installing again on the x86 box to see if there is any difference...
Power8 --> RHEL 7.8 (latest available)
x86 --> CentOS 8.x (latest available)
@anderotxoa thanks, the log is helpful. So it's a P8 RHEL 7.8 machine using release 4.5.7 from the latest mirror. Are you trying to install SNC inside a VM, or are you installing SNC on bare metal?
What is the output of virsh net-list --all? Make sure the default network is created, like this:
[root@snc snc]# virsh net-list --all
Name State Autostart Persistent
----------------------------------------------------------
default active yes yes
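You can check for that state by eye, but the check can also be scripted. The sketch below assumes the table layout shown above: a header line, a dashed separator, then Name, State, Autostart and Persistent columns. `default_net_active` is a hypothetical helper that reads the table on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `virsh net-list --all` output on stdin and succeed
# only if a network named "default" is listed with state "active".
default_net_active() {
    # NR > 2 skips the header and the dashed separator line.
    awk 'NR > 2 && $1 == "default" && $2 == "active" { found = 1 } END { exit !found }'
}
```

Example: `virsh -c qemu:///system net-list --all | default_net_active || echo "default network missing or inactive"`.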
Also, as I mentioned: remove any previously created crc- libvirt networks too.
Try something like this:
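# Remove leftover crc-* libvirt networks: net-destroy stops an active network, net-undefine deletes its definition
# (net-destroy just reports an error for networks that are already inactive)
CONNECT="${CONNECT:=qemu:///system}"
for NET in $(virsh -c "${CONNECT}" net-list --all --name | grep '^crc-'); do
    virsh -c "${CONNECT}" net-destroy "${NET}"
    virsh -c "${CONNECT}" net-undefine "${NET}"
done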
Hi @mtarsel,
I'm afraid it does not show any network-related entries. Even in the network configuration I can only see eth0 and lo.
[root@snc ~]# virsh net-list --all
Name State Autostart Persistent
---------------------------------------------------------
[root@snc ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether aa:3f:e8:2e:e1:02 brd ff:ff:ff:ff:ff:ff
    inet 10.1.3.15/16 brd 10.1.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a83f:e8ff:fe2e:e102/64 scope link
       valid_lft forever preferred_lft forever
[root@snc ~]#
UPDATE: It looks like the default libvirt network was not properly created (not sure why). After I created it by following these instructions: https://blog.programster.org/kvm-missing-default-network the install now seems to get past the step where it failed before (it is still installing). In the meantime I would suggest two things:
Will keep you posted.
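Recreating the default network generally means defining it from an XML file, then starting it and marking it for autostart. The bash sketch below assumes the stock definition is at /usr/share/libvirt/networks/default.xml, where RHEL/CentOS packages usually install it. `ensure_default_network` and the `VIRSH`/`DEFAULT_XML` overrides are hypothetical, added so the logic can be dry-run.

```bash
#!/usr/bin/env bash
# Sketch: (re)create libvirt's "default" NAT network when it is missing or stopped.
# Assumptions: stock XML at DEFAULT_XML; VIRSH is a hypothetical override for dry runs.
VIRSH="${VIRSH:-virsh}"
CONNECT="${CONNECT:-qemu:///system}"
DEFAULT_XML="${DEFAULT_XML:-/usr/share/libvirt/networks/default.xml}"

ensure_default_network() {
    local defined active
    # --name prints bare network names, one per line; --all includes inactive ones.
    defined="$($VIRSH -c "$CONNECT" net-list --all --name)"
    if ! grep -qx 'default' <<<"$defined"; then
        $VIRSH -c "$CONNECT" net-define "$DEFAULT_XML" || return 1
    fi
    # Without --all only active networks are listed; net-start fails on an already active one.
    active="$($VIRSH -c "$CONNECT" net-list --name)"
    if ! grep -qx 'default' <<<"$active"; then
        $VIRSH -c "$CONNECT" net-start default || return 1
    fi
    # Have libvirtd bring the network (and its bridge) up whenever it starts.
    $VIRSH -c "$CONNECT" net-autostart default
}
```

Run it as root with `ensure_default_network && virsh net-list --all`. The default network should then show as active with autostart enabled.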
UPDATE2
Now it is timing out while creating the bootstrap node. I guess this could be due to slow disks (I only have access to regular HDDs). I may have to move to a Minsky server with NVMe. What do you think?
I also have plenty of RAM, so I could use a RAM disk of, let's say, 64GB to speed up the process, but I'm not sure where to mount it. Any suggestions?
Log:
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: i/o timeout
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: i/o timeout
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host
DEBUG Unable to connect to the server: dial tcp 192.168.126.11:6443: connect: no route to host
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG The connection to the server api-int.crc.testing:6443 was refused - did you specify the right host or port?
DEBUG Gather remote logs
DEBUG Collecting info from crc-fgkhh-master-0.crc.testing
DEBUG lost connection
DEBUG ssh: connect to host crc-fgkhh-master-0.crc.testing port 22: No route to host
DEBUG Log bundle written to /var/home/core/log-bundle-20200901130126.tar.gz
INFO Bootstrap gather logs captured here "/root/snc/crc-tmp-install-data/log-bundle-20200901130126.tar.gz"
FATAL Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition
The VM images are created in /var/lib/libvirt/openshift-images. It's hard to tell what went wrong in the install process with just these logs.
I just created a 64GB ramdisk at this location and I'm testing it now. Anyway, do you think it is possible to increase the 40m timeout for bootstrap completion, and if so, where?
You need to mount your ramdisk at this location. It's not clear whether that is what you did.
Regarding increasing the 40m bootstrap timeout: I believe this was discussed before and rejected by the installer team.
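A RAM-backed tmpfs could be mounted at that directory with the sketch below. The 64G size, the `IMAGES_DIR`/`MOUNT` overrides and the function name are illustrative assumptions. Anything in a tmpfs is lost on unmount or reboot, and the mount should be in place before snc runs, because mounting over the directory hides whatever it already contains.

```bash
#!/usr/bin/env bash
# Hypothetical helper: back the directory where snc's libvirt images are created with a tmpfs.
# Assumptions: run as root; size and path are examples; contents vanish on unmount/reboot.
IMAGES_DIR="${IMAGES_DIR:-/var/lib/libvirt/openshift-images}"
MOUNT="${MOUNT:-mount}"   # hypothetical override, e.g. MOUNT=echo for a dry run

mount_images_ramdisk() {
    local size="${1:-64G}"
    mkdir -p "$IMAGES_DIR"
    # tmpfs only consumes RAM for what is actually stored, up to the size= limit.
    $MOUNT -t tmpfs -o "size=${size}" tmpfs "$IMAGES_DIR"
}
```

After `mount_images_ramdisk 64G`, `findmnt /var/lib/libvirt/openshift-images` should list a tmpfs filesystem at that path.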
@cfergeau yes, I tested it both in /var/lib/libvirt/openshift-images and in /var/lib/libvirt/images, but it made no difference. It still fails while creating the bootstrap node.
I also changed the SMT level from 8 to 1 to give more power to the installation threads (it usually uses only two).
I will attach both the on-screen logs and the general log in case someone has time to help, because I cannot find a reason. The only thing I can see is that it still complains about the network, in this case the SDN...
snc.log (search for ERROR to find the only strange thing I saw)
log-bundle-20200902092624.tar.gz
So it's a P8 rhel 7.8 machine using release 4.5.7 from the latest mirror.
You have to upgrade to RHEL 8. The RHCOS images for 4.3 onward are not compatible with RHEL 7 on ppc64le for snc at this time. Please upgrade to RHEL 8 and this should work.
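A pre-flight check like the sketch below could catch this combination before a long install attempt. It assumes /etc/os-release reports the major version first in VERSION_ID (RHEL 7.8 reports "7.8"). The function name is hypothetical, and the check only encodes the advice above, not an official support matrix.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: on ppc64le, fail when the OS major version is older than 8.
check_el8_for_ppc64le() {
    local os_release="${1:-/etc/os-release}" arch="${2:-$(uname -m)}"
    local version_id major
    # os-release is a shell-compatible KEY=value file; read it in a subshell.
    version_id="$(. "$os_release" && printf '%s' "${VERSION_ID:-}")"
    major="${version_id%%.*}"
    if [ "$arch" = "ppc64le" ] && [ "${major:-0}" -lt 8 ]; then
        echo "ppc64le host reports VERSION_ID=${version_id}; snc needs RHEL 8 or newer here" >&2
        return 1
    fi
    return 0
}
```

Example: `check_el8_for_ppc64le || exit 1` at the top of a wrapper script, before invoking snc.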
Hi @mtarsel, this makes sense. I will reinstall with RHEL 8 and report back if I'm successful. Thanks for the tip!
Hi @mtarsel, I reinstalled with RHEL 8, but the same error keeps appearing.
I will attach the log in case anyone can figure something out. I will also attach the command walkthrough I followed for the install, in case something in it is wrong.
OCP SNC install walkthrough.txt log-bundle-20200904142155.tar.gz
yum install libvirt-devel libvirt-daemon-kvm libvirt-client -y
systemctl enable --now libvirtd
Just to be sure, is qemu-kvm installed as well? It should be a dependency of libvirt-daemon-kvm, but one never knows ^^
# Add IP and hostname to /etc/hosts
echo "10.1.3.15 snc" >> /etc/hosts
Not sure this is needed
vi /etc/libvirt/libvirtd.conf
# listen_tls = 0
# listen_tcp = 1
# auth_tcp = "none"
# tcp_port = "16509"
They should not be commented out (no # in front of these lines).
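You can check mechanically whether those four settings are really active. The sketch below treats a setting as active only when its line starts with the key (optionally after whitespace), so a leading # makes the check fail. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the one-time-setup settings are active (uncommented) in libvirtd.conf.
# Prints each missing or commented-out setting and returns non-zero if any are missing.
check_libvirtd_conf() {
    local conf="${1:-/etc/libvirt/libvirtd.conf}" rc=0 i
    local keys=(listen_tls listen_tcp auth_tcp tcp_port)
    local values=('0' '1' '"none"' '"16509"')
    for i in "${!keys[@]}"; do
        # Anchored at line start, so "# listen_tcp = 1" does not count as set.
        if ! grep -qE "^[[:space:]]*${keys[$i]}[[:space:]]*=[[:space:]]*${values[$i]}[[:space:]]*$" "$conf"; then
            echo "missing or commented out: ${keys[$i]} = ${values[$i]}"
            rc=1
        fi
    done
    return "$rc"
}
```

Example: `check_libvirtd_conf && systemctl restart libvirtd`. libvirtd only rereads the file on restart.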
iptables -I INPUT -p tcp -s 192.168.126.0/24 -d 192.168.122.1 --dport 16509 -j ACCEPT -m comment --comment "Allow insecure libvirt clients"
No need for that if you are using firewalld. After the firewalld changes, I would check that virsh -c qemu+tcp://192.168.122.1/system list --all returns with no errors.
echo server=/tt.testing/192.168.126.1 | sudo tee /etc/NetworkManager/dnsmasq.d/openshift.conf
This particular change is not needed. Hopefully it won't cause issues, but just delete /etc/NetworkManager/dnsmasq.d/openshift.conf and restart NetworkManager: sudo systemctl restart NetworkManager
Hi @cfergeau, thanks for the comments. Yes, the lines in libvirtd.conf are uncommented; the '#' characters in my notes are just there as a reminder. Yes, QEMU is installed. The output of your command is:
[root@snc ~]# virsh -c qemu+tcp://192.168.122.1/system list --all
setlocale: No such file or directory
Id Name State
1 crc-nlqwb-bootstrap running
[root@snc ~]#
I will delete the file you mention and start all over again
Hi @cfergeau
No matter what, I always get the same problem. I have upgraded to RHEL 8, moved to SMT1, and even used a ramdisk for /var/lib/libvirt/openshift-images. I have noticed that it spends a huge amount of time after the first three lines of this log, and then (after maybe 30 minutes) it shows the following errors:
(HUGE DELAY HERE)
E0908 13:22:41.480062 276012 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3897&timeoutSeconds=502&watch=true: dial tcp 192.168.126.10:6443: connect: connection refused
E0908 13:22:44.608542 276012 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3897&timeoutSeconds=595&watch=true: dial tcp 192.168.126.11:6443: connect: no route to host
E0908 13:22:47.726212 276012 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: dial tcp 192.168.126.10:6443: connect: connection refused
E0908 13:22:50.868144 276012 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: no route to host
E0908 13:22:53.972520 276012 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: dial tcp 192.168.126.10:6443: connect: connection refused
I0908 13:23:07.108441 276012 trace.go:116] Trace[1462331844]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (started: 2020-09-08 13:22:54.972654663 +0200 CEST m=+1541.174443011) (total time: 12.13573127s):
Trace[1462331844]: [12.13573127s] [12.13573127s] END
E0908 13:23:07.108472 276012 reflector.go:153] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to list v1.ConfigMap: Get https://api.crc.testing:6443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dbootstrap&limit=500&resourceVersion=0: net/http: TLS handshake timeout
Problem found. While both KVM virtual machines appear as running, connecting to their consoles shows that they started booting but never finished. They are stuck somewhere.
This is the output I could see:
[root@snc snc]# virsh console crc-p6rcr-master-0
setlocale: No such file or directory
Connected to domain crc-p6rcr-master-0
Escape character is ^]
[ 186.020523] Processor 2 is stuck.
[ 186.046023] systemd-udevd[581]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
[ 186.047537] systemd-udevd[524]: seq 1749 '/devices/pci0000:00/0000:00:01.0/virtio0' is taking a long time
[ 186.047713] systemd-udevd[524]: seq 1792 '/devices/system/cpu/cpu5' is taking a long time
[ 186.047840] systemd-udevd[524]: seq 1791 '/devices/system/cpu/cpu4' is taking a long time
[ 186.047976] systemd-udevd[524]: seq 1790 '/devices/system/cpu/cpu3' is taking a long time
[ 186.048102] systemd-udevd[524]: seq 1758 '/devices/pci0000:00/0000:00:03.0/virtio1' is taking a long time
[ 186.048257] systemd-udevd[524]: seq 1789 '/devices/system/cpu/cpu2' is taking a long time
[ 186.048382] systemd-udevd[524]: seq 1788 '/devices/system/cpu/cpu1' is taking a long time
[ 186.048535] systemd-udevd[524]: seq 1787 '/devices/system/cpu/cpu0' is taking a long time
[ 186.049720] systemd[1]: Starting udev Wait for Complete Device Initialization...
Starting udev Wait for Complete Device Initialization...
[ 186.298686] crypto_register_alg 'xts(aes)' = 0
The problem in the above comment doesn't seem to be related to snc. I think this issue should be closed, but I'll open a separate issue about the default network prerequisite for snc on ppc64le.
Hi @mtarsel, yes, I agree. In the end I did not find the root cause, but it is clear to me that the problem happens when running the VMs inside an LPAR: they hang. Bare metal must be used instead.
An issue related to nested virtualization?
Exactly
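The platform line of /proc/cpuinfo may tell you in advance whether a POWER host can run these VMs without nesting. Bare-metal (OPAL) hosts usually report PowerNV, while PowerVM LPARs report pSeries. Hardware-assisted KVM (kvm_hv) generally requires a PowerNV host, which is consistent with the hangs seen inside the LPAR. The sketch below relies on that convention, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the POWER platform string from a cpuinfo file,
# e.g. "PowerNV" (bare metal) or "pSeries" (PowerVM LPAR or guest).
power_platform() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Lines look like "platform<TAB>: PowerNV"; trim whitespace around the value after the colon.
    awk -F: '/^platform[ \t]*:/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }' "$cpuinfo"
}

# Hypothetical helper: succeed only when the host looks like bare metal (PowerNV).
is_bare_metal_power() {
    [ "$(power_platform "$@")" = "PowerNV" ]
}
```

Example: `is_bare_metal_power || echo "not PowerNV: snc VMs would run nested and may hang"`.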
Hi,
I'm trying to install snc on a POWER8 box with 8 cores and 128GB RAM. I managed to install it with no issues on an x86 box, but on the P8 box the installer stops with the following message:
DEBUG Initializing the backend...
DEBUG
DEBUG Initializing provider plugins...
DEBUG
DEBUG Terraform has been successfully initialized!
DEBUG
DEBUG You may now begin working with Terraform. Try running "terraform plan" to see
DEBUG any changes that are required for your infrastructure. All Terraform commands
DEBUG should now work.
DEBUG
DEBUG If you ever set or change modules or backend configuration for Terraform,
DEBUG rerun this command to reinitialize your working directory. If you forget, other
DEBUG commands will detect it and remind you to do so if necessary.
ERROR Error: virError(Code=38, Domain=7, Message='unable to connect to server at '192.168.122.1:16509': Connection timed out')
ERROR
ERROR on ../../tmp/openshift-install-554439829/main.tf line 1, in provider "libvirt":
ERROR 1: provider "libvirt" {
ERROR
ERROR
ERROR Failed to read tfstate: open /tmp/openshift-install-554439829/terraform.tfstate: no such file or directory
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
From the command line I can see that no KVM virtual machine has been created:
virsh # list --all
Id Name State

virsh #
This looks like the primary error to me, but I'm not sure how to continue the investigation...