cgruver / okd4-single-node-cluster

Building an OKD4 single node cluster with minimal resources
GNU General Public License v3.0

interface name mismatch #5

Open hgkamath opened 4 years ago

hgkamath commented 4 years ago

Description

Fedora CoreOS/OKD ignores the nic0 connection and creates a new connection, "Wired connection 1". Because the interface gets no IP address or DNS configuration, ssh login is impossible and one has to log in via the console.

Experimenting

My bash scripts are based on your scripts.

I am trying to run okd4-snc nested inside VirtualBox: laptop -> Windows 10 -> VirtualBox (nested) host node running Fedora 33 -> KVM/QEMU bootstrap VM.

FCOS=32.20200809.3.0 OKD_RELEASE=4.5.0-0.okd-2020-09-04-180756

During the install boot, coreos-installer appears to fetch Fedora CoreOS 32.20200629.3.0 (dracut-050-61.git20200529.fc32) rather than the 32.20200809.3.0 release set in FCOS above.

Workaround

On the 2nd boot (the 1st boot being virt-install), while ssh still worked, I did the following in anticipation of ssh failing after a later reboot:

# ssh in (or log in) as core, then set a password so core can later log in on the console
sudo passwd core
# enter and confirm the new password when prompted

3rd boot (ssh is not possible until the interface is fixed)

Connect with virsh console and log in as core with that password:
Fedora CoreOS 32.20200629.3.0
Kernel 5.6.19-300.fc32.x86_64 on an x86_64 (ttyS0)

sudo dmesg -D    # stop kernel messages from printing to the console

nmcli dev
DEVICE  TYPE      STATE         CONNECTION
enp1s0  ethernet  disconnected  --
lo      loopback  unmanaged     --

nmcli con show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  ea3f22f3-cb2a-339f-bed5-f7d5b9bd6086  ethernet  --
nic0                3e802ffc-c4c5-4bf7-8829-6863c206350e  ethernet  --

sudo nmcli conn modify uuid 3e802ffc-c4c5-4bf7-8829-6863c206350e connection.interface-name enp1s0

nmcli dev
DEVICE   TYPE      STATE      CONNECTION
enp1s0   ethernet  connected  nic0
docker0  bridge    connected  docker0
lo       loopback  unmanaged  --

[core@okd4-snc-bootstrap ~]$ nmcli conn show
NAME                UUID                                  TYPE      DEVICE
nic0                3e802ffc-c4c5-4bf7-8829-6863c206350e  ethernet  enp1s0
docker0             4f67584c-6f50-4cd8-8ada-ca27b3c2b8ca  bridge    docker0
Wired connection 1  ea3f22f3-cb2a-339f-bed5-f7d5b9bd6086  ethernet  --

sudo nmcli conn delete uuid ea3f22f3-cb2a-339f-bed5-f7d5b9bd6086

nmcli conn show
NAME     UUID                                  TYPE      DEVICE
nic0     3e802ffc-c4c5-4bf7-8829-6863c206350e  ethernet  enp1s0
docker0  4f67584c-6f50-4cd8-8ada-ca27b3c2b8ca  bridge    docker0

sudo dmesg -E    # re-enable kernel messages on the console

After that, the bootstrap setup proceeds.
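The manual fix above can be scripted. Below is a minimal POSIX sh sketch, not taken from the okd4-single-node-cluster scripts: the helper name plan_nic_fix is hypothetical, and it assumes the intended profile is named nic0 and that the unwanted profiles are NetworkManager's auto-generated "Wired connection N" ones. It reads the terse output of `nmcli -t -f NAME,UUID con show` on stdin and only prints the nmcli commands, so they can be reviewed before being run.

```shell
#!/bin/sh
# plan_nic_fix PROFILE DEVICE  (hypothetical helper, dry run only)
# stdin:  output of `nmcli -t -f NAME,UUID con show`, one NAME:UUID per line
# stdout: the nmcli commands from the manual workaround, for review
plan_nic_fix() {
  profile="$1"
  device="$2"
  modify=""
  stale=""
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # A UUID never contains ':', so split on the last ':'.
    uuid="${line##*:}"
    name="${line%:*}"
    case "$name" in
      "$profile")
        # Bind the intended profile to the device the kernel actually created.
        modify="sudo nmcli conn modify uuid $uuid connection.interface-name $device"
        ;;
      "Wired connection "*)
        # Profile that NetworkManager generated on its own.
        stale="$stale $uuid"
        ;;
    esac
  done
  # Same order as the manual steps: re-bind first, then delete.
  if [ -n "$modify" ]; then
    echo "$modify"
  fi
  for u in $stale; do
    echo "sudo nmcli conn delete uuid $u"
  done
}
```

Usage: `nmcli -t -f NAME,UUID con show | plan_nic_fix nic0 enp1s0`. For the connection list shown above, this prints the same modify and delete commands that were typed by hand.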

The master node booted Fedora CoreOS 32.20200809.3.0 (dracut-050-61.git20200529.fc32) on both its first boot (virt-install) and its second boot.

NB: This may be a bogus bug. Because of a bug in my script, the correct FCOS bootstrap installer ISO was not being fetched; the step was skipped, so the ISO was never recreated. After rebuilding the bootstrap machine, the first boot (virt-install) came up with Fedora CoreOS 32.20200809.3.0 (dracut-050-61.git20200529.fc32), the second boot came up with the same release on kernel 5.7.12-200.fc32.x86_64 (x86_64), and nic0 was assigned correctly:

$ sudo nmcli conn show
NAME  UUID                                  TYPE      DEVICE
nic0  e06c4514-a5b4-4331-96a5-40f2ca22261b  ethernet  nic0
$ sudo nmcli dev
DEVICE  TYPE      STATE      CONNECTION
nic0    ethernet  connected  nic0
lo      loopback  unmanaged  --

However, the same thing happens after the machine-config pivot to Fedora CoreOS 32.20200629.3.0 (kernel 5.6.19-300.fc32.x86_64), so there is still some problem.

cgruver commented 4 years ago

Did you take a look at my latest commit to the master branch? It is working. I tested it yesterday.

There were some issues with a previous version and the latest FCOS 32.

hgkamath commented 4 years ago

Yes, I made an install attempt after seeing your commits and hand-merging your changes into my scripts. I wonder why you chose the name nic0, because systemd's interface naming on my setup is different (the guest NIC shows up as enp1s0). Or perhaps on your hardware that was the default name NetworkManager assigned. https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html
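To show why names differ between machines, here is a small sh sketch that decodes a few common forms from the systemd naming scheme linked above: prefix en/wl/ww for the link type, then o<index> for onboard devices, s<slot> for hotplug slots, p<bus>s<slot> for PCI location, or x<MAC> for MAC-based names. It is a simplified illustration, not a full implementation of the scheme, and the helper name describe_ifname is hypothetical.

```shell
#!/bin/sh
# describe_ifname NAME  (hypothetical helper; covers only a subset of
# systemd.net-naming-scheme and ignores most suffixes)
describe_ifname() {
  name="$1"
  # The two-letter prefix gives the link type.
  case "$name" in
    en*) kind="Ethernet" ;;
    wl*) kind="WLAN" ;;
    ww*) kind="WWAN" ;;
    *)
      echo "$name: not a systemd predictable name (kernel-assigned or custom)"
      return 0
      ;;
  esac
  rest="${name#??}"   # strip the two-letter prefix
  case "$rest" in
    o[0-9]*)
      echo "$name: $kind, onboard device index ${rest#o}" ;;
    s[0-9]*)
      echo "$name: $kind, hotplug slot ${rest#s}" ;;
    p[0-9]*s[0-9]*)
      bus="${rest#p}"; bus="${bus%%s*}"            # digits between p and s
      slot="${rest#*s}"; slot="${slot%%[!0-9]*}"   # leading digits after s
      echo "$name: $kind, PCI bus $bus, slot $slot" ;;
    x*)
      echo "$name: $kind, named after MAC address ${rest#x}" ;;
    *)
      echo "$name: $kind, other naming form" ;;
  esac
}
```

For example, `describe_ifname enp1s0` reports PCI bus 1, slot 0, while a custom name such as nic0 falls through to the "not a systemd predictable name" branch.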

So, it could be because the name you chose, nic0, is not the usual default.

It could be because the image that the machine-config pivot switches to is still based on the older FCOS 32.20200629.3.0 and has not moved to 32.20200809.3.0 yet.

It could be because my setup (a laptop) is too under-resourced, and I shouldn't try to run a Kubernetes cluster on it.

It could be because, even though VirtualBox supports nested virtualization, for some reason I cannot allocate more than 1 core to the nested KVM/virsh VM. With even 2 cores allocated, the KVM guest kernel panics and the KVM host freezes. With the 1-core VM, journalctl shows hyperkube hitting read/write/connect timeouts (the API server port 6443 disappearing, for instance), perhaps caused by delays in CRI-O starting containers.

cgruver commented 4 years ago

nic0 was an arbitrary choice. The ifnames feature lets you assign fixed, predictable names to your interfaces. That way, I know the device name across all of my different hardware types and don't have to hunt for eno1 vs. enps2, etc.
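For reference, dracut's ifname=<name>:<MAC> kernel argument is one way to pin such a name: the NIC with that MAC address gets the fixed name, which a static ip= argument can then refer to. dracut's documentation warns against kernel-style names such as eth0 here, and a custom name like nic0 avoids that conflict. The sh sketch below builds such an argument string. It is not copied from the repository's scripts, the helper name build_net_kargs is hypothetical, and the addresses used with it are made-up examples.

```shell
#!/bin/sh
# build_net_kargs MAC IP GATEWAY NETMASK HOSTNAME NAMESERVER [IFNAME]
# (hypothetical helper) Prints dracut network kernel arguments that pin the
# interface name to a MAC address and give it a static IPv4 configuration.
build_net_kargs() {
  # The kernel reports MAC addresses in lowercase, so normalise to match.
  mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  ip="$2" gw="$3" mask="$4" host="$5" dns="$6"
  ifname="${7:-nic0}"
  # ifname=<name>:<MAC>
  # ip=<client-IP>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:none
  printf 'ifname=%s:%s ip=%s::%s:%s:%s:%s:none nameserver=%s\n' \
    "$ifname" "$mac" "$ip" "$gw" "$mask" "$host" "$ifname" "$dns"
}
```

For example, `build_net_kargs 52:54:00:AB:CD:EF 10.11.11.49 10.11.11.1 255.255.255.0 okd4-snc-bootstrap 10.11.11.10` prints `ifname=nic0:52:54:00:ab:cd:ef ip=10.11.11.49::10.11.11.1:255.255.255.0:okd4-snc-bootstrap:nic0:none nameserver=10.11.11.10`. Because the name is tied to the MAC, the same string works regardless of whether the hardware would otherwise call the NIC eno1 or enp1s0.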

To run this single-node cluster build, you still need fairly beefy hardware: 4 vCPUs and 32 GB of RAM. A fast SSD also helps.
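A trivial pre-flight check against those numbers can save a failed install. This sh sketch (hypothetical helper check_vm_size) compares a planned VM size with the 4 vCPU / 32 GB minimum quoted above, treating 32 GB as 32768 MiB because virt-install's --memory option takes MiB. It does not check disk speed or whether nested virtualization actually works.

```shell
#!/bin/sh
# check_vm_size VCPUS MEMORY_MIB  (hypothetical helper)
# Returns 0 and prints "ok" if the planned VM meets 4 vCPU / 32768 MiB;
# otherwise prints each shortfall and returns 1.
check_vm_size() {
  vcpus="$1"
  mem_mib="$2"
  status=0
  if [ "$vcpus" -lt 4 ]; then
    echo "too few vCPUs: $vcpus (want >= 4)"
    status=1
  fi
  if [ "$mem_mib" -lt 32768 ]; then
    echo "too little memory: $mem_mib MiB (want >= 32768)"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "ok"
  fi
  return "$status"
}
```

Usage: `check_vm_size 4 32768 && virt-install --vcpus 4 --memory 32768 ...`, so the VM is only created when the planned size meets the minimum.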

kongli commented 3 years ago

@hgkamath How do you log in to the bootstrap node on the console as user core after the second reboot? My network does not come up, and I'm not sure how to fix that.

cgruver commented 3 years ago

There really isn't a way to log in other than ssh, which obviously doesn't work if the network is not available.

The more important problem is why the network config is not working.

Is this the startup after the initial FCOS install?