If the VM Template of a storage node is altered to use a custom Image (or a clone of the original image) as the second mounted disk, the VM never finishes its configuration and does not join the k8s cluster.
Some of the symptoms:
The resolv.conf differs from the one on a healthy node.
The RKE2 agent tries to query itself.
The RKE2 agent configuration also differs from the one on a healthy node: the server entry has an empty host (server: https://:9345).
Affected OneKE versions: 1.27 and 1.29.
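The empty host in the server entry can be tested for mechanically. Below is a minimal Python sketch, assuming the agent configuration is a flat YAML file with a top-level server: key. The function name and the "healthy" address are hypothetical and only serve as illustration.

```python
from urllib.parse import urlparse


def server_host(config_text):
    """Return the host of the `server:` entry, or None if it is absent or empty."""
    for line in config_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "server":
            # urlparse reports hostname None for a URL such as "https://:9345"
            return urlparse(value.strip().strip("'\"")).hostname
    return None


broken = "server: https://:9345\n"            # value seen on the affected node
healthy = "server: https://192.0.2.10:9345\n"  # hypothetical healthy value

print(server_host(broken))   # None
print(server_host(healthy))  # 192.0.2.10
```

A return value of None for the node's agent configuration would match the symptom described above.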
Important note
The behaviour differs depending on whether the second (private) network is isolated. The result is the same either way: the storage node is misconfigured and does not join the cluster.
If the private network is isolated:
The resolv.conf is missing on the affected node, while it exists on a healthy node.
/var/log/one-appliance/configure.log contains errors communicating with OneGate: Failed to open TCP connection to 172.16.100.1:5030 (Network is unreachable - connect(2) for "172.16.100.1" port 5030)
The rke2-agent systemd service is dead.
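To confirm from the storage VM that the OneGate endpoint is unreachable, a plain TCP connection test is enough. This is a minimal sketch using only the Python standard library. The helper name is hypothetical, and the address and port are the ones from the configure.log error above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable", refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Address and port copied from the configure.log error above.
    print(can_connect("172.16.100.1", 5030))
```

On a node hit by the isolated-network variant this would be expected to print False.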
If the private network is routable (the easiest way is to attach both networks to the same Vnet):
resolv.conf points to the network defined by the Vnet.
/var/log/one-appliance/configure.log contains no errors: I, [2024-10-25T13:52:33.674020 #1449] INFO -- : Join storage: oneke-ip-172-16-100-4
The RKE2 agent logs errors: Oct 25 13:54:42 oneke-ip-172-16-100-4 rke2[1552]: time="2024-10-25T13:54:42Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:51308->127.0.0.1:64>
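The two variants above can be told apart from the log text alone. The sketch below matches on substrings taken from the log lines quoted in this report. The function name and the returned labels are hypothetical, and other failure modes may produce similar messages.

```python
def classify(configure_log, rke2_log):
    """Map log excerpts to one of the two variants described above."""
    if "Failed to open TCP connection" in configure_log:
        return "isolated"
    if "Join storage" in configure_log and "failed to get CA certs" in rke2_log:
        return "routable"
    return "unknown"


# Excerpts of the log lines quoted in this report.
configure_isolated = (
    "Failed to open TCP connection to 172.16.100.1:5030 "
    '(Network is unreachable - connect(2) for "172.16.100.1" port 5030)'
)
configure_routable = (
    "I, [2024-10-25T13:52:33.674020 #1449] INFO -- : "
    "Join storage: oneke-ip-172-16-100-4"
)
rke2_error = r'level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\"'

print(classify(configure_isolated, ""))          # isolated
print(classify(configure_routable, rke2_error))  # routable
```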
Steps to reproduce:
Install miniONE.
Import the OneKE 1.29 service.
Clone the default image used as the second disk for storage role nodes (typically Service OneKE 1.29-storage-2-*-1, 10G in size).
Change the Service OneKE 1.29-storage-2 VM Template to use the cloned Image as the second disk instead of the default one.
Instantiate the service using your preferred method and make sure the k8s environment is running as expected (enable Traefik and Longhorn; set DNS, route, and NAT).
Scale the storage role by changing its cardinality to 1.
Result:
The storage VM cannot finish its configuration and is therefore not added to the cluster.
The service is stuck in the Scaling state.
Workaround:
Keep the default image as the second disk, then resize the disk to the desired value after the VM is up.