A NetworkNotFound error was seen in the FCOS OpenStack instance on VexxHost today. The error caused all server creation in OpenStack to fail, both in the kola-openstack job and locally through the CLI.
harness.go:1782: Cluster failed starting machines: waiting for instance to run:
Server reported ERROR status:
{500 2024-11-22 20:56:47 +0000 UTC Build of instance 8837696b-6172-4721-aa4a-729e576573d5 aborted: Failed to allocate the network(s), not rescheduling.
The issue seems to be that the private network we attach to the servers is not available, but the network looks fine in the CLI and the cloud console.
When creating a server using openstack server create --debug --network=private <other server creation arguments>, the following debug message can be seen:
RESP BODY: {"NeutronError": {"type": "NetworkNotFound", "message": "Network private could not be found.", "detail": ""}}
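When comparing many --debug runs, it can help to pull the Neutron error out of the output mechanically. The following is a minimal sketch, assuming the debug output has been captured as text and that each response body sits on a single line in the form shown above; the function name neutron_errors is hypothetical and not part of any OpenStack tooling.

```python
import json
import re

# Matches a "RESP BODY: {...}" line as printed by `openstack ... --debug`.
# Assumption: the JSON body is on one line, as in the message quoted above.
RESP_BODY = re.compile(r"RESP BODY:\s*(\{.*\})\s*$")


def neutron_errors(debug_output: str) -> list[dict]:
    """Return every NeutronError object found in captured debug output."""
    errors = []
    for line in debug_output.splitlines():
        match = RESP_BODY.search(line)
        if not match:
            continue
        try:
            body = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Not a JSON body (or truncated); skip it rather than guess.
            continue
        if isinstance(body, dict) and isinstance(body.get("NeutronError"), dict):
            errors.append(body["NeutronError"])
    return errors


# The debug line quoted in this report, used as sample input.
sample = (
    'RESP BODY: {"NeutronError": {"type": "NetworkNotFound", '
    '"message": "Network private could not be found.", "detail": ""}}'
)

for err in neutron_errors(sample):
    print(err["type"], "-", err["message"])
```

Run against the line above, this reports the NetworkNotFound type together with its message, which makes it easier to check whether every failing run hits the same Neutron error.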
The instance fails to launch and the error shows that the private network cannot be found. However:
The private network exists and appears healthy (openstack network show private confirms this, as does the cloud console).
The network has a valid subnet and sufficient IP addresses.
Instances launched yesterday (2024-11-21) using this network were functional.
Additional Information
OpenStack region doesn't seem to make a difference
The failure was first seen using the ca-ymq-1 region in OpenStack, but I saw the same error when I tried creating a server in ams1.
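The per-region check of the private network can be repeated with a small script. This is only a sketch, assuming the openstack CLI is installed and credentials are already configured in the environment; the helper names network_show_cmd and network_status are hypothetical, while --os-region-name and -f json are standard openstack client options.

```python
import json
import subprocess

# Regions mentioned in this report.
REGIONS = ["ca-ymq-1", "ams1"]


def network_show_cmd(region, network="private"):
    """Build the CLI call that shows one network in one region as JSON."""
    return [
        "openstack", "--os-region-name", region,
        "network", "show", network, "-f", "json",
    ]


def network_status(region, network="private"):
    """Return the network's status field, or None if the lookup fails."""
    try:
        result = subprocess.run(
            network_show_cmd(region, network),
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        # The openstack CLI is not installed on this machine.
        return None
    if result.returncode != 0:
        return None
    return json.loads(result.stdout).get("status")


if __name__ == "__main__":
    for region in REGIONS:
        print(region, network_status(region))
```

A None result for a region means the lookup itself failed there, which would differ from the behaviour described above, where the network shows as healthy but server creation still fails.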
Timing of failure
We saw a successful kola-openstack run on 2024-11-22 8:54 UTC, but then saw the following error in a kola-openstack run at 2024-11-22 8:13 UTC
We then started seeing these failures on all runs afterwards starting at 2024-11-22 17:52 UTC
Potentially there were some stability issues this morning that could have affected our host.
VexxHost Status seems green though: https://status.vexxhost.com/
Nova Compute Logs
I searched for similar instances of this failure. Articles and forum posts point towards checking the Nova Compute logs at /var/log/nova/nova-compute.log and running a command as root on the host to resolve the issue. However, we don't have access to the host resources.