Status: Closed (asimpleidea, closed 2 years ago)
Hi,
is the bridge interface preconfigured on the host machine?
Either way, I will add a check for the bridge device so that this error is detected earlier.
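A check like the one mentioned could be quite small. The following is a minimal sketch, not the project's actual implementation. It assumes a Linux host, where every network interface appears under `/sys/class/net/<name>/` and bridge devices additionally expose a `bridge/` subdirectory. The helper name `bridge_exists` is hypothetical.

```python
from pathlib import Path


def bridge_exists(name: str, sysfs_net: str = "/sys/class/net") -> bool:
    """Return True if `name` is an existing Linux bridge device.

    Every interface has a directory under /sys/class/net/<name>; only
    bridge devices also contain a `bridge` subdirectory, so a plain
    Ethernet interface such as ens160 returns False here.
    """
    return (Path(sysfs_net) / name / "bridge").is_dir()
```

For example, `bridge_exists("br18")` returning `False` would mean the bridge still has to be created on the host before using `network_mode = "bridge"`. The `sysfs_net` parameter exists only so the function can be pointed at a different directory, for example in tests.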
Yes, I tried different approaches, both with `netplan` and by following the guide in the example folder.
This is the output of `networkctl status -a`:
```
● 2: ens160
       Link File: /lib/systemd/network/99-default.link
    Network File: /run/systemd/network/10-netplan-ens160.networ
            Type: ether
           State: routable (configuring)
            Path: pci-0000:03:00.0
          Driver: vmxnet3
          Vendor: VMware
           Model: VMXNET3 Ethernet Controller
      HW Address: 00:0c:29:19:84:1c (VMware, Inc.)
         Address: 192.168.1.160
         Gateway: 192.168.1.1
             DNS: 8.8.8.8
[...]
● 4: br18
       Link File: /lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: routable (unmanaged)
          Driver: bridge
      HW Address: ba:8e:02:dc:4f:b9
         Address: 192.168.1.160
                  fe80::b88e:2ff:fedc:4fb9
         Gateway: 192.168.1.1
```
And this is `/etc/systemd/network/br0-static-ip.network`:
```ini
[Match]
Name=br18

[Network]
Address=192.168.1.160/24
Gateway=192.168.1.1
# Router's DNS
DNS=192.168.1.1
# Additional DNS if required
# DNS=8.8.8.8
```
Thanks for the help!
I'm not sure how these two interfaces (`br18` and `ens160`) are related, since `ens160` is created by VMware and is not enslaved to the bridge interface.
First, make sure that the bridge is active and has been assigned an IP address (judging from the output you posted, it has).
It seems to me that the bridge device is misconfigured and, as a consequence, the libvirt provider cannot gather the IP addresses of the virtual machines, but I may be wrong.
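To confirm on the host whether `ens160` is actually attached to `br18`, you can list the bridge's ports. This is a minimal sketch that assumes a Linux host, where each port of a bridge appears as an entry under `/sys/class/net/<bridge>/brif/`. The helper name `bridge_ports` is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List


def bridge_ports(bridge: str, sysfs_net: str = "/sys/class/net") -> List[str]:
    """List the interfaces enslaved to a Linux bridge.

    Each port of a bridge shows up as an entry in
    /sys/class/net/<bridge>/brif/. An empty list means the bridge
    has no ports, or that the bridge does not exist at all.
    """
    brif = Path(sysfs_net) / bridge / "brif"
    if not brif.is_dir():
        return []
    return sorted(entry.name for entry in brif.iterdir())
```

If `bridge_ports("br18")` does not include `ens160`, the bridge is not connected to the physical network. In that case, VMs attached to it would not be able to reach the LAN through it.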
For example, I would create the bridge interface using netplan as follows:
```yaml
network:
  version: 2
  bridges:
    br18:
      interfaces:
        - ens160
      dhcp4: true
      dhcp6: false
  ethernets:
    ens160: {}
```
Can you also provide the network section from `terraform.tfvars`?
One question: do I need `dhcp4: true` on the bridge even though I am assigning static IPs in the network section?
With netplan I created the bridge like this:
```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens33:
      addresses: [ 192.168.1.26/24 ]
      gateway4: 192.168.1.1
      nameservers:
        addresses:
          - "192.168.1.1"
    ens160:
      dhcp4: false
      dhcp6: false
  bridges:
    br18:
      dhcp4: false
      dhcp6: false
      nameservers:
        addresses:
          - "192.168.1.1"
      addresses: [ 192.168.1.160/24 ]
      interfaces:
        - ens160
```
Some relevant parts of `terraform.tfvars`:
```hcl
# Network mode (nat, route, bridge) #
network_mode = "bridge"

# Network CIDR (example: 192.168.113.0/24) #
network_cidr = "192.168.1.0/24"

# Network (virtual) bridge #
# Note: For network mode 'bridge', bridge on host needs to be preconfigured (example: br0) #
network_bridge = "br18"

# Network gateway (example: 192.168.113.1) #
# Note: If not provided, it will be calculated as first host in network CIDR. #
#       +-> first host of 192.168.113.0/24 is 192.168.113.1 #
#network_gateway = "192.168.113.1"

# Network DNS list (if empty, network gateway is set as a DNS) #
network_dns_list = [
  "192.168.1.1",
  "8.8.8.8"
]

# Other stuff...
master_nodes = [
  {
    id  = 1
    ip  = "192.168.1.150"
    mac = "52:54:00:00:00:10"
  }
]

# Other stuff...
worker_nodes = [
  {
    id  = 1
    ip  = "192.168.1.151"
    mac = "52:54:00:00:00:11"
  }
]
```
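The comment on `network_gateway` says that, when the gateway is omitted, it is calculated as the first host in the network CIDR. The rule can be mirrored with Python's standard `ipaddress` module to see which value would be used. This only illustrates the rule described in the comment and is not the project's own code. `default_gateway` is a hypothetical name.

```python
import ipaddress


def default_gateway(cidr: str) -> str:
    """Return the first usable host address of `cidr`."""
    network = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    return str(next(network.hosts()))


print(default_gateway("192.168.113.0/24"))  # 192.168.113.1
print(default_gateway("192.168.1.0/24"))    # 192.168.1.1
```

For the values above, the calculated gateway is `192.168.1.1`, which is also the router address used elsewhere in this configuration. So leaving `network_gateway` commented out should be equivalent to setting it explicitly here.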
If `dhcp4: true` is needed even with static IPs, I will give that one more try, but I am sure I am making some other mistake somewhere.
Thank you so much for your help @MusicDin.
You don't need to enable `dhcp4` if you don't use it.
Otherwise, both configurations seem valid to me.
How long did you let the script run before you stopped it? If you stop the script too early, the qemu agent may not have reported a received IP address yet. For example, it sometimes happens that all VMs receive their IP addresses only after exactly 2 minutes. For this reason, I recommend letting the script run until it terminates on its own (max. 5 minutes).
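The waiting behaviour described above is essentially a poll with a timeout. The sketch below is a generic illustration only and is not how the script or the Terraform libvirt provider actually implement it. `wait_for_ips` and the `get_ips` callback are hypothetical: `get_ips` stands in for whatever queries the guest agent. The 300-second default simply mirrors the 5-minute limit mentioned above.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_ips(
    get_ips: Callable[[], Dict[str, Optional[str]]],
    timeout: float = 300.0,
    interval: float = 5.0,
) -> Dict[str, str]:
    """Poll `get_ips` until every VM reports an address, or raise on timeout.

    `get_ips` returns a mapping {vm_name: ip_or_None}; a VM whose IP is
    None has not reported an address yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        ips = get_ips()
        if ips and all(ips.values()):
            # Every VM has reported an address.
            return {name: ip for name, ip in ips.items() if ip}
        if time.monotonic() >= deadline:
            missing = sorted(name for name, ip in ips.items() if not ip)
            raise TimeoutError(f"no IP reported for: {', '.join(missing)}")
        time.sleep(interval)
```

With this shape, stopping the process early and hitting the timeout look the same from the outside. This is why it helps to let the script run until it ends on its own and then read the final error.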
Please let me know whether this solves your problem, or which error is reported at the end.
I always let it run; it terminates on its own after 5 minutes and the error from my first post appears.
Anyway, I think this has more to do with the Terraform libvirt provider. I will follow some of the related issues in their repository (e.g. https://github.com/dmacvicar/terraform-provider-libvirt/issues/924) and will let you know if I find anything. Thank you! :)
I was able to recreate this issue.
For example, my network is configured as follows:
- CIDR (LAN network): 10.10.0.0/20
- GW (router's IP): 10.10.0.1
If I enter the following values when creating the cluster, the cluster gets successfully created:
```hcl
# terraform.tfvars
network_mode = "bridge"
network_bridge = "br0"
network_cidr = "10.10.0.0/20"
network_gateway = "10.10.0.1" # In this case, GW can be omitted
...
master_nodes = [
  {
    id = 1
    ip = "10.10.6.5"
  }
]
...
worker_nodes = [
  {
    id = 1
    ip = "10.10.6.6"
  }
]
```
If I enter the wrong GW IP, the addresses are not retrieved and I get the same error message as you.
The same thing happens if the wrong network CIDR is specified. For example, if I enter `network_cidr = "10.10.0.0/22"`, I again get the same error as you.
Can you verify that you entered the correct CIDR and GW?
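One way to catch this kind of mistake before running Terraform is to check that the gateway and every node IP actually lie inside `network_cidr`. The sketch below uses Python's standard `ipaddress` module. `check_network` is a hypothetical helper, not part of the project. It only checks membership, so it cannot tell whether a gateway inside the range is really your router.

```python
import ipaddress
from typing import Iterable, List


def check_network(cidr: str, gateway: str, node_ips: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    network = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    problems = []
    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {cidr}")
    for ip in node_ips:
        if ipaddress.ip_address(ip) not in network:
            problems.append(f"node IP {ip} is outside {cidr}")
    return problems


# Working example from above (10.10.0.0/20 spans 10.10.0.0-10.10.15.255).
print(check_network("10.10.0.0/20", "10.10.0.1", ["10.10.6.5", "10.10.6.6"]))  # []
# With the wrong /22 (10.10.0.0-10.10.3.255), both node IPs fall outside the network.
print(check_network("10.10.0.0/22", "10.10.0.1", ["10.10.6.5", "10.10.6.6"]))
```

The `/22` case shows why a too-small CIDR breaks the setup: the gateway still fits, but the nodes end up outside the declared network.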
Will check this out asap, thank you! :)
Hi,
can you let me know if the above solved your problem? Thanks.
Hi @MusicDin, sorry for not replying sooner. I double-checked everything and the values are indeed correct, but I still have the problem. After what you wrote, I am more convinced that the problem is a misconfiguration on my side somewhere else rather than in the script itself.
BTW, I have a proxy server, and I modified your scripts to inject the proxy environment variables into the cluster, so in `nat` mode everything works fine. My guess, though I may be wrong, is that the qemu agent cannot contact the node because the proxy is not yet configured in the guest at that point, so communication with the host is blocked. Do you think this could be the case?
Anyway, I have reverted to `nat` mode, as it is acceptable for my use case for now :)
In general, I don't think the proxy is the problem: if your bridge interface gets its own IP address, so should the virtual machines. This is just a guess, though, as I have no idea how your network is set up.
I'm still not able to reproduce the issue other than with incorrect values, so it seems to me that it needs further investigation on your end. If the NAT mode is sufficient for your needs, that should do for now.
Please let me know if you have any more questions or information about this problem.
One more question @asimpleidea - can you please tell me which hypervisor you're using and which OS image you're installing on the nodes?
I am using ESXi and, if I remember correctly, the images were Ubuntu 16.04. I may try again with 20.04, though.
So, to conclude, I agree that I will have to investigate further, and I will let you know if I have any news :) Thanks so much, @MusicDin!
Thanks again for the provided information and opening the issue.
I'm closing it for now, but feel free to reopen it if you have something new.
Hi,
thank you so much for this project; it is really a lifesaver.
Recently, I have been trying to create a bridged network and assign static IPs to all machines, but it keeps failing with this message:
Basically, it keeps waiting for IPs even though they are assigned statically, and it times out after 5 minutes:
Does anyone know why this is happening?