canonical / microcloud

Automated private cloud based on LXD, Ceph and OVN
https://microcloud.is
GNU Affero General Public License v3.0

microcloud init: failed to bind to any multicast udp port #254

Open myr4htw opened 8 months ago

myr4htw commented 8 months ago

Hello, I am trying to set up MicroCloud with 3 virtual machines. Every machine has 2 network interfaces: one is assigned an IP address, the other has none:

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:96 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:98 brd ff:ff:ff:ff:ff:ff
    altname enp0s9
    inet 10.216.50.200/24 brd 10.216.50.255 scope global noprefixroute ens9
       valid_lft forever preferred_lft forever

What's going wrong? Kind regards Margit

roosterfish commented 8 months ago

Hi @myr4htw, can you please provide some more context and reproducer steps. At which point of the microcloud init process do you see this error?

myr4htw commented 8 months ago

Hi, I am now trying microcloud init with the following preseed file:

lookup_subnet: 10.216.50.0/24
systems:

But multicast seems to be ok!?

Please see the output of the commands netstat -gn and nc:

root@stl-s-microcl1:~# netstat -gn
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
lo              1      224.0.0.251
lo              1      224.0.0.1
ens3            1      224.0.0.1
ens9            2      224.0.0.251
ens9            1      224.0.0.1
lo              1      ff02::fb
lo              1      ff02::1
lo              1      ff01::1
ens3            1      ff02::1
ens3            1      ff01::1
ens9            2      ff02::fb
ens9            1      ff02::1:fff8:8ae0
ens9            1      ff02::1
ens9            1      ff01::1
....

The result is the same on the second server (microcl2) and the third server (microcl3).

Test connection to the second server (microcl2):

root@stl-s-microcl1:~# nc -u -v 10.216.50.202 5353
Connection to 10.216.50.202 5353 port [udp/mdns] succeeded!

Test connection to the third server (microcl3):

root@stl-s-microcl1:~# nc -u -v 10.216.50.204 5353
Connection to 10.216.50.204 5353 port [udp/mdns] succeeded!

Hope that helps.

Kind regards Margit

roosterfish commented 8 months ago

Make sure multicast is enabled on your network. MicroCloud uses mDNS for discovery and in your case tries to send to the multicast address 224.0.0.251:5353 to discover its peers. See the comment about cloud providers here: https://canonical-microcloud.readthedocs-hosted.com/en/latest/explanation/initialisation/#automatic-server-detection.

There is also this issue which looks to be the same https://github.com/canonical/microcloud/issues/134.
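The multicast path mentioned above can be probed separately from the unicast nc test, which only exercises UDP port 5353 host to host. The following is a minimal Python sketch, not MicroCloud's own discovery code: it joins the mDNS group 224.0.0.251 on a chosen interface address and either waits for or sends one test datagram. The function names and the test port 5354 are assumptions made for illustration (port 5353 itself may already be in use by a local mDNS responder).

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS group named in the comment above
TEST_PORT = 5354            # assumption: an unused port chosen for this probe


def membership_request(group: str, iface_ip: str) -> bytes:
    """Pack the ip_mreq structure (group address + local interface address)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen_once(iface_ip: str, timeout: float = 10.0):
    """Join the group on iface_ip; return (payload, sender) or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", TEST_PORT))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(MDNS_GROUP, iface_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None


def send_probe(iface_ip: str, payload: bytes = b"multicast-probe") -> int:
    """Send one datagram to the group through the interface owning iface_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return sock.sendto(payload, (MDNS_GROUP, TEST_PORT))
```

As a usage example, listen_once("10.216.50.202") could be started on microcl2 and send_probe("10.216.50.200") then called on microcl1. A None result on the listener would suggest that multicast is not being forwarded between the machines, although a host firewall blocking the test port would look the same.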

myr4htw commented 7 months ago

Hi, the problem is solved. It was a network issue: we had to add a router for our internal network. Now the setup is complete, but I can't ping the virtual router. The network config of our servers looks strange, with several interfaces in state DOWN:

micro1@stl-s-microcl1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:96 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 10.216.50.200/24 brd 10.216.50.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::e42:9822:698:6b55/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:98 brd ff:ff:ff:ff:ff:ff
    altname enp0s9
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 4a:e1:c9:bb:5a:79 brd ff:ff:ff:ff:ff:ff
5: lxdovn1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1a:4a:16:01:98 brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6a:21:1e:63:93:6c brd ff:ff:ff:ff:ff:ff
7: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether ba:e9:58:06:65:cf brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7857:e1ff:fe4a:618b/64 scope link
       valid_lft forever preferred_lft forever

Do you have any ideas?

Kind regards Margit

roosterfish commented 7 months ago

I guess you are trying to ping the OVN virtual router of one of the LXD networks, e.g. default? From the MicroCloud cluster nodes there is no route to this network. You can confirm this by looking at the output of ip r on one of the cluster nodes and checking that the LXD network's range (ipv4.address in the output of lxc network show default) isn't listed there.

Can you reach the virtual router/gateway from an instance connected to the respective LXD network?

myr4htw commented 7 months ago

Hi, unfortunately I don't understand your statement :-( Please see my config. Does this look correct?

Kind regards Margit

micro1@stl-s-microcl1:/etc/default$ ip r
default via 10.216.50.152 dev ens3 proto static metric 20100
10.216.50.0/24 dev ens3 proto kernel scope link src 10.216.50.200 metric 100
169.254.0.0/16 dev ens3 scope link metric 1000

micro1@stl-s-microcl1:/etc/default$ lxc network show default
config:
  bridge.mtu: "1442"
  ipv4.address: 10.183.27.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:a418:68ab:5d4d::1/64
  ipv6.nat: "true"
  network: UPLINK
  volatile.network.ipv4.address: 134.96.216.206
description: ""
name: default
type: ovn
used_by:

myr4htw commented 7 months ago

Could you please give me a description how I can test the network and how to see if it is ok or not? Is there any documentation describing the relations?

roosterfish commented 7 months ago

In the output of ip r you can see there is no route to reach the virtual network 10.183.27.0/24 (default) in which 10.183.27.1/24 is the gateway as seen from LXD instances. Egress traffic from this network has the source IP 134.96.216.206 which comes from the range of addresses that you have specified during MicroCloud installation.
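The check described here can be written down as a small Python sketch using the ip r output posted earlier in this thread. The helper names are made up for illustration. Note that the default route would still match any address, so the sketch deliberately skips it and only looks for a specific prefix covering the target.

```python
import ipaddress

# Output of "ip r" as posted earlier in this thread.
IP_ROUTE_OUTPUT = """\
default via 10.216.50.152 dev ens3 proto static metric 20100
10.216.50.0/24 dev ens3 proto kernel scope link src 10.216.50.200 metric 100
169.254.0.0/16 dev ens3 scope link metric 1000
"""


def specific_routes(ip_route_output):
    """Return the non-default destination prefixes found in 'ip r' output."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "default":
            continue
        try:
            routes.append(ipaddress.ip_network(fields[0], strict=False))
        except ValueError:
            continue  # skip route types that do not start with a prefix
    return routes


def covering_route(address, routes):
    """Return the most specific prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


routes = specific_routes(IP_ROUTE_OUTPUT)
print(covering_route("10.183.27.1", routes))    # OVN gateway: no specific route
print(covering_route("10.216.50.202", routes))  # cluster peer: 10.216.50.0/24
```

With the posted routes, the OVN gateway 10.183.27.1 is not covered by any specific prefix, so traffic to it would simply be handed to the default gateway 10.216.50.152.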

Now if you want to ping the public facing side of the virtual network you could try to ping 134.96.216.206. Make sure the network this address resides in is properly routed in your infrastructure.

From your messages I still don't see the exact error you are facing, please elaborate on this.

myr4htw commented 7 months ago

Hi,

134.96.216.206 is not reachable by ping. I think the main problem is that the MicroCloud is not reachable - neither internally nor from the internet. I will try to tell you the steps we have taken to build the MicroCloud. Perhaps you then see a mistake.

We've got 3 Ubuntu servers. These are virtual machines in our Red Hat Virtualization. Every server has 2 network interfaces. One interface (ens3) has an IP (10.216.50.200, 10.216.50.202, 10.216.50.204). The second interface (ens9) has no IP; it is connected to the network 134.96.216.0, which is our network with internet connection.

micro1@stl-s-microcl1:/etc/default$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:96 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 10.216.50.200/24 brd 10.216.50.255 scope global noprefixroute ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::e42:9822:698:6b55/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:1a:4a:16:01:98 brd ff:ff:ff:ff:ff:ff
    altname enp0s9
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 4a:e1:c9:bb:5a:79 brd ff:ff:ff:ff:ff:ff
5: lxdovn1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1a:4a:16:01:98 brd ff:ff:ff:ff:ff:ff
6: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6a:21:1e:63:93:6c brd ff:ff:ff:ff:ff:ff
7: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether ba:e9:58:06:65:cf brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7857:e1ff:fe4a:618b/64 scope link
       valid_lft forever preferred_lft forever

In the MicroCloud init process we selected the addresses 10.216.50.200 - 204 for internal traffic and the interfaces ens9 for external connectivity.

IPv4 gateway: 134.96.216.200/24
first IP address in the range... 134.96.216.206
last IP address in the range... 134.96.216.208
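As a quick sanity check of the values entered above, the following Python sketch verifies that the uplink range lies inside the gateway's subnet and does not include the gateway address itself. It is only an illustration under the assumption that these are the relevant constraints; validate_uplink is a hypothetical helper and not part of MicroCloud or LXD.

```python
import ipaddress


def validate_uplink(gateway_cidr, first, last):
    """Return a list of problems found in an uplink gateway/range combination."""
    gateway = ipaddress.ip_interface(gateway_cidr)
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    problems = []
    if start > end:
        problems.append("first address is higher than last address")
    for label, addr in (("first", start), ("last", end)):
        if addr not in gateway.network:
            problems.append(f"{label} address {addr} is outside {gateway.network}")
    if start <= gateway.ip <= end:
        problems.append(f"range contains the gateway address {gateway.ip}")
    return problems


# Values from this thread: no problems are reported for them.
print(validate_uplink("134.96.216.200/24", "134.96.216.206", "134.96.216.208"))
```

An empty list for these values is consistent with the configuration looking formally correct, which points towards routing or connectivity outside the cluster as the place to look next.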

Then the cluster was built successfully. Perhaps this information tells you what is going wrong or what I still have to do to get a working MicroCloud.

Kind regards Margit

roosterfish commented 7 months ago

Okay so from what I can see this configuration looks ok.

Maybe let's first check it the other way around. When you deploy a LXD instance within the MicroCloud, can it reach the internet and/or gateway?

lxc launch ubuntu:jammy c1
lxc exec c1 -- bash -c "apt update && apt install -y traceroute && traceroute 1.1.1.1"

This should print something like this:

traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  _gateway ({gateway of your LXD default network})  0.787 ms  0.822 ms  0.839 ms
 2  _gateway.lxd (134.96.216.200)  2.163 ms  2.164 ms  2.161 ms
...
10  one.one.one.one (1.1.1.1)  28.644 ms  23.253 ms  23.247 ms

myr4htw commented 7 months ago

Hi, creation of an instance doesn't work, please see:

micro1@stl-s-microcl1:~$ lxc launch ubuntu:jammy c1
Creating c1
Error: Failed instance creation: Failed getting image: Failed parsing stream: Get "https://cloud-images.ubuntu.com/releases/streams/v1/index.json": lookup cloud-images.ubuntu.com on 127.0.0.53:53: server misbehaving

roosterfish commented 7 months ago

This looks to be an issue related to domain name resolution on your machine and not LXD. What happens when you nslookup cloud-images.ubuntu.com?

myr4htw commented 7 months ago

root@stl-s-microcl1:~# nslookup cloud-images.ubuntu.com
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find cloud-images.ubuntu.com: SERVFAIL

OK, so there is no nameserver on localhost. And when I add our nameserver (e.g. 134.96.216.214) to /etc/resolv.conf, it cannot be contacted, because there is no internet connection. I am going round in circles...
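To see which resolvers the host is actually configured to ask, the nameserver entries of /etc/resolv.conf can be listed with a few lines of Python. This is a rough sketch with hypothetical helper names. On Ubuntu, 127.0.0.53 is normally the local systemd-resolved stub, which forwards queries to upstream servers, so a SERVFAIL from it may simply mean that no upstream server is reachable.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the addresses of all 'nameserver' lines in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def is_loopback(address):
    """True if address parses as a loopback IP address such as 127.0.0.53."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def only_loopback(servers):
    """True if at least one server is listed and all of them are loopback."""
    return bool(servers) and all(is_loopback(s) for s in servers)
```

For example, only_loopback(nameservers(open("/etc/resolv.conf").read())) returning True would mean that every lookup goes through a local stub first, so the upstream servers behind it (and the route to them) are what need checking.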

myr4htw commented 7 months ago

Hello, perhaps I can ask my question like this: which network / routing requirements must be met in order for MicroCloud to work? We'd like to use network 10.216.50.0/24 (internal network without internet connection) for internal communication and addresses from network 134.96.216.0/24 (this network has internet connection) for the uplink network.

Kind regards Margit

roosterfish commented 7 months ago

You can find the networking requirements listed on this page https://canonical-microcloud.readthedocs-hosted.com/en/latest/explanation/microcloud/#explanation-networking.

Have you managed to get DNS working?

myr4htw commented 7 months ago

No, the problem is the lack of internet connection...

roosterfish commented 7 months ago

When launching an instance, the image is downloaded through 10.216.50.0/24 and not through the OVN network. Can you nslookup cloud-images.ubuntu.com from the host on which LXD/MicroCloud is running?

myr4htw commented 7 months ago

No, I can't. I have no internet connection from network 10.216.50.1 - I think that is the main problem. I thought the microcloud-init process would establish the internet connection via OVN-router?

roosterfish commented 7 months ago

That seems to be the problem, yes. Please check the link I posted earlier. The first network interface is used for creating the MicroCloud/LXD cluster and for downloading images.

The second network interface is dedicated to connecting OVN to the uplink network. It carries the egress traffic originating from LXD instances deployed in MicroCloud.
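Whether the first interface can reach the image server can be tested directly from a cluster node. Below is a minimal Python sketch, with can_connect as a hypothetical helper name: it opens a TCP connection while binding to a given local source address, so the attempt leaves through the interface that owns that address.

```python
import socket


def can_connect(host, port, source_ip=None, timeout=3.0):
    """Try a TCP connection to host:port, optionally from a given local address."""
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and unroutable hosts.
        return False
```

On microcl1, can_connect("cloud-images.ubuntu.com", 443, source_ip="10.216.50.200") would be expected to return False as long as the 10.216.50.0/24 network has neither working DNS nor a route to the internet.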

myr4htw commented 7 months ago

The network the first interfaces belong to has no internet connection. My understanding was that it is only used for the communication between the MicroCloud servers. That is why we chose this internal network. What does this mean now? What do we have to do to solve this problem?