Closed geodb27 closed 3 years ago
Hi, sorry to hear this.
Can you show me the output of lxc network show lxdbr0 please?
Sure, here it is :
config:
  dns.mode: managed
  dns.search: localdomain
  ipv4.address: 192.168.220.250/24
  ipv4.nat: "true"
  ipv6.address: none
  raw.dnsmasq: |
    auth-zone=lxd
    dns-loop-detect
description: ""
name: lxdbr0
type: bridge
used_by: (snip, container list)
managed: true
status: Created
locations:
Can you try this:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- bash
LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ /snap/lxd/current/bin/dnsmasq --help
Usage: dnsmasq [options]
Valid options are:
-a, --listen-address=
This seems to work. At least, it doesn't raise any error. I get the help output from the dnsmasq process.
@geodb27 does using the same approach allow you to start the dnsmasq process using the full command you originally posted?
This seems to work, provided I don't leave the nsenter session. I'm giving it a chance...
I've dropped the --keep-in-foreground option and was able to launch dnsmasq, leave the namespace and launch all my containers. So this works around the problem, but it is not reliable at the moment... Anyway, thanks for your help. I hope this bug will be fixed soon.
@geodb27 @esosan if you remove "auth-zone" from raw.dnsmasq using lxc network set lxdbr0 raw.dnsmasq, does dnsmasq then start for you?
Of course, kill the existing manually started process first if you have started one.
now the instances have an IP!!
OK, so we're narrowing it down: it's something to do with raw.dnsmasq, either because a specific setting in it is preventing dnsmasq from starting, or perhaps because when that option is used we have to disable the AppArmor profile (as the user may reference resources outside of the allowed profile), and that is causing the issue.
By the way, not all instances have acquired an IPv4 address (only IPv6), but after restarting the instance the right IP is assigned.
@esosan yes, the instances probably needed to be restarted so that they make a DHCP request again to get IPv4, whereas IPv6 addresses are advertised using RA.
Found the problem. It's the auth-zone=lxd setting:
dnsmasq: --auth-server required when an auth zone is defined.
I'll see if we can find a way to surface that error better.
Hello, I can confirm this.
Try unsetting the auth-zone line; it is what prevents dnsmasq from starting.
Hello, I can confirm too. But after unsetting raw.dnsmasq, I only get IPv6. I rebooted the host, but still, no IPv4.
It was working properly before I rebooted the host earlier today.
Make sure it's not a Docker-related firewall issue like https://discuss.linuxcontainers.org/t/containers-suddenly-stopped-working-no-more-ips-assigned/11360/19
The point would be to make the link between raw.dnsmasq, AppArmor and the requirement for libnettle.so.7? As for me, my LXD clusters are set up on "pure" Ubuntu 18.04 virtual machines. Docker is not involved at any point in the process. My setup was made so that dnsmasq responds to anything like instance.lxd and forwards the rest to our DNS servers. So, to have dnsmasq run after the nsenter, I had to add a server to the --auth-server parameter. Yet this still doesn't explain the libnettle.so.7 thing.
Thanks, I use ufw and I had to run ufw allow in on lxdbr0. So all good now, but I don't get why it was working before I rebooted that box earlier today.
@loxK most likely because modifying the lxdbr0 network's raw.dnsmasq setting caused LXD to remove and re-add its firewall rules, potentially changing the order of those rules in relation to another ruleset that is normally added after LXD has started.
See https://discuss.linuxcontainers.org/t/lxd-bridge-doesnt-work-with-ipv4-and-ufw-with-nftables/10034/17?u=tomp for a more thorough example.
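As a rough sketch of the ufw side of this, the helper below prints the rules for a bridge instead of running them, so they can be reviewed first. The first rule is the one mentioned above in this thread; the two route rules are an assumption on my part that forwarded traffic from the instances may also need allowing, and the function name ufw_rules_for_bridge is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) candidate ufw rules for an LXD bridge.
# Only the first rule comes from this thread; the route rules are an assumption
# for setups where ufw also blocks forwarded traffic.
ufw_rules_for_bridge() {
    bridge=${1:-lxdbr0}   # bridge name, defaults to lxdbr0
    echo "ufw allow in on $bridge"
    echo "ufw route allow in on $bridge"
    echo "ufw route allow out on $bridge"
}

# Review the printed commands, then run the ones you need with sudo.
ufw_rules_for_bridge lxdbr0
```

Printing rather than executing keeps this safe to run on a host where the firewall layout is not yet understood.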
I think the libnettle thing is not the issue here. The command originally run to get that error would always have failed because the LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ environment variable was not set.
This should work:
LD_LIBRARY_PATH=/snap/lxd/current/lib/:/snap/lxd/current/lib/x86_64-linux-gnu/ sudo --preserve-env=LD_LIBRARY_PATH nsenter --mount=/run/snapd/ns/lxd.mnt -- <command>
The AppArmor thing doesn't appear to be related either; there is no evidence of that now that we have seen that it is possible to start dnsmasq with raw.dnsmasq set as long as auth-zone=lxd is not specified.
The issue appears to be that the LXD snap package's switch to a core20 base introduced a newer version of dnsmasq that has additional rules around when the auth-zone=lxd setting can be used.
As the dnsmasq error states:
dnsmasq: --auth-server required when an auth zone is defined.
This is one of the downsides of using the raw.dnsmasq setting: it doesn't get tested, because the settings that are passed are user defined and unknown.
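Since LXD does not validate raw.dnsmasq, a small pre-check on the value can catch this particular combination before dnsmasq refuses to start. The following is only a minimal sketch: the function name check_raw_dnsmasq is hypothetical, and it checks just the one rule quoted above (an auth-zone line with no auth-server line), not dnsmasq's full option syntax.

```shell
#!/bin/sh
# Hypothetical helper: flag a raw.dnsmasq value that defines an auth zone
# without an accompanying auth-server line. This mirrors only the single
# dnsmasq rule quoted in this thread; it is not a full config validator.
check_raw_dnsmasq() {
    conf=$1   # the raw.dnsmasq text, one dnsmasq option per line
    if printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*auth-zone='; then
        if ! printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*auth-server='; then
            echo "auth-zone is set but auth-server is missing" >&2
            return 1
        fi
    fi
    return 0
}
```

One way to use it would be check_raw_dnsmasq "$(lxc network get lxdbr0 raw.dnsmasq)", which returns non-zero for the configuration shown earlier in this thread.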
From an LXD perspective we just need to better surface these dnsmasq startup errors to aid in debugging future issues like this.
Some users are experiencing DHCPv4 issues after unsetting the raw.dnsmasq setting, but as they are getting IPv6 addresses, dnsmasq is running and the original problem has been resolved. The DHCPv4 problem is likely a side effect of removing or modifying raw.dnsmasq, which causes LXD to clear its firewall rules and re-add them. Any external rules that would normally be added after LXD's rules then end up before them, and could block LXD's DHCP traffic.
If anyone is still experiencing firewall issues after fixing dnsmasq, it may be due to the snap core20 change subtly affecting the cases where nftables is used: https://discuss.linuxcontainers.org/t/lxd-stopped-generating-firewall-rules-after-switch-to-core20/11367/9?u=tomp
Hi,
There might have been an update that broke the dnsmasq binary provided by the latest lxd snap. Running 'snap start lxd' emits the following error: lvl=eror msg="The dnsmasq process exited prematurely" driver=bridge err="Process exited with non-zero value 1" network=lxdbr0 project=default
I've dug further and here is what I did :
To my knowledge (though I admit I don't know that much about snap), the problem is that I have an Ubuntu 18.04 host that uses snap core18, while lxd-4.15 has been built against core20...
I don't know how to get LXD's dnsmasq to run anymore. If anyone can help, thanks a lot!