Closed. miqayel-manvelyan closed this issue 4 years ago.
So this can be one of a few things:
What image are you using for your containers?
What's the full dnsmasq command line being run (ps aux | grep dnsmasq)?
Did you apply all the suggested sysctls from production-setup.md?
Hi @stgraber, thanks for the reply. The settings from production-setup.md are applied.
We use the dnsmasq instance that runs inside LXD; the local machine's DNS stub resolver is disabled.
Here is the output of ps aux | grep dnsmasq:
root 17333 0.0 0.0 18960 1052 pts/887 S+ 20:50 0:00 grep --color=auto dnsmasq
lxd 19998 0.7 0.0 56608 3348 ? S 16:06 2:14 dnsmasq --strict-order --bind-interfaces --pid-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=172.16.0.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/lib/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/lib/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 172.16.0.2,172.16.15.254,1h -s lxd -S /lxd/ --conf-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.raw -u lxd
Ok, dnsmasq looks fine. Did you try manually running the DHCP client in an affected container and dumping DHCP traffic on the host-side veth (the name can be found in lxc info NAME) and on the bridge itself?
Effectively, we're trying to see where, if anywhere, the traffic is dropped; if it's not dropped, that would suggest an issue with dnsmasq itself.
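One way to run that capture (a sketch; the container name mycontainer and the veth name vethXXXXXX are placeholders to be filled in from the lxc info output):

```shell
# Find the host-side veth for the affected container.
lxc info mycontainer | grep veth

# Terminal 1: capture DHCP traffic (UDP ports 67/68) on the host-side veth.
tcpdump -ni vethXXXXXX port 67 or port 68

# Terminal 2: same capture on the bridge itself.
tcpdump -ni lxdbr0 port 67 or port 68

# Then, inside the container, re-run the DHCP client, e.g.:
#   dhclient -v eth0
```

If requests show up on the veth but never on the bridge, the traffic is being dropped in between; if they reach the bridge but get no reply, dnsmasq is the suspect.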
-X, --dhcp-lease-max=<number>
Limits dnsmasq to the specified maximum number of DHCP leases. The default is 1000. This limit is to prevent DoS attacks from hosts which create thousands of leases and use lots of memory in the dnsmasq process.
This sounds suspect :)
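A quick way to check whether you're at that cap (path taken from the dnsmasq command line above; dnsmasq stores one lease per line in its leases file):

```shell
# Count active DHCP leases; a value at or near 1000 suggests
# the default dhcp-lease-max limit has been reached.
wc -l < /var/lib/lxd/networks/lxdbr0/dnsmasq.leases
```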
Can you try setting raw.dnsmasq to dhcp-lease-max=4000 on your bridge, and see if that takes care of the issue?
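On LXD that would look something like the following (lxdbr0 assumed from the output above):

```shell
# Raise dnsmasq's DHCP lease cap on the bridge via the raw.dnsmasq key.
lxc network set lxdbr0 raw.dnsmasq "dhcp-lease-max=4000"

# Verify the setting took effect.
lxc network get lxdbr0 raw.dnsmasq
```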
I'm not sure that we'd want to bump this out of the box. We could add another config option to control it, or just let the few that need to go past it configure dnsmasq directly through raw.dnsmasq.
Note that you're quite likely to immediately hit another limit, though. While looking into this issue, I've confirmed that the expected limit for Linux bridges is 1024 interfaces, so to get past that you'd need to move to openvswitch or use multiple bridges.
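For reference, LXD can create a managed bridge backed by Open vSwitch instead of the native Linux bridge via the bridge.driver network key (a sketch; lxdbr1 is a hypothetical network name, and the openvswitch-switch package must be installed on the host):

```shell
# Create a new LXD-managed network backed by Open vSwitch.
lxc network create lxdbr1 bridge.driver=openvswitch
```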
Closing this, as it's the first time we've had someone really hit this limit without also immediately hitting the bridge limit. We could add a knob, but that knob would only be useful for an additional 24 containers at most.
If someone else hits this, please comment and we'll consider adding more logic: likely expose a config key and have it refuse to be set past 1023 unless the driver is also changed to openvswitch.
Required information
Issue description
After reaching 1000+ containers, new containers don't receive a local IP.
lxc network show lxdbr0
Information to attach:
- dmesg
- lxc info NAME --show-log
- lxc config show NAME --expanded
- lxc monitor (while reproducing the issue)

dmesg
lxc info test --show-log
lxc config show test --expanded (test = N1001 container)
/var/log/lxd/lxd.log
Just info logs