Closed. Nani-o closed this issue 6 years ago.
I have just restarted the lxd service and the problem has gone away ...
Please tell me if I can still help debug this with logs or anything; otherwise, I can't think of any way to reproduce this, so don't hesitate to close. If this issue hits me again, I will make sure not to restart anything, to help with debugging.
It would have been interesting to run tcpdump -ni lxdbr0 while the problem was happening, to see whether it was due to something like a traffic loop or some ARP issue; that'd be something to do should this happen again.
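As a sketch of how such a capture could be triaged: pipe tcpdump's text output through a small summarizer to spot an ARP flood coming from a single source. The `top_arp_talkers` helper name is made up for this example, and the field positions assume tcpdump's default text output with `-e` (link-level headers).

```shell
#!/bin/sh
# Count ARP requests per source MAC from tcpdump's text output, to
# spot an ARP flood coming from a single interface.
# (top_arp_talkers is a made-up helper name for this sketch.)
top_arp_talkers() {
    grep 'who-has' \
        | awk '{print $2}' \
        | sort | uniq -c | sort -rn | head
}

# Usage while the problem is happening (requires root):
#   tcpdump -eni lxdbr0 -c 1000 arp 2>/dev/null | top_arp_talkers
```

If one MAC dominates the output by orders of magnitude, that interface is the first place to look.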
Can you attach your /var/log/kern.log? Maybe there's still something in there from before the reboot which may help figure this out.
I'm closing this issue as LXD itself can't cause this behavior, all that LXD does with the network is create the bridge, create veth devices and plug them in. Once the containers are running, LXD isn't involved at all in the data path of packets.
This issue is therefore either something going wrong inside the containers (either maliciously or due to a configuration issue) that causes a traffic loop, ARP flood or ARP poisoning, or a kernel issue in either the bridge driver or the veth driver that causes the packet loss (hopefully, in the latter case, something may be logged in kern.log).
Note that the issue being closed doesn't mean we won't be looking at logs and asking more questions to track this down, just that we don't think there's anything actionable in the LXD codebase itself for this issue.
Hi!
There is indeed something going on in my kernel log:
$ grep 'Jul 26' /var/log/kern.log | wc -l
1067300
$ grep 'Jul 26' /var/log/kern.log | grep lxdbr0 | wc -l
1067223
$ grep 'Jul 26' /var/log/kern.log | grep lxdbr0 | tail -5
Jul 26 13:11:59 sd-107191 kernel: [1416925.737229] device lxdbr0 entered promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.739950] device lxdbr0 left promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.742396] device lxdbr0 entered promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.745885] device lxdbr0 left promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.982741] device lxdbr0 entered promiscuous mode
Here's the interesting part (just when I rebooted; there is nothing else after this for the day):
Jul 26 13:11:59 sd-107191 kernel: [1416925.729429] device lxdbr0 entered promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.734436] device lxdbr0 left promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.737229] device lxdbr0 entered promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.739950] device lxdbr0 left promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.742396] device lxdbr0 entered promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.745885] device lxdbr0 left promiscuous mode
Jul 26 13:11:59 sd-107191 kernel: [1416925.982741] device lxdbr0 entered promiscuous mode
Jul 26 13:12:02 sd-107191 kernel: [1416929.659348] cgroup: new mount options do not match the existing superblock, will be ignored
Jul 26 13:12:02 sd-107191 kernel: [1416929.659812] cgroup: new mount options do not match the existing superblock, will be ignored
Jul 26 13:12:05 sd-107191 kernel: [1416932.685193] audit: type=1400 audit(1532603525.965:1266): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-certbot_</var/snap/lxd/common/lxd>" pid=24099 comm="apparmor_parser"
Jul 26 13:12:06 sd-107191 kernel: [1416932.832706] device veth5U5BWV entered promiscuous mode
Jul 26 13:12:06 sd-107191 kernel: [1416932.834994] IPv6: ADDRCONF(NETDEV_UP): veth5U5BWV: link is not ready
Jul 26 13:12:06 sd-107191 kernel: [1416932.881483] eth0: renamed from veth1X04C7
Jul 26 13:12:06 sd-107191 kernel: [1416932.913211] IPv6: ADDRCONF(NETDEV_CHANGE): veth5U5BWV: link becomes ready
Jul 26 13:12:06 sd-107191 kernel: [1416933.608340] audit: type=1400 audit(1532603526.889:1267): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-hound_</var/snap/lxd/common/lxd>" pid=24180 comm="apparmor_parser"
Jul 26 13:12:06 sd-107191 kernel: [1416933.639974] audit: type=1400 audit(1532603526.921:1268): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-certbot_</var/snap/lxd/common/lxd>" name="/" pid=24187 comm="mount" flags="rw, remount"
Jul 26 13:12:07 sd-107191 kernel: [1416933.823469] device vethKRINQW entered promiscuous mode
Jul 26 13:12:07 sd-107191 kernel: [1416933.825778] IPv6: ADDRCONF(NETDEV_UP): vethKRINQW: link is not ready
Jul 26 13:12:07 sd-107191 kernel: [1416933.876633] eth0: renamed from veth00TQ9F
Jul 26 13:12:07 sd-107191 kernel: [1416933.897366] IPv6: ADDRCONF(NETDEV_CHANGE): vethKRINQW: link becomes ready
Jul 26 13:12:08 sd-107191 kernel: [1416934.764246] audit: type=1400 audit(1532603528.045:1269): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-lab-c6_</var/snap/lxd/common/lxd>" pid=24393 comm="apparmor_parser"
Jul 26 13:12:08 sd-107191 kernel: [1416935.173544] device veth4E3TOP entered promiscuous mode
Jul 26 13:12:08 sd-107191 kernel: [1416935.176053] IPv6: ADDRCONF(NETDEV_UP): veth4E3TOP: link is not ready
Jul 26 13:12:08 sd-107191 kernel: [1416935.232658] eth0: renamed from veth0R8A1U
Jul 26 13:12:08 sd-107191 kernel: [1416935.279721] IPv6: ADDRCONF(NETDEV_CHANGE): veth4E3TOP: link becomes ready
Jul 26 13:12:09 sd-107191 kernel: [1416935.944974] audit: type=1400 audit(1532603529.225:1270): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxd-lab-c6_</var/snap/lxd/common/lxd>" name="/proc/" pid=24702 comm="mount" flags="rw, nosuid, nodev, noexec, remount, relatime"
Jul 26 13:12:09 sd-107191 kernel: [1416936.046667] audit: type=1400 audit(1532603529.325:1271): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-lab-c7_</var/snap/lxd/common/lxd>" pid=24727 comm="apparmor_parser"
Jul 26 13:12:09 sd-107191 kernel: [1416936.421435] IPv6: ADDRCONF(NETDEV_CHANGE): vethGPSGE7: link becomes ready
Jul 26 13:12:10 sd-107191 kernel: [1416937.524422] device vethVBLU15 entered promiscuous mode
Jul 26 13:12:10 sd-107191 kernel: [1416937.527090] IPv6: ADDRCONF(NETDEV_UP): vethVBLU15: link is not ready
Jul 26 13:12:10 sd-107191 kernel: [1416937.585444] eth0: renamed from vethFGSDVR
Jul 26 13:12:10 sd-107191 kernel: [1416937.613512] IPv6: ADDRCONF(NETDEV_CHANGE): vethVBLU15: link becomes ready
Jul 26 13:12:12 sd-107191 kernel: [1416938.920474] device veth4RMQRA entered promiscuous mode
Jul 26 13:12:12 sd-107191 kernel: [1416938.923349] IPv6: ADDRCONF(NETDEV_UP): veth4RMQRA: link is not ready
Jul 26 13:12:12 sd-107191 kernel: [1416938.968755] eth0: renamed from vethSICCMO
Jul 26 13:12:12 sd-107191 kernel: [1416939.009935] IPv6: ADDRCONF(NETDEV_CHANGE): veth4RMQRA: link becomes ready
Jul 26 13:12:13 sd-107191 kernel: [1416940.304324] audit: type=1400 audit(1532603533.585:1275): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-lab-d9_</var/snap/lxd/common/lxd>" pid=25613 comm="apparmor_parser"
Jul 26 13:12:13 sd-107191 kernel: [1416940.703690] IPv6: ADDRCONF(NETDEV_UP): vethXDQLOF: link is not ready
Jul 26 13:12:14 sd-107191 kernel: [1416940.795915] IPv6: ADDRCONF(NETDEV_CHANGE): vethXDQLOF: link becomes ready
Jul 26 13:12:14 sd-107191 kernel: [1416940.918267] audit: type=1400 audit(1532603534.197:1278): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-lab-d7_</var/snap/lxd/common/lxd>" name="/run/" pid=25990 comm="mount" flags="rw, nosuid, noexec, remount, relatime"
Jul 26 13:12:14 sd-107191 kernel: [1416940.960560] audit: type=1400 audit(1532603534.241:1279): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-lab-d7_</var/snap/lxd/common/lxd>" name="/run/lock/" pid=26024 comm="mount" flags="rw, nosuid, nodev, noexec, remount, relatime"
Jul 26 13:12:14 sd-107191 kernel: [1416940.972799] audit: type=1400 audit(1532603534.253:1280): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxd-lab-d7_</var/snap/lxd/common/lxd>" name="/proc/" pid=26034 comm="mount" flags="rw, nosuid, nodev, noexec, remount, relatime"
Jul 26 13:12:15 sd-107191 kernel: [1416941.791473] audit: type=1400 audit(1532603535.073:1284): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-lab-u14_</var/snap/lxd/common/lxd>" pid=26658 comm="apparmor_parser"
Jul 26 13:12:15 sd-107191 kernel: [1416942.120747] device vethUMSDXC entered promiscuous mode
Jul 26 13:12:15 sd-107191 kernel: [1416942.188867] eth0: renamed from vethMMLWR3
Jul 26 13:12:15 sd-107191 kernel: [1416942.203896] IPv6: ADDRCONF(NETDEV_CHANGE): vethUMSDXC: link becomes ready
Jul 26 13:12:17 sd-107191 kernel: [1416943.771558] IPv6: ADDRCONF(NETDEV_UP): veth5RSTHU: link is not ready
Jul 26 13:12:17 sd-107191 kernel: [1416943.836870] eth0: renamed from vethPY0NJ6
Jul 26 13:12:19 sd-107191 kernel: [1416945.881652] audit_printk_skb: 36 callbacks suppressed
Jul 26 13:12:19 sd-107191 kernel: [1416945.881658] audit: type=1400 audit(1532603539.161:1297): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-lab-d9_</var/snap/lxd/common/lxd>" name="/" pid=28517 comm="(resolved)" flags="rw, rslave"
Jul 26 13:12:19 sd-107191 kernel: [1416946.293540] audit: type=1400 audit(1532603539.573:1299): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-lab-d9_</var/snap/lxd/common/lxd>" name="/" pid=28950 comm="(sh)" flags="rw, rslave"
Jul 26 13:12:20 sd-107191 kernel: [1416946.874322] IPv6: ADDRCONF(NETDEV_UP): vethTJCNR2: link is not ready
Jul 26 13:12:22 sd-107191 kernel: [1416948.937241] audit: type=1400 audit(1532603542.217:1300): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-nginx_</var/snap/lxd/common/lxd>" pid=29870 comm="apparmor_parser"
Jul 26 13:12:24 sd-107191 kernel: [1416951.193685] IPv6: ADDRCONF(NETDEV_CHANGE): veth16YDTF: link becomes ready
Jul 26 13:12:26 sd-107191 kernel: [1416952.997330] device vethAGS1AP entered promiscuous mode
Jul 26 13:12:28 sd-107191 kernel: [1416955.453383] device veth4HFKL2 entered promiscuous mode
Jul 26 13:12:29 sd-107191 kernel: [1416956.695256] audit: type=1400 audit(1532603549.972:1303): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-lab-u18_</var/snap/lxd/common/lxd>" name="/" pid=30807 comm="(networkd)" flags="rw, rslave"
Jul 26 13:12:31 sd-107191 kernel: [1416958.255861] audit: type=1400 audit(1532603551.536:1307): apparmor="STATUS" operation="profile_load" label="lxd-lab-u16_</var/snap/lxd/common/lxd>//&:lxd-lab-u16_<var-snap-lxd-common-lxd>://unconfined" name="/sbin/dhclient" pid=30796 comm="apparmor_parser"
Jul 26 13:12:31 sd-107191 kernel: [1416958.315796] audit: type=1400 audit(1532603551.596:1308): apparmor="STATUS" operation="profile_load" label="lxd-lab-u16_</var/snap/lxd/common/lxd>//&:lxd-lab-u16_<var-snap-lxd-common-lxd>://unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=30796 comm="apparmor_parser"
Jul 26 13:12:31 sd-107191 kernel: [1416958.318482] audit: type=1400 audit(1532603551.596:1310): apparmor="STATUS" operation="profile_load" label="lxd-lab-u16_</var/snap/lxd/common/lxd>//&:lxd-lab-u16_<var-snap-lxd-common-lxd>://unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=30796 comm="apparmor_parser"
Let me know if I can add more information.
The in and out of promiscuous mode is a bit odd unless you were actively running a bunch of tcpdump or using other network monitoring software that would similarly need promiscuous access to the interfaces.
The rest looks about as expected assuming containers were actively started and stopped. It shows new veth devices being created, them joining the bridge, coming up and the container side of them getting renamed to eth0, followed by the container's apparmor profiles getting loaded and some of the apparmor rules getting hit during container startup.
I can't think of anything running on my server that would need promiscuous mode on an interface.
My kern.log is filled from Jul 24 17:50:18 to Jul 26 13:11:59 with in/out promiscuous mode messages. I had seen a lot of disturbance and problems accessing my services since the 25th of July. At the time, I just restarted what I thought were failed containers, and since that got things working again, I didn't bother to dig further. It was only yesterday, when I saw similar problems again, that I began to search for what could be causing this.
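One way to see exactly when the flood started and stopped is to bucket those messages per minute. A sketch, assuming the standard syslog timestamp format shown in the excerpts above (the `promisc_timeline` helper name is made up):

```shell
#!/bin/sh
# Count "promiscuous mode" kernel messages per minute, to see when
# the flood started and stopped. Reads syslog-style lines on stdin
# ("Mon DD HH:MM:SS host kernel: ...").
# (promisc_timeline is a made-up helper name for this sketch.)
promisc_timeline() {
    grep 'promiscuous mode' \
        | awk '{print $1, $2, substr($3, 1, 5)}' \
        | uniq -c
}

# Usage (adjust the path for your system):
#   promisc_timeline < /var/log/kern.log
```

Each output line is "count Mon DD HH:MM", so a sudden jump in the count column marks the onset of the flood.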
The only thing I remember doing on my server in the past few days was testing the networks endpoint of the LXD API with an Ansible module.
Going in/out of promisc mode is weird and not something you should be seeing during normal operation; that is, unless you're running some network monitoring software, an intrusion detection system, or manual tcpdump/tshark/wireshark sessions, all of which depend on the interface being (if only temporarily) in promisc mode.
Does going in/out of promiscuous mode coincide with starting/stopping containers?
I had similar problems (with LXC, not LXD); in my case it was caused by another container using the same IP address. Also make sure that you are using unique MAC addresses per interface.
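A quick way to check for duplicate MACs across containers is to collect each container's MAC and look for repeats. This is only a sketch: the `dup_macs` helper name is made up, and the usage example assumes the stock LXD CLI, where `volatile.eth0.hwaddr` holds the generated MAC for a container's eth0.

```shell
#!/bin/sh
# Report MAC addresses that appear on more than one interface.
# Reads "name mac" pairs on stdin and prints each duplicated MAC once.
# (dup_macs is a made-up helper name for this sketch.)
dup_macs() {
    awk '{print $2}' | sort | uniq -d
}

# Usage sketch (assumes the LXD CLI; volatile.eth0.hwaddr is where
# stock LXD stores the generated MAC for a container's eth0):
#   for c in $(lxc list -c n --format csv); do
#       echo "$c $(lxc config get "$c" volatile.eth0.hwaddr)"
#   done | dup_macs
```

An empty output means every container has a unique MAC; any line printed is a MAC shared by at least two interfaces, which would explain exactly this kind of intermittent loss.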
I'm experiencing a lot of packet loss (40%) on virtual NICs plugged into my default bridge.
Required information
Issue description
I was experiencing a lot of disturbance using the services hosted on my LXD containers. After a long outage where pretty much nothing was working, I began investigating. What I found is that the network between my containers and my host (bridge lxdbr0) was losing a lot of packets.
Steps to reproduce
It's going to be difficult to give an exact way to reproduce this, since I don't know what causes it and have no clue what I could have done to create this behavior.
I can, however, give some information that I have, and update this with any requested debug commands.
lxdbr0 bridge
Here's the config of one of these containers:
packet loss
As you can see, packet loss is only happening on virtual NICs that use the bridge:
Information to attach
dmesg
lxc info NAME --show-log
lxc config show NAME --expanded
lxc monitor (while reproducing the issue)