docker/for-linux: Docker Engine for Linux
https://docs.docker.com/engine/installation/

enabling default vlan on bridge br0 failed open /sys/class/net/br0/bridge/default_pvid: permission denied #881

Open · nicholaspearson opened this issue 4 years ago

nicholaspearson commented 4 years ago

Expected behavior

The container will start when a stack deploy command is run.
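
For context, "a stack deploy command" here means something of this shape (the stack name and compose file below are placeholders, not the actual ones used):

    docker stack deploy -c docker-compose.yml mystack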

Actual behavior

The container does not start, which seems to be due to the following error found in the journal:

enabling default vlan on bridge br0 failed open /sys/class/net/br0/bridge/default_pvid: permission denied
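
The failing open can also be checked by hand. This is just a diagnostic sketch, assuming the bridge is really named br0 as in the error; on an affected node the file, or the whole bridge/ directory, may simply not exist:

    ls -l /sys/class/net/br0
    ls -l /sys/class/net/br0/bridge/default_pvid
    cat /sys/class/net/br0/bridge/default_pvid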

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:37:22 2019
 OS/Arch:           linux/arm
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:31:17 2019
  OS/Arch:          linux/arm
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 6
  Running: 4
  Paused: 0
  Stopped: 2
 Images: 8
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: snyzpzpzk0z74og9hr5cov0ui
  Is Manager: true
  ClusterID: lwistwd4hsxtywd6nr9b1tcpg
  Managers: 2
  Nodes: 6
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.0.0.65
  Manager Addresses:
   10.0.0.65:2377
   10.0.0.69:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.75-v7+
 Operating System: Raspbian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: armv7l
 CPUs: 4
 Total Memory: 926.1MiB
 Name: swarm-leader-01.servers.eth1.uk
 ID: 3DFK:5TL4:BJMN:UWBY:FMHR:EXZW:PCH3:7ZNQ:N7H6:F3ME:5IMQ:LITD
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: Running Swarm in a two-manager configuration. This configuration provides
         no fault tolerance, and poses a high risk to lose control over the cluster.
         Refer to https://docs.docker.com/engine/swarm/admin_guide/ to configure the
         Swarm for fault-tolerance.

Additional environment details (AWS, VirtualBox, physical, etc.)

8-node Raspberry Pi Docker Swarm cluster with 2 managers.

nicholaspearson commented 4 years ago

A few notes.

jjdiazgarcia commented 4 years ago

Same problem here using 2 Raspberry Pi nodes (both are manager nodes).

nicholaspearson commented 4 years ago

Any response from the repo maintainers on this?

Maikusan commented 4 years ago

Same issue using a Raspberry Pi 3 Model B+. Any solutions?

georgkrause commented 4 years ago

I ran into the same problem on a Raspberry Pi 3. Has anyone found a workaround yet?

Edit: If there is any information you need to debug this, let me know; I am willing to help.

ghost commented 4 years ago

Same problem on an ASUSTOR (BusyBox) install.

claymore666 commented 3 years ago

I have the same error message in syslog, running Docker 18.9.1 on armv7l (RPi 4 in 32-bit mode) in swarm mode with one manager node only. However, my containers start normally and I can also access them from the outside network over the bridge. But I wonder where br0 comes from; my bridge is docker_gwbridge, not br0. Is that the same for you? Other than that, I can confirm @nicholaspearson's observation: the directory (and the device) do not exist, hence the error message. I am using overlay networks as well.
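
To compare setups, this is one way to list the bridges the kernel actually has versus the networks Docker knows about (plain diagnostic commands, nothing system-specific assumed):

    ip link show type bridge
    docker network ls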

rdfedor commented 3 years ago

Same issue here. I have a four-node Raspberry Pi 4 4GB ARM64 cluster with one manager node, running 64-bit Raspberry Pi OS (Debian-based). I'm seeing this behavior on worker node-1 and node-3. I set up Loki on the cluster to aggregate the various logs and found that when this happens the node still seems to work but loses all network connectivity, inbound and outbound. I lose access to SSH and everything else, but the node seems to stay active, just in a disconnected state.
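
One way to confirm that from a surviving manager is to check whether the affected node still reports as Ready/Active in the swarm while it is unreachable over SSH (run on a manager node):

    docker node ls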

rdfedor commented 3 years ago

Here are some logs that are possibly related:

2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.317368109Z" level=error msg="failed adding service binding for 55f332a13546a734f6f69fad8a6f3b7d462a076e676a9839d55141ac81363cb5 epRec:{cluster-monitoring_grafana.1.si6tnkplvmbnk14jf7p4v4fdp cluster-monitoring_grafana ld6u2u6094euz0ppgzq33p829 172.20.5.2 172.20.5.3 [] [grafana] [493f5823c508] false} err:network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.313926739Z" level=error msg="failed adding service binding for 51cc84fc5efe2f2a7eba9ebab2638971c987dbcee7e194ab97b3826e1a1459ab epRec:{cluster-monitoring_prometheus.1.x8s1rjddtzpbca97mo6qxfcbo cluster-monitoring_prometheus iibue4vaw0r2qxcfajzpbxfgz 172.20.5.22 172.20.5.23 [] [prometheus] [516f7b187774] false} err:network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.312982424Z" level=error msg="failed adding service binding for 384f771627c9de529758ab8374a3b265a5013de01de10b5b6ef714b1c413a479 epRec:{cluster-monitoring_node-exporter.rrzqexbfbe3rd900u86az51i3.dr9fach7qdlu7j5fm7m5pvdiq cluster-monitoring_node-exporter dfrudo8og7rzvuzlih0f3wx2a 172.20.5.17 172.20.5.21 [] [node-exporter] [4c97770fd8b5] false} err:network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.311953776Z" level=error msg="failed adding service binding for 2f9a80189de15f120549463a607ff5f848ef0bad4c1a977f5cf040d9ccd98b7b epRec:{cluster-monitoring_promtail.rrzqexbfbe3rd900u86az51i3.u0742bms5u85p26wyd1bxuzv4 cluster-monitoring_promtail 8zwobb0zawd7ekinnn60u9d9d 172.20.5.7 172.20.5.10 [] [promtail] [77eefb6b207b] false} err:network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.308033794Z" level=error msg="failed to get network during CreateEndpoint: network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.306874943Z" level=error msg="failed to get network during CreateEndpoint: network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.306136906Z" level=error msg="failed to get network during CreateEndpoint: network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.305671702Z" level=error msg="network cluster-monitoring_monitoring remove failed: error while removing network: unknown network cluster-monitoring_monitoring id r0xt6yl6vfqw4fgs45a7ch7p0" module=node/agent node.id=mpuf1oxq1pwnanun9pwhjeo7t
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.305231609Z" level=error msg="failed to get network during CreateEndpoint: network r0xt6yl6vfqw4fgs45a7ch7p0 not found"
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.304561035Z" level=error msg="network cluster-monitoring_monitoring remove failed: error while removing network: unknown network cluster-monitoring_monitoring id r0xt6yl6vfqw4fgs45a7ch7p0" module=node/agent node.id=mpuf1oxq1pwnanun9pwhjeo7t
2020-12-22 21:47:03 | Dec 23 03:17:34 swarm-node-3 dockerd[526]: time="2020-12-23T03:17:34.292442757Z" level=error msg="network cluster-monitoring_monitoring remove failed: error while removing network: unknown network cluster-monitoring_monitoring id r0xt6yl6vfqw4fgs45a7ch7p0" module=node/agent node.id=mpuf1oxq1pwnanun9pwhjeo7t

A little about the setup: there is a 5 TB RAID 10 array connected to the master node, plus a 500 GB SSD. Both drives are shared over NFS to the other three nodes, and the 500 GB SSD is used for Docker volume data. I have a few services hosted outside the swarm, since it doesn't support mounting devices or tunnels, so I simply use a reverse proxy to route traffic to them through the swarm network.

Just to re-emphasize: I have it set up to stream syslogs, and when this issue hits, the log stream halts because all the network traffic just dies. The containers still operate, just without network access. Then once I reboot the node, the network resets and the logs are flushed all at once. I have an HDMI-to-USB 3 capture stick on order that arrives tomorrow, so I should be able to use my tablet as a monitor to debug it.
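
Until the capture stick arrives, the local journal is another place the evidence should survive, since the remote log stream dies with the network. Assuming systemd-journald with persistent storage (Storage=persistent in /etc/systemd/journald.conf), something like this pulls the dockerd errors from the previous boot after a reboot:

    journalctl -u docker.service -b -1 --no-pager | grep -iE 'error|bridge|vlan'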