cloudfoundry / bpm-release

isolated bosh jobs
Apache License 2.0

BPM fails attempting to allow access to /dev/console via runc in latest bosh-lite #143

Closed kieron-dev closed 4 years ago

kieron-dev commented 4 years ago

The latest bosh-deployment picked up a bump to garden-runc-release, which in turn picked up a runc bump that removed access to /dev/console for security reasons.

When BPM runs in a bosh-lite environment based on this garden-runc version, it invokes its own packaged runc, which attempts to set default access permissions on /dev/console. That device has been masked by the parent container's runc, so the write fails with an error such as this example from the nats release:

container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:415: setting cgroup config for procHooks process caused \\\"failed to write \\\\\\\"c 5:1 rwm\\\\\\\" to \\\\\\\"/sys/fs/cgroup/devices/system.slice/garden.service/f89f3043-f0e3-4ec3-60c2-49a94e5ba465/bpm-nats/devices.allow\\\\\\\": write /sys/fs/cgroup/devices/system.slice/garden.service/f89f3043-f0e3-4ec3-60c2-49a94e5ba465/bpm-nats/devices.allow: operation not permitted\\\"\""
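
For anyone decoding the error above: the string being written to devices.allow is a cgroup v1 device rule of the form "type major:minor access", and on Linux character device 5:1 is /dev/console. Here is a minimal Go sketch that splits such a rule into its fields. The DeviceRule type and parseDeviceRule function are hypothetical names made up for illustration, not part of bpm or runc.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DeviceRule is a hypothetical type for one cgroup v1 devices.allow entry.
type DeviceRule struct {
	Type   string // "c" (character), "b" (block), or "a" (all)
	Major  int64
	Minor  int64
	Access string // some combination of r (read), w (write), m (mknod)
}

// parseDeviceRule splits a rule such as "c 5:1 rwm" into its fields.
// Wildcard device numbers ("*") are not handled in this sketch.
func parseDeviceRule(s string) (DeviceRule, error) {
	fields := strings.Fields(s)
	if len(fields) != 3 {
		return DeviceRule{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	nums := strings.SplitN(fields[1], ":", 2)
	if len(nums) != 2 {
		return DeviceRule{}, fmt.Errorf("expected major:minor, got %q", fields[1])
	}
	major, err := strconv.ParseInt(nums[0], 10, 64)
	if err != nil {
		return DeviceRule{}, fmt.Errorf("bad major number: %w", err)
	}
	minor, err := strconv.ParseInt(nums[1], 10, 64)
	if err != nil {
		return DeviceRule{}, fmt.Errorf("bad minor number: %w", err)
	}
	return DeviceRule{Type: fields[0], Major: major, Minor: minor, Access: fields[2]}, nil
}

func main() {
	rule, err := parseDeviceRule("c 5:1 rwm")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// On Linux, character device 5:1 is /dev/console.
	isConsole := rule.Type == "c" && rule.Major == 5 && rule.Minor == 1
	fmt.Printf("%+v console=%v\n", rule, isConsole)
}
```

So the failing write is the inner runc trying to allow read, write, and mknod on /dev/console inside a cgroup whose parent no longer permits that device.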

To reproduce this, take the latest versions of bosh-deployment and cf-deployment, create a bosh-lite director, then attempt to deploy CF. The nats deployment will fail, and the above message will be in /var/vcap/sys/log/nats/nats.stderr.log.

We have verified that bumping bpm-release's packaged runc to commit 3f1e88699199d844dd211cbf5108eeff9444674e fixes this problem. This is the commit currently used by garden-runc-release. It is not an official release, but garden bumped to this commit because the tagged release is too old and causes garden errors with Go 1.14, amongst other problems.

cc: @danail-branekov

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173511670

The labels on this github issue will be updated when the story is started.

aashah commented 4 years ago

Hey @kieron-pivotal,

Just wanted to ack the issue. We got some feedback in Slack about this as well, so it's on our radar.

There might be some delay since the BOSH team is taking over responsibility for BPM, so we need to move some of their CI/assets over. But we're looking into this!

aashah commented 4 years ago

runc recently put out a release that brings in the fix you shared: https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc91

I don't have a full understanding of the issue. If this has to do with compatibility between garden-runc and the runc inside bpm-release, I think we'd both need to release a new version bumping up to at least the commit you shared. Is that correct?

kieron-dev commented 4 years ago

Yes - garden is currently on a commit between rc90 and rc91 containing that /dev/console change already. BPM bumping to rc91 will fix this issue by taking the /dev/console change too, and garden will bump to the tagged release now that it's available.

kieron-dev commented 4 years ago

Garden has released v1.19.14 with runc-v1.0.0-rc91.

rkoster commented 4 years ago

Should be fixed in: https://github.com/cloudfoundry/bpm-release/pull/145

rkoster commented 4 years ago

validated cf smoke tests pass on bosh-lite with: