cloudfoundry / bpm-release

isolated bosh jobs
Apache License 2.0
34 stars 28 forks source link

Support shared mount-only volumes #127

Closed danail-branekov closed 5 years ago

danail-branekov commented 5 years ago

Hi BPM,

As a follow up from https://github.com/cloudfoundry/garden-runc-release/issues/133, it turned out that the new cool shared mount option does not work well together with the mount-only option, see this comment for details What do you think about being able to have a volume that is both mount-only and shared? Thus volume providers would be able to create their shared volumes with proper permissions.

Thanks!

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/167010228

The labels on this github issue will be updated when the story is started.

xoebus commented 5 years ago

Hmm...

I would have expected the volume services release to create the volume with shared: true. Garden would then just mount_only: true this volume into the container without needing shared: true.

Am I misunderstanding how shared mounts propagate through multiple bind mounts?

danail-branekov commented 5 years ago

Hmm... what you suggest makes sense. My expectations however were that the "provisioning" release just creates the directory and declares that the directory should be a shared mount in the job's container. Then, BPM takes care to bind mount it and make the mount shared, rather then the job release doing that. @julian-hj, I think your input could be useful here.

When discussing shared volumes we should also consider the garden-rep jobs scenario that both jobs (i.e. both releases) need a shared volume in order to start. As bind-mounting the volume must be done just once (in order bind mounts not to be overlapped), both releases' prestart scripts currently synchronise via file system locks to create the directory and make it a shared mount([1], [2]). It would be great if we could get rid of that ugliness. @xoebus, I think that the flow you suggest would not let us achieve that, wouldn't it?

[1] Garden creates the shared volumes here, note the flock [2] Rep does pretty much the same using the same flock here

xoebus commented 5 years ago

BPM currently grabs a machine-level lock when bind-mounting shared mounts to avoid that race.

Would it work if both jobs declared the volume to be shared?

danail-branekov commented 5 years ago

You mean shared and NOT mount_only, is that correct? That would mean that BPM would create the directory with permissions 700 and owned by vcap, please correct me if I am wrong. I don't think that these permissions would work for everyone, would definitely not work for rootless garden and rep scenario

xoebus commented 5 years ago

Hmmm, maybe we could allow configurable owner, group, and permissions? What do they need to be set to in this case?

danail-branekov commented 5 years ago

Well, for the rootless garden - rep case we want those directories to be readable by any user, so maybe we are looking into permissions 744. 766 is not fine because we use the shared volume to store cell certificates and they must be read-only.

OTH I can imagine that the volume service would want applications to be able to read/write in shared volumes, so the shared volumes should be permissioned as 766. I could also imagine volume service users wishing to be able to execute binaries from shared volumes, in that case volume permissions should be 755 or even 777. As you can see, there are lots of mutually exclusive combinations and therefore there seems not to be a reasonable permissions default.

Yes, BPM could introduce configs for ownership and permissions but would that not be too much of configuration? I am afraid that we would come to a future situation that we need yet another config. Also, wouldn't configuring ownership and permissions render the mount_only option useless? Isn't it simpler whoever provides the directories that are to be mounted as additional volumes in BPM containers to simply create those directories however they want (permissions, ownership, content, whatever) on the host in a bpm-prestart script and then just declare them as mount_only, optionally shared, volumes?

xoebus commented 5 years ago

Yep, sounds good. I'll work on it today/the start of next week.

xoebus commented 5 years ago

I've put a potential fix for this on the share-mount-only branch.

Do you have a minimal example which you or I could use to verify that this works for your use-case?

danail-branekov commented 5 years ago

I had a look at your commit, I think it should be fine. Unfortunately I do not have an out-of-the-box minimal setup to use to give the change a try :( Provided all the fiddling to setup at least two jobs that share volumes, it would be probably easier to spin up a BBL CF deployment and just follow the reproduction steps in https://github.com/cloudfoundry/garden-runc-release/issues/133

xoebus commented 5 years ago

I did some things manually to mimic a pre-start script creating a directory and BPM mounting it in as a shared mount.

$ sudo -i
# mkdir /var/vcap/data/mo-shared
# chmod 777 $!
# vim /var/vcap/jobs/test-server/config/bpm.yml
[ ... furious editing ... ]
# cat !$
processes:
- name: test-server
  executable: /var/vcap/packages/test-server/bin/test-server
  args:
    - --port
    - 1337
  env:
    BPM: SWEET
  limits:
    memory: 1G
    open_files: 1024
  ephemeral_disk: true
  persistent_disk: true
  additional_volumes:
  - path: /var/vcap/data/shared
  - path: /var/vcap/data/mo-shared
    mount_only: true
    shared: true
  unsafe:
    unrestricted_volumes:
    - path: /dev/log

- name: alt-test-server
  executable: /var/vcap/packages/test-server/bin/test-server
  args:
    - --port
    - 1338
    - --ignore-signals
  env:
    BPM: CONTAINED
# monit restart test-server
# mkdir /var/vcap/data/mo-shared/mount
# mount -t tmpfs -o size=512m tmpfs /var/vcap/data/mo-shared/mount
# touch /var/vcap/data/mo-shared/mount/file
# bpm shell test-server
test-server $ ls -R /var/vcap/data/mo-shared/
/var/vcap/data/mo-shared/:
mount

/var/vcap/data/mo-shared/mount:
file