GoogleCloudPlatform / cloud-sql-proxy

A utility for connecting securely to your Cloud SQL instances
Apache License 2.0

Running in `-fuse` mode doesn't work between containers/host #444

Closed jonpjenkins closed 3 years ago

jonpjenkins commented 4 years ago

Bug Description

This could be a documentation issue, as I am unable to find a definitive guide on using the -fuse flag with the Kubernetes client. I am coming up against various errors (similar to issue #38), wherein the image reports that "fusermount" is not executable.

At this point, I am not sure if I am specifying the wrong flags, or if there is something bigger going on. Any advice would be appreciated.

Example code

In this case, I am deploying a Cloud SQL proxy as a container, mounting /cloudsql/fuse as the directory in which I'd like to begin the socket path. This error also arises when using the "default" /cloudsql; I am limited to using this alternative directory due to the parent chart.

Upon deploying the below, I am seeing issues of the type:

Mounting /cloudsql/fuse...                                                                                                                            
Could not start fuse directory at "/cloudsql/fuse": cannot mount "/cloudsql/fuse": fusermount: exec: "fusermount": executable file not found in $PATH 

Container yaml:

        - command:
          - /cloud_sql_proxy
          - -dir=/cloudsql/fuse
          - -fuse
          - -credential_file=/secrets/cloudsql/credentials.json
          image: gcr.io/cloudsql-docker/gce-proxy:1.17
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - -c
                - sleep 15
          name: cloudsql-proxy
          resources:
            limits:
              cpu: 125m
              memory: 256Mi
            requests:
              cpu: 125m
              memory: 256Mi
          securityContext:
            runAsGroup: 65532
            runAsNonRoot: true
            runAsUser: 65532
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /cloudsql/fuse
            name: userconfig-fuse
          - mountPath: /secrets/cloudsql
            name: userconfig-cloudsql-instance-credentials
            readOnly: true

How to reproduce

  1. Add the above container to a deployment
  2. Note the error from the container

Environment

  1. Cloud SQL Proxy version (./cloud_sql_proxy -version): 1.17
kurtisvg commented 4 years ago

Hey @jonthegimp,

Thanks for opening an issue. I believe I was the one that recommended running fuse mode on Kubernetes to you, but there seem to be a few gotchas with setting it up that I wasn't aware of.

First, you may need to make sure that the image has FUSE installed. For that, you may need to switch to using the buster-based image, and also possibly run sudo apt install fuse inside it. I'm not sure if Debian has the fuse package installed by default, but if this is required we can probably pretty easily add it to our container image for future releases.

The second gotcha seems to be that you need specific privileges for the container. Specifically, it looks like you need the SYS_ADMIN capability and access to the /dev/fuse device. You can do that by adding the following to the securityContext:

securityContext:
  ...
  privileged: true
  capabilities:
    add:
    - SYS_ADMIN

You may be able to get away without the privileged: true (I would try that first and see if it works), but it might be necessary for access to /dev/fuse.

Please try that out and let us know if you run into more issues.
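As a quick sanity check for the device pass-through, an entrypoint could probe /dev/fuse before the proxy starts. This is only a sketch (the helper name is mine, not part of the proxy); a missing device usually means the container was started without the device mapping:

```go
package main

import (
	"fmt"
	"os"
)

// checkFuseDevice reports whether the FUSE character device at path is
// present and can be opened read-write, which is what a FUSE mount
// ultimately needs. If the open fails, the container was likely started
// without `--device /dev/fuse` (or the Kubernetes equivalent).
func checkFuseDevice(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return fmt.Errorf("FUSE device unavailable: %v", err)
	}
	f.Close()
	return nil
}

func main() {
	if err := checkFuseDevice("/dev/fuse"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok: /dev/fuse is accessible")
}
```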

jonpjenkins commented 4 years ago

@kurtisvg ,

Thanks for the message. I've had some time to work with this, and I believe I'm close but am seeing issues with the container keeping the mount point of the volume.

securityContext:
  privileged: true
  runAsGroup: 65532
  runAsUser: 65532

The above are the only special privileges I've set for the container. I tried a couple of options as well - with

capabilities:
  add:
    - SYS_ADMIN

and

runAsNonRoot: true

But they all yield the same result with regards to the mount issue below.

I built a new image using the source Dockerfile, adding fuse to the install, and a sed command:

FROM debian:buster
RUN apt-get update && apt-get install -y fuse ca-certificates
RUN sed -i 's/#user/user/g' /etc/fuse.conf
# Add a non-root user matching the nonroot user from the main container
RUN groupadd -g 65532 -r nonroot && useradd -u 65532 -g 65532 -r nonroot
# set the uid as an integer for compatibility with runAsNonRoot in Kubernetes
USER 65532

COPY --from=build --chown=nonroot /go/src/cloudsql-proxy/cloud_sql_proxy /cloud_sql_proxy

This does allow the proxy to come up:

current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.                             
using credential file for authentication; email=<redact> 
Mounting /cloudsql/fuse...                                                                               
Mounted /cloudsql/fuse                                                                                   
Ready for new connections     

When I shell into the container, I am able to connect to my Cloud SQL instances using a mariadb client, so that is good.

Missing mount directory

The issue is that the volume specified (/cloudsql/fuse) is not mounted on the cloudsql proxy container:

nonroot@app-2:/$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          95G  5.3G   89G   6% /
tmpfs            64M     0   64M   0% /dev
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup
tmpfs           7.9G  4.0K  7.9G   1% /secrets/cloudsql
/dev/sda1        95G  5.3G   89G   6% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           7.9G   12K  7.9G   1% /run/secrets/kubernetes.io/serviceaccount
_apt@app-2:/$ 

I am wondering if the fuse process is clobbering that mount. A describe on that pod states the container is mounting the volume:

Mounts:                                                                           
      /cloudsql/fuse from userconfig-fuse (rw)                                        
      /secrets/cloudsql from userconfig-cloudsql-instance-credentials (ro)            
      /var/run/secrets/kubernetes.io/serviceaccount from vault-app-token-vr7m8 (ro)   

And the cloud_sql_proxy log says about the same:

 2020/09/04 00:48:41 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.                             
 2020/09/04 00:48:41 using credential file for authentication; email=<redact> 
 2020/09/04 00:48:41 Mounting /cloudsql/fuse...                                                                               
 2020/09/04 00:48:41 Mounted /cloudsql/fuse                                                                                   
 2020/09/04 00:48:41 Ready for new connections                                                                                
kurtisvg commented 4 years ago

One of the requirements for FUSE is that you need to have access to /dev/fuse. I'm a little fuzzy on how this device is used under the hood, but it's possible that it needs to be shared with one (or both) of the containers accessing the volume. There's an example here showing how to mount a different character device - I'll see if I can get a quick example working using that.

Carrotman42 commented 4 years ago

I have not tried to mount FUSE between docker containers (or container<->host) in >3 years, but I do remember that some things didn't really work the last time I tried; I think it had to do with some filesystem namespace that (at the time) Docker couldn't share between the container and the host. Things definitely could have changed, but I'd verify that things are supposed to work with the current version of Docker before digging too deep.

An earlier comment from OP mentioned:

When I shell into the container, I am able to connect to my Cloud SQL instances using a mariadb client, so that is good.

This is my recollection of what I was able to get to work before: FUSE worked from within the container, but not from outside the container.

kurtisvg commented 4 years ago

Following @Carrotman42's advice, I took a step back and attempted to get this to run locally before trying in k8s - unfortunately I seem to be hitting the same limitation:

Here's my current command

docker run -it --rm --name proxy --user=root \
  -v <PATH_TO_MY_KEY>:/config \
  --mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared  \
  --device=/dev/fuse \
  --privileged \
  <MY_IMAGE_NAME> \
  /cloud_sql_proxy -fuse -dir /cloudsql -credential_file=/config/key.json

From the guest, I'm able to both see the README and connect using the unix socket. From the host, I'm unable to see either.

There does seem to be some evidence that this should work (1 2 3), but unfortunately I'm in over my head on why it's not. I've tweaked a few options but am so far unable to connect.

jonpjenkins commented 4 years ago

@kurtisvg,

Poking around in the code while running a docker command like the one you have above, I am wondering about the following lines:

from proxy/fuse/fuse.go:73

    if err := fuse.Unmount(mountdir); err != nil {
        // The error is too verbose to be useful to print out
    }

I am noticing that this will always unmount that mounted directory, even if it is not mounted by fuse. If I remove those lines, then I can see the README from the host, although I cannot access the socket at /cloudsql/<project>:<instance_name>. I can still connect via the guest.

In addition, if the container shuts down uncleanly, I need to manually run sudo fusermount -u /cloudsql before re-running the docker container.

My golang abilities are pretty rudimentary, but is there some way to have the channel (proxy.Conn) clean up that mount when it is shut down?

Carrotman42 commented 4 years ago

Interesting, thanks for looking closer!

Your observation about needing to manually run fusermount -u is precisely the reason why the Proxy is calling fuse.Unmount before trying to mount. There is no way to definitively call some function when a process exits in general (for example, the process could have been forced to exit by the OOM killer, or anything else could send a SIGKILL), so we can't rely on in-process cleanup.

Can we tell whether a directory is mounted via FUSE vs via docker? If so, the Proxy could check to see if the mountpoint is based on FUSE and only unmount in that case.

although I cannot access the socket at /cloudsql/<project>:<instance_name>

To be clear: even though you can see the README, connecting to the database doesn't work? What does ls /cloudsql/$NAME show?

Seems like it's at least a net positive that the README is working if we don't unmount the docker directory.

jonpjenkins commented 4 years ago

@Carrotman42,

To be clear: even though you can see the README, connecting to the database doesn't work? What does ls /cloudsql/$NAME show?

Here is what I see, from the host:

❯  ls -alh /cloudsql
total 0
-r--r--r-- 0 root root 404 Aug 30  1754 README

~ 
❯  ls -alh /cloudsql/db-01
lrwxrwxrwx 0 root root 0 Aug 30  1754 /cloudsql/db-01 -> /tmp/cloudsql-proxy-tmp/db-01

❯  mysql -S /cloudsql/<project>:db-01
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/cloudsql/<project>:db-01' (2 "No such file or directory")

Looks like it cannot find the file in the temp directory - so I mounted that location as well:

docker run -it --rm --name proxy --user=root \
  --mount type=bind,source=<keyfile>,target=/config/credentials.json \
  --mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared  \
  --mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql,bind-propagation=rshared  \
  --device=/dev/fuse \
  --privileged \
   <image built with the fuse.Unmount commented out> \
  /cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json

I am able to see the entry in the temp dir:

❯  ls -alh /cloudsql/db-01
lrwxrwxrwx 0 root root 0 Aug 30  1754 /cloudsql/db-01 -> /tmp/cloudsql/db-01

❯  ls -alh /tmp/cloudsql/
total 1.1M
drwxrwxr-x  2 user usergroup 4.0K Sep  9 10:05 .
drwxrwxrwt 19 root     root     1.1M Sep  9 10:06 ..
srwxrwxrwx  1 root     root        0 Sep  9 10:05 db-01
srwxrwxrwx  1 root     root        0 Sep  9 10:05 .Trash
srwxrwxrwx  1 root     root        0 Sep  9 10:05 .Trash-1001

And I am able to connect!

❯  mysql -S /cloudsql/<project>:db-01 -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 2677715
Server version: 5.7.14-google-log (Google)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> exit
Bye

So it would seem that having the tmp directory shared between the containers will be necessary as well.

kurtisvg commented 4 years ago

@jonthegimp - Thanks for putting in the legwork here. I've switched this to a bug and will look into a more permanent fix, as well as add an example for this use case in examples/k8s.

@Carrotman42 - just to confirm, the intent of the first fuse.Unmount is only in case the proxy failed to clean up (I do see another unmount in the close)? Are we sure it's worthwhile to attempt the first unmount at all?

Carrotman42 commented 4 years ago

@Carrotman42 - just to confirm, the intent of the first fuse.Unmount is only in case the proxy failed to clean up (I do see another unmount in the close)?

Correct, that is the intent.

Are we sure it's worthwhile to attempt the first unmount at all?

As I mentioned, on Linux one cannot rely on in-process cleanup happening 100% of the time. You can try your hardest, but SIGKILL may be sent for a number of reasons. Plus, long-term, we shouldn't assume that Go will never have a bug (I actually helped uncover a bug in Go using the Proxy that would have caused an issue here, so this is not theoretical); we need to maintain the property that the Proxy can exit (or be killed) at any moment and still restart cleanly in order to make this client safe for production.

If we don't handle this edge case somehow, it will increase the chance that someone's deployment breaks and they have to recover manually. We want to avoid any need for manual recovery efforts.

I think we should be able to tell whether Docker or the Proxy is the reason some directory is mounted (Docker doesn't mount things using FUSE as far as I know), and only unmount if we see the directory was FUSE-mounted before. I don't think there's a concern about unmounting the wrong FUSE directory, since you can't double-mount a directory for FUSE anyway.
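To sketch what that check might look like: Docker bind mounts show up in /proc/mounts with the underlying filesystem type (overlay, ext4, etc.), while FUSE mounts show a fuse* type, so something like this (illustrative only, not proxy code) could tell them apart:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// isFuseMounted reports whether dir appears in the mount table with a
// FUSE filesystem type. A Docker bind mount lists the source
// filesystem's type instead, so a true result suggests a stale mount
// left behind by a previous proxy run.
func isFuseMounted(dir string) (bool, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// /proc/mounts fields: device mountpoint fstype options dump pass
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[1] == dir && strings.HasPrefix(fields[2], "fuse") {
			return true, nil
		}
	}
	return false, sc.Err()
}

func main() {
	mounted, err := isFuseMounted("/cloudsql")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(mounted)
}
```

With a check like this, the proxy could skip the startup unmount entirely whenever the directory is not FUSE-mounted.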

kurtisvg commented 4 years ago

I don't think there's a concern about unmounting the wrong FUSE directory, since you can't double-mount a directory for FUSE anyway.

I think this is my concern - that the proxy might unmount an existing FUSE directory that is still in use or was created by a different process (either another instance of the proxy or not). While this would obviously be a configuration error, it might be non-obvious behavior. However, I agree that it's more important that the proxy make the best attempt to start up successfully in this scenario.

I think my goal here will be to restrict the unmount behavior to only when needed (preferably if a FUSE volume is already mounted, or possibly if a first attempt at unmount fails) and clarify in the flag description (and logs) that an unmount may be performed in the attempted directory.

Carrotman42 commented 4 years ago

Trying to mount FUSE once, and on error unmounting and retrying the mount seems sufficient to me as well!

Carrotman42 commented 4 years ago

Just found out about an extra FUSE option called auto_unmount, which seems to, well, automatically unmount "if the filesystem terminates for any reason" (reference: https://man7.org/linux/man-pages/man8/fuse.8.html). I'm thinking that in this case, the Proxy is "the filesystem", so this seems like it would solve the problem I'm talking about without having to try to unmount on startup.

kurtisvg commented 4 years ago

Looks promising. I'll look more into that before the remounting strategy previously discussed.

kurtisvg commented 4 years ago

Looking at the Mount Options for our current fuse library, it doesn't look like auto_unmount is supported. I'm a little uncertain of the relationship between libfuse and this library, but my initial glance suggests that they are parallel, with no reliance on each other. This flag seems to be specific to libfuse, so I'm going to go back to attempting to unmount if an error occurs.

kurtisvg commented 4 years ago

@jonthegimp I seem to be having trouble replicating your success. Are you still using this mode, and is it still successful for you?

I'm using the latest version of the master branch, but with L73-75 of fuse.go commented out. I'm building the docker container with the following:

 docker build -f Dockerfile.buster --tag proxy-buster-dev . 

And then running with the following command:

docker run -it --rm --name proxy --user=root \
  --mount type=bind,source=<MY_KEYFILE>,target=/config/credentials.json \
  --mount type=bind,source=/cloudsql,target=/cloudsql \
  --mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql \
  --device /dev/fuse \
  --privileged \
  proxy-buster-dev \
  /cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json

Am I missing anything here? I'm running on Linux.

jonpjenkins commented 4 years ago

@kurtisvg I am just getting back to this; apologies for the delay. Upon revisiting, I am unable to replicate my success either, so there is a step I am missing. I'll go back to the drawing board and document all my steps this time.

jonpjenkins commented 4 years ago

@kurtisvg I had one more change - here is the full diff of what I found to work:

diff --git a/proxy/fuse/fuse.go b/proxy/fuse/fuse.go
index ee96afe..0973168 100644
--- a/proxy/fuse/fuse.go
+++ b/proxy/fuse/fuse.go
@@ -65,15 +65,15 @@ func Supported() bool {
 //
 // The connset parameter is optional.
 func NewConnSrc(mountdir, tmpdir string, connset *proxy.ConnSet) (<-chan proxy.Conn, io.Closer, error) {
+
        if err := os.MkdirAll(tmpdir, 0777); err != nil {
                return nil, nil, err
        }

-       if err := fuse.Unmount(mountdir); err != nil {
-               // The error is too verbose to be useful to print out
-       }
+       logging.Verbosef("Not using fuse.Unmount for directory: %v...", mountdir)
+
        logging.Verbosef("Mounting %v...", mountdir)
-       c, err := fuse.Mount(mountdir, fuse.AllowOther())
+       c, err := fuse.Mount(mountdir, fuse.AllowOther(), fuse.AllowNonEmptyMount(), fuse.DefaultPermissions())
        if err != nil {
                return nil, nil, fmt.Errorf("cannot mount %q: %v", mountdir, err)
        }

From the host machine, I used the following to "prepare" the local dirs:

umount /cloudsql 
rm -rf /tmp/cloudsql /cloudsql 
mkdir -p /tmp/cloudsql /cloudsql 
chmod -R 777 /tmp/cloudsql 
chmod -R 777 /cloudsql

Build the image:

docker build -f Dockerfile.buster --tag proxy-buster-local .

and run it:

docker run -it --rm --name proxy --user=root \
--mount type=bind,source=/tmp/credentials.json,target=/config/credentials.json  \
--mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared \
--mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql,bind-propagation=rshared \
--device=/dev/fuse \
--privileged proxy-buster-local \
/cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json

From the host machine I was able to connect:

mysql -S /cloudsql/<project>:<instance> -p$PASS -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 172889
Server version: 5.7.25-google-log (Google)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>
kurtisvg commented 4 years ago

Hey @jonthegimp, thanks for the additional information.

It looks like the only piece I was missing was including bind-propagation=rshared for the volume mounts. I also checked the other two options you specified (but that aren't needed to mount):

I'm not sure either is particularly useful - the first might be, but it seems like the folder permissions already apply, and the per-file access can only be enabled after the link has been created (which seems to defeat the point of fuse). The second seems potentially dangerous, as it might wipe out a directory.

I opened #537 to fix this issue. I'm still seeing that Docker won't allow the proxy to clean up the volume through the bind for some reason, but I don't know what the cause is. However, this seems to be limited to running in the container, and the proxy does cleanly start back up again with these changes, so I don't think it's a blocker.

kurtisvg commented 4 years ago

@jonthegimp If you have time, I would appreciate if you could test #537 fixes the problem in your environment. If so we should be able to get it into the next release.

jonpjenkins commented 4 years ago

@kurtisvg , I had a little time to try this out - and found the following:

The containers of my deployment look like the following:

      containers:
      - image: percona:5.7
        name: percona
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "true"
        volumeMounts:
        - mountPath: /cloudsql
          name: userconfig-fuse
          mountPropagation: Bidirectional
        - mountPath: /tmp/cloudsql
          name: userconfig-fuse-tmp
        securityContext:
          privileged: true
      - command:
        - /cloud_sql_proxy
        - -fuse
        - -fuse_tmp=/tmp/cloudsql
        - -dir=/cloudsql
        - -credential_file=/secrets/cloudsql/credentials.json
        image: <localrepo>/proxy-buster-537:latest
        imagePullPolicy: Always
        name: cloudsql-proxy
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /cloudsql
          name: userconfig-fuse
          mountPropagation: Bidirectional
        - mountPath: /tmp/cloudsql
          name: userconfig-fuse-tmp          
        - mountPath: /secrets/cloudsql
          name: cloudsql-instance-credentials
          readOnly: true
        securityContext:
          privileged: true
          runAsGroup: 65532
          runAsUser: 65532

As built, I encountered the following error:

Mounting /cloudsql...                                                                                             
mount helper error: fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf    
WARNING: Mount failed - attempting to unmount dir to resolve...%!(EXTRA string=/cloudsql)                         
Unmount failed: exit status 1: fusermount: entry for /cloudsql not found in /etc/mtab                             
mount helper error: fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf    
Could not start fuse directory at "/cloudsql": cannot mount "/cloudsql": fusermount: exit status 1                

I had to add the following to the Dockerfile.buster:

RUN apt-get update && apt-get install -y \
    ca-certificates \
    fuse
# Add the sed statement to uncomment the user_allow_other option
RUN sed -i 's/#user/user/g' /etc/fuse.conf

With the above, I was able to access the database from the percona image as expected.

kurtisvg commented 4 years ago

It looks like we can do this by adding the fuse group to the nonroot user as well - I opened #540 to do so and will see if I can test this afternoon or tomorrow.

kurtisvg commented 3 years ago

Ok, adding the fuse group didn't work (there seems to be some outdated documentation regarding its existence in later versions of Debian).

I followed @jonthegimp's lead and used sed to replace the value in the config. I confirmed this allows fuse to work for both the buster and the alpine images.