Closed jonpjenkins closed 3 years ago
Hey @jonthegimp,
Thanks for opening an issue. I believe I was the one that recommended running fuse mode on Kubernetes to you, but there seem to be a few gotchas with setting it up that I wasn't aware of.
First, you may need to make sure that the image has FUSE installed. For that, you may need to switch to using the buster-based image, and possibly also run sudo apt install fuse inside it. I'm not sure if Debian has the fuse package installed by default, but if it's required we can probably add it to our container image pretty easily for future releases.
The second gotcha seems to be that you need specific privileges for the container. Specifically, it looks like you need the SYS_ADMIN capability and access to the /dev/fuse device. You can do that by adding the following to the securityContext:
securityContext:
  ...
  privileged: true
  capabilities:
    add:
    - SYS_ADMIN
You may be able to get away without the privileged: true (I would try that first and see if it works), but it might be necessary for access to /dev/fuse.
Please try that out and let us know if you run into more issues.
@kurtisvg ,
Thanks for the message. I've had some time to work with this, and I believe I'm close but am seeing issues with the container keeping the mount point of the volume.
securityContext:
  privileged: true
  runAsGroup: 65532
  runAsUser: 65532
The above is the only special privilege configuration I've set for the container. I tried a couple of options as well - with
capabilities:
  add:
  - SYS_ADMIN
and
runAsNonRoot: true
But they all yield the same result with regard to the mount issue below.
I built a new image using the source Dockerfile, adding fuse to the install, and a sed command:
FROM debian:buster
RUN apt-get update && apt-get install -y fuse ca-certificates
RUN sed -i 's/#user/user/g' /etc/fuse.conf
# Add a non-root user matching the nonroot user from the main container
RUN groupadd -g 65532 -r nonroot && useradd -u 65532 -g 65532 -r nonroot
# set the uid as an integer for compatibility with runAsNonRoot in Kubernetes
USER 65532
COPY --from=build --chown=nonroot /go/src/cloudsql-proxy/cloud_sql_proxy /cloud_sql_proxy
This does allow the proxy to come up:
current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
using credential file for authentication; email=<redact>
Mounting /cloudsql/fuse...
Mounted /cloudsql/fuse
Ready for new connections
When I shell into the container, I am able to connect to my cloudsql instances using a mariadb client, so that is good.
The issue is that the volume specified (/cloudsql/fuse) is not mounted on the cloudsql proxy container:
nonroot@app-2:/$ df -h
Filesystem Size Used Avail Use% Mounted on
overlay 95G 5.3G 89G 6% /
tmpfs 64M 0 64M 0% /dev
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
tmpfs 7.9G 4.0K 7.9G 1% /secrets/cloudsql
/dev/sda1 95G 5.3G 89G 6% /etc/hosts
shm 64M 0 64M 0% /dev/shm
tmpfs 7.9G 12K 7.9G 1% /run/secrets/kubernetes.io/serviceaccount
_apt@app-2:/$
I am wondering if the fuse process is clobbering that mount. A describe on that pod states the container is mounting the volume:
Mounts:
  /cloudsql/fuse from userconfig-fuse (rw)
  /secrets/cloudsql from userconfig-cloudsql-instance-credentials (ro)
  /var/run/secrets/kubernetes.io/serviceaccount from vault-app-token-vr7m8 (ro)
And a log of the cloudsql-proxy shows about the same:
2020/09/04 00:48:41 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
2020/09/04 00:48:41 using credential file for authentication; email=<redact>
2020/09/04 00:48:41 Mounting /cloudsql/fuse...
2020/09/04 00:48:41 Mounted /cloudsql/fuse
2020/09/04 00:48:41 Ready for new connections
One of the requirements for FUSE is that you need access to the /dev/fuse device. I'm a little fuzzy on how this device is used under the hood, but it's possible that it needs to be shared with one (or both) of the containers accessing the volume. There's an example here showing how to mount a different character device - I'll see if I can get a quick example working using that.
I have not tried to mount FUSE between docker containers (or container<->host) in >3 years, but I do remember that some things didn't really work the last time I tried; I think it had to do with some filesystem namespace that (at the time) Docker couldn't share between the container and the host. Things definitely could have changed, but I'd verify that things are supposed to work with the current version of Docker before digging too deep.
An earlier comment from OP mentioned:
When I shell into the container, I am able to connect to my cloudsql instances using a mariadb client, so that is good.
This is my recollection of what I was able to get to work before: FUSE worked from within the container, but not from outside the container.
Following @Carrotman42's advice, I took a step back and attempted to get this to run locally before trying in k8s - unfortunately I seem to be hitting the same limitation:
Here's my current command:
docker run -it --rm --name proxy --user=root \
-v <PATH_TO_MY_KEY>:/config \
--mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared \
--device=/dev/fuse \
--privileged \
<MY_IMAGE_NAME> \
/cloud_sql_proxy -fuse -dir /cloudsql -credential_file=/config/key.json
From the guest, I'm able to both see the README and connect using the unix socket. From the host, I'm unable to see either.
There does seem to be some evidence that this should work (1 2 3), but unfortunately I'm in over my head on why it's not. I've tweaked a few options but am so far unable to connect.
@kurtisvg,
Poking around in the code, running a docker command like you have above, I am wondering about the following lines:
if err := fuse.Unmount(mountdir); err != nil {
    // The error is too verbose to be useful to print out
}
I am noticing that this will always unmount that mounted directory, even if it is not mounted by fuse. If I remove those lines, then I can see the README from the host, although I cannot access the socket at /cloudsql/<project>:<instance_name>. I can still connect via the guest.
In addition, if the container shuts down uncleanly, I need to manually run sudo fusermount -u /cloudsql before re-running the docker container.
My golang abilities are pretty rudimentary, but is there some way to have the channel (proxy.Conn) clean up that mount when it is shut down?
Interesting, thanks for looking closer!
Your observation about needing to manually run fusermount -u is precisely the reason why the Proxy is calling fuse.Unmount before trying to mount. There is no way to reliably call some function when a process exits in general (for example, the process could have been forced to exit by the OOM killer, or anything else could send a SIGKILL), so we can't rely on in-process cleanup.
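To illustrate (a hypothetical sketch, not code from the Proxy): even a best-effort signal handler only covers catchable signals, so an unmount registered this way can still be skipped.
// Hypothetical best-effort cleanup; assumes "os", "os/signal", "syscall",
// and the bazil.org/fuse package the Proxy already uses.
// SIGINT and SIGTERM can be trapped, but SIGKILL (e.g. from the OOM killer)
// can never be caught, so the unmount below is not guaranteed to run.
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
go func() {
    <-sigs
    _ = fuse.Unmount(mountdir) // best effort only; never runs on SIGKILL
    os.Exit(1)
}()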
Can we tell whether a directory is mounted via FUSE vs via docker? If so, the Proxy could check to see if the mountpoint is based on FUSE and only unmount in that case.
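For what it's worth, a minimal sketch of such a check on Linux might look like the following. This is a hypothetical helper (isFuseMount), not something in the Proxy codebase; it scans /proc/self/mounts for a FUSE filesystem type at the given mount point.
// isFuseMount reports whether dir is currently mounted with a FUSE
// filesystem type by scanning /proc/self/mounts (Linux only).
// Assumes imports "bufio", "os", and "strings".
func isFuseMount(dir string) (bool, error) {
    f, err := os.Open("/proc/self/mounts")
    if err != nil {
        return false, err
    }
    defer f.Close()
    s := bufio.NewScanner(f)
    for s.Scan() {
        // Each line: <source> <mountpoint> <fstype> <options> <dump> <pass>
        fields := strings.Fields(s.Text())
        if len(fields) >= 3 && fields[1] == dir && strings.HasPrefix(fields[2], "fuse") {
            return true, nil
        }
    }
    return false, s.Err()
}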
although I cannot access the socket at /cloudsql/<project>:<instance_name>
To be clear: even though you can see the README, connecting to the database doesn't work? What does ls /cloudsql/$NAME show?
Seems like it's at least a net positive that the readme is working if we don't unmount the docker directory.
@Carrotman42,
To be clear: even though you can see the README, connecting to the database doesn't work? What does ls /cloudsql/$NAME show?
Here is what I see, from the host:
❯ ls -alh /cloudsql
total 0
-r--r--r-- 0 root root 404 Aug 30 1754 README
~
❯ ls -alh /cloudsql/db-01
lrwxrwxrwx 0 root root 0 Aug 30 1754 /cloudsql/db-01 -> /tmp/cloudsql-proxy-tmp/db-01
❯ mysql -S /cloudsql/<project>:db-01
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/cloudsql/<project>:db-01' (2 "No such file or directory")
Looks like it cannot find the file in the temp directory - so I mounted that location as well:
docker run -it --rm --name proxy --user=root \
--mount type=bind,source=<keyfile>,target=/config/credentials.json \
--mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared \
--mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql,bind-propagation=rshared \
--device=/dev/fuse \
--privileged \
<image built with the fuse.Unmount commented out> \
/cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json
I am able to see the entry in the temp dir:
❯ ls -alh /cloudsql/db-01
lrwxrwxrwx 0 root root 0 Aug 30 1754 /cloudsql/db-01 -> /tmp/cloudsql/db-01
❯ ls -alh /tmp/cloudsql/
total 1.1M
drwxrwxr-x 2 user usergroup 4.0K Sep 9 10:05 .
drwxrwxrwt 19 root root 1.1M Sep 9 10:06 ..
srwxrwxrwx 1 root root 0 Sep 9 10:05 db-01
srwxrwxrwx 1 root root 0 Sep 9 10:05 .Trash
srwxrwxrwx 1 root root 0 Sep 9 10:05 .Trash-1001
And I am able to connect!
❯ mysql -S /cloudsql/<project>:db-01 -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 2677715
Server version: 5.7.14-google-log (Google)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> exit
Bye
So it would seem that having the tmp directory shared between the containers will be necessary as well.
@jonthegimp - Thanks for putting in the legwork here. I've switched this to a bug and will look into a more permanent fix, as well as add an example for this use case in examples/k8s.
@Carrotman42 - just to confirm, the intent of the first fuse.Unmount is only in case the proxy failed to clean up (I do see another unmount in the close)? Are we sure it's worthwhile to attempt the first unmount at all?
@Carrotman42 - just to confirm, the intent of the first fuse.Unmount is only in case the proxy failed to clean up (I do see another unmount in the close)?
Correct, that is the intent.
Are we sure it's worthwhile to attempt the first unmount at all?
As I mentioned, on Linux one cannot rely on in-process cleanup happening 100% of the time. You can try your hardest, but SIGKILL may be sent for a number of reasons. Plus, long-term, we shouldn't assume that Go will never have a bug (I actually helped uncover a bug in Go using the Proxy that would have caused an issue here, so this is not theoretical); we need to maintain the property that the Proxy can exit (or be killed) at any moment and still restart cleanly in order to make this client safe for production.
If we don't handle this edge case somehow, it will increase the chance that someone ends up in a broken state and has to recover manually. We want to avoid any need for manual recovery efforts.
I think we should be able to tell whether Docker or the Proxy is the reason some directory is mounted (Docker doesn't mount things using FUSE as far as I know), and only unmount if we see the directory was FUSE-mounted before. I don't think there's a concern about unmounting the wrong FUSE directory, since you can't double-mount a directory for FUSE anyway.
I don't think there's a concern about unmounting the wrong FUSE directory, since you can't double-mount a directory for FUSE anyway.
I think this is my concern - that the proxy might unmount an existing FUSE directory that is still in use or was created by a different process (either another instance of the proxy or not). While that would obviously be a configuration error, it might be non-obvious behavior. However, I agree that it's more important that the proxy make the best attempt to start up successfully in this scenario.
I think my goal here will be to restrict the unmount behavior to only when needed (preferably if a FUSE volume is already mounted, or possibly if a first attempt at mounting fails) and clarify in the flag description (and logs) that an unmount may be performed on the target directory.
Trying to mount FUSE once, and on error unmounting and retrying the mount seems sufficient to me as well!
Just found out about an extra FUSE option called auto_unmount, which seems to, well, automatically unmount "if the filesystem terminates for any reason" (reference: https://man7.org/linux/man-pages/man8/fuse.8.html). I'm thinking that in this case the Proxy is "the filesystem", so this seems like it would solve the problem I'm talking about without having to try to unmount on startup.
Looks promising. I'll look more into that before the remounting strategy previously discussed.
Looking at the Mount Options for our current fuse library, it doesn't look like auto_unmount is supported. I'm a little uncertain of the relationship between libfuse and this library, but my initial glance is that they are parallel without any reliance on each other. This flag seems to be specific to libfuse, so I'm going to go back to attempting to unmount if an error occurs.
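For reference, a minimal sketch of that strategy, using the same bazil.org/fuse and logging calls that appear in the diff above (this is only an illustration of the idea, not necessarily the exact change that will land):
// Try the mount first; only if it fails, assume a stale mount from an
// unclean shutdown, attempt an unmount, and retry the mount once.
c, err := fuse.Mount(mountdir, fuse.AllowOther())
if err != nil {
    logging.Verbosef("Mount failed - attempting to unmount %q to resolve...", mountdir)
    if uerr := fuse.Unmount(mountdir); uerr != nil {
        logging.Verbosef("Unmount failed: %v", uerr)
    }
    c, err = fuse.Mount(mountdir, fuse.AllowOther())
    if err != nil {
        return nil, nil, fmt.Errorf("cannot mount %q: %v", mountdir, err)
    }
}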
@jonthegimp I seem to be having trouble replicating your success. Are you still using this mode, and is it still successful for you?
I'm using the latest version of the master branch, but with L73-75 commented out from fuse.go. I'm building the docker container with the following:
docker build -f Dockerfile.buster --tag proxy-buster-dev .
And then running with the following command:
docker run -it --rm --name proxy --user=root \
--mount type=bind,source=<MY_KEYFILE>,target=/config/credentials.json \
--mount type=bind,source=/cloudsql,target=/cloudsql \
--mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql \
--device /dev/fuse \
--privileged \
proxy-buster-dev \
/cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json
Am I missing anything here? I'm running on Linux.
@kurtisvg I am just getting back to this; apologies for the delay. Upon a revisit I am unable to replicate my success as well, so there is a step I am missing. I'll go back to the drawing board and document all my steps this time.
@kurtisvg I had one more change - here is the full diff of what I found to work:
diff --git a/proxy/fuse/fuse.go b/proxy/fuse/fuse.go
index ee96afe..0973168 100644
--- a/proxy/fuse/fuse.go
+++ b/proxy/fuse/fuse.go
@@ -65,15 +65,15 @@ func Supported() bool {
//
// The connset parameter is optional.
func NewConnSrc(mountdir, tmpdir string, connset *proxy.ConnSet) (<-chan proxy.Conn, io.Closer, error) {
+
if err := os.MkdirAll(tmpdir, 0777); err != nil {
return nil, nil, err
}
- if err := fuse.Unmount(mountdir); err != nil {
- // The error is too verbose to be useful to print out
- }
+ logging.Verbosef("Not using fuse.Unmount for directory: %v...", mountdir)
+
logging.Verbosef("Mounting %v...", mountdir)
- c, err := fuse.Mount(mountdir, fuse.AllowOther())
+ c, err := fuse.Mount(mountdir, fuse.AllowOther(), fuse.AllowNonEmptyMount(), fuse.DefaultPermissions())
if err != nil {
return nil, nil, fmt.Errorf("cannot mount %q: %v", mountdir, err)
}
From the host machine, I used the following to "prepare" the local dirs:
umount /cloudsql
rm -rf /tmp/cloudsql /cloudsql
mkdir -p /tmp/cloudsql /cloudsql
chmod -R 777 /tmp/cloudsql
chmod -R 777 /cloudsql
Build the image:
docker build -f Dockerfile.buster --tag proxy-buster-local .
and run it:
docker run -it --rm --name proxy --user=root \
--mount type=bind,source=/tmp/credentials.json,target=/config/credentials.json \
--mount type=bind,source=/cloudsql,target=/cloudsql,bind-propagation=rshared \
--mount type=bind,source=/tmp/cloudsql,target=/tmp/cloudsql,bind-propagation=rshared \
--device=/dev/fuse \
--privileged proxy-buster-local \
/cloud_sql_proxy -fuse -fuse_tmp /tmp/cloudsql -dir /cloudsql -credential_file=/config/credentials.json
From the host machine I was able to connect:
mysql -S /cloudsql/<project>:<instance> -p$PASS -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 172889
Server version: 5.7.25-google-log (Google)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
Hey @jonthegimp, thanks for the additional information.
It looks like the only piece I was missing was including bind-propagation=rshared for the volume mounts. I also checked the other two options you specified (which aren't needed to mount): I'm not sure either is particularly useful - the first might be, but it seems like the folder permissions already apply, and the per-file access can only be enabled after the link has been created (which seems to defeat the point of fuse). The second seems potentially dangerous, as it might wipe out a directory.
I opened #537 here to fix this issue. I'm still seeing that docker won't allow the proxy to clean up the volume through the bind for some reason, but I don't know what the cause is. However, this seems to be limited to running in the container, and the proxy does cleanly start back up again with these changes, so I don't think it's a blocker.
@jonthegimp If you have time, I would appreciate it if you could test whether #537 fixes the problem in your environment. If so, we should be able to get it into the next release.
@kurtisvg, I had a little time to try this out and found the following:
The containers of my deployment look like the following:
containers:
- image: percona:5.7
  name: percona
  env:
  - name: MYSQL_ALLOW_EMPTY_PASSWORD
    value: "true"
  volumeMounts:
  - mountPath: /cloudsql
    name: userconfig-fuse
    mountPropagation: Bidirectional
  - mountPath: /tmp/cloudsql
    name: userconfig-fuse-tmp
  securityContext:
    privileged: true
- command:
  - /cloud_sql_proxy
  - -fuse
  - -fuse_tmp=/tmp/cloudsql
  - -dir=/cloudsql
  - -credential_file=/secrets/cloudsql/credentials.json
  image: <localrepo>/proxy-buster-537:latest
  imagePullPolicy: Always
  name: cloudsql-proxy
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /cloudsql
    name: userconfig-fuse
    mountPropagation: Bidirectional
  - mountPath: /tmp/cloudsql
    name: userconfig-fuse-tmp
  - mountPath: /secrets/cloudsql
    name: cloudsql-instance-credentials
    readOnly: true
  securityContext:
    privileged: true
    runAsGroup: 65532
    runAsUser: 65532
As built, I encountered the following error:
Mounting /cloudsql...
mount helper error: fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
WARNING: Mount failed - attempting to unmount dir to resolve...%!(EXTRA string=/cloudsql)
Unmount failed: exit status 1: fusermount: entry for /cloudsql not found in /etc/mtab
mount helper error: fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
Could not start fuse directory at "/cloudsql": cannot mount "/cloudsql": fusermount: exit status 1
I had to add the following to the Dockerfile.buster file:
RUN apt-get update && apt-get install -y \
ca-certificates \
fuse
# Add the sed statement to uncomment the user_allow_other option
RUN sed -i 's/#user/user/g' /etc/fuse.conf
With the above, I was able to access the database from the percona image as expected.
It looks like we can do this by adding the fuse group to the nonroot user as well - I opened #540 to do so and will see if I can test this afternoon or tomorrow.
Ok, adding the fuse group didn't work (there seems to be some outdated documentation regarding its existence in later versions of Debian).
I followed @jonthegimp's lead and used sed to replace the value in the config. I confirmed this allows fuse to work for both the buster and the alpine images.
Bug Description
This could be a documentation issue, as I am unable to find a definitive guide on using the -fuse flag for the Kubernetes client. I am coming up against various errors (similar to issue #38), wherein the image states that "fusermount" is not executable. At this point, I am not sure if I am specifying the wrong flags or if there is something greater going on. Any advice would be appreciated.
Example code
In this case, I am deploying a cloudsql proxy as a container, mounting /cloudsql/fuse as the directory in which I'd like to begin the socket path. This error also arises when using the "default" /cloudsql; I am limited to using this alternative directory due to the parent chart. Upon deploying the below, I am seeing issues of the type:
Container yaml:
How to reproduce
Environment
Cloud SQL proxy version (./cloud_sql_proxy -version): 1.17