This shows that the brick failed to start: `Unable to start brick`
I didn't even think to look there, thanks. I was able to move forward in my deployment. For the record, here's what I've done so far:
After reviewing the brick logs, the problem was with the permissions inside the brick directory. By default, the hidden directory `.glusterfs` and some of its subdirectories (`indices`, `changelogs`, `unlink`) are created with permissions 0600, which don't include execute permission. Without that permission, a non-root user cannot do much inside these directories, so I had to add execute permission on the `.glusterfs` directory recursively (all other subdirectories inside already have that permission, so it didn't change anything for them). I actually had to create the directories beforehand, since volume startup couldn't create them itself.
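Concretely, the fix looked roughly like this (a sketch; the brick path is the one from my setup, which appears later in this thread):

```sh
BRICK=/applis/apsu/data/glusterfs/volume1/brick1/brick

# Pre-create the metadata directories that volume startup could not create:
mkdir -p "$BRICK"/.glusterfs/indices "$BRICK"/.glusterfs/changelogs "$BRICK"/.glusterfs/unlink

# Add the missing execute bit everywhere under .glusterfs:
chmod -R u+x "$BRICK"/.glusterfs
```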
After that, I was able to start the volume and mount it on a client, but creating a file failed because the user who created it on the client was different from the user running GlusterFS on the server.
I had to:

- add `CAP_CHOWN` to the service's capabilities and restart the volume, so that GlusterFS is able to change ownership of files created by a different user
- set `storage.owner-uid` and `storage.owner-gid` on the volume

I am currently able to CRUD files from the client as a different user than the one running GlusterFS. I think it looks good, but if anything I did seems suspicious to you, please let me know.
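In case it helps someone, those two steps look roughly like this (a sketch: it assumes glusterd runs under systemd with a systemd recent enough for `AmbientCapabilities=`, and uid/gid 2000 is just my deployment's gluster user):

```sh
# 1. Grant CAP_CHOWN to the service via a drop-in, then restart:
mkdir -p /etc/systemd/system/glusterd.service.d
cat > /etc/systemd/system/glusterd.service.d/caps.conf <<'EOF'
[Service]
AmbientCapabilities=CAP_CHOWN
EOF
systemctl daemon-reload
systemctl restart glusterd

# 2. Make the bricks present files as owned by the non-root user:
gluster volume set volume1 storage.owner-uid 2000
gluster volume set volume1 storage.owner-gid 2000
```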
Hi @FSangouard - this all looks very interesting. If you search the code, we have many places where '0600' is hard-coded into the sys_mkdir() calls or other calls. I think enumerating them and understanding what the gaps are is a great first step.
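A quick enumeration could start with something like this, run from a GlusterFS source checkout:

```sh
# List call sites that pass a hard-coded 0600 mode:
grep -rn '0600' --include='*.c' --include='*.h' libglusterfs/ xlators/
```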
Yes I have seen the hardcoded '0600' when looking for a way to configure those default permissions, which is why I decided to manually create the directories with the correct permissions on the machines.
Unfortunately, I'm really not a C programmer, so even if I had the time, I would not be able to contribute a way to make this configurable.
I still want to do some more tests (like geo-replication); I will report here if I run into any trouble.
I'm hitting a roadblock right now with geo-replication, and I'm not sure what the problem is. Every time I start a geo-replication session, it instantly becomes Faulty, and on the slaves I see the volume being mounted again and again in a loop until I stop the session.
Example (contents of /etc/mtab about a minute after session startup):

```
localhost:volume1 /var/mountbroker-root/user2006/mtpt-georep-AestaA fuse.glusterfs rw,relatime,user_id=2000,group_id=2000,allow_other,max_read=131072 0 0
localhost:volume1 /var/mountbroker-root/user2006/mtpt-georep-EAnKg0 fuse.glusterfs rw,relatime,user_id=2000,group_id=2000,allow_other,max_read=131072 0 0
localhost:volume1 /var/mountbroker-root/user2006/mtpt-georep-nn4Z5J fuse.glusterfs rw,relatime,user_id=2000,group_id=2000,allow_other,max_read=131072 0 0
localhost:volume1 /var/mountbroker-root/user2006/mtpt-georep-1JGbHN fuse.glusterfs rw,relatime,user_id=2000,group_id=2000,allow_other,max_read=131072 0 0
```
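Each of those lines is a fresh FUSE mount of the same volume; you can watch them pile up with something like:

```sh
# Count the broker mounts on a slave every few seconds (interval is arbitrary):
watch -n 5 'grep -c mountbroker-root /proc/mounts'
```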
Before I reached this point, I first had to add `CAP_DAC_OVERRIDE` to the capabilities of the glusterd service, because the mountbroker service explicitly checks for uid 0 as owner and no write permission for group/others on the mountbroker root directory, so the only way for glusterd to start as a non-root user and still have access to this directory is to add that capability.
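With systemd, this is the same drop-in mechanism as before (again a sketch):

```sh
# Extend the earlier drop-in so glusterd keeps both capabilities:
cat > /etc/systemd/system/glusterd.service.d/caps.conf <<'EOF'
[Service]
AmbientCapabilities=CAP_CHOWN CAP_DAC_OVERRIDE
EOF
systemctl daemon-reload
systemctl restart glusterd
```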
However, it appears that this is not enough to have a fully functioning geo-replication session. From what I've seen, there are 2 problems:

1. stale mounts keep piling up under the mountbroker root, as shown above
2. the unmount step (`gf_fuse_unmount` in fuse-lib/mount.c) fails when running as a non-root user

I tried setting `user_allow_other` in `/etc/fuse.conf` on the slaves and restarting the session, but it didn't change anything.
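For reference, that change amounts to a single line on each slave:

```sh
# Allow unprivileged FUSE mounts to use the allow_other option:
echo 'user_allow_other' >> /etc/fuse.conf
```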
I have some doubts about the feasibility of geo-replication with glusterd running as non-root, because even if we resolve the first problem, I don't see a way around the second problem.
Here are the logs in DEBUG:

Master:

In `/var/log/glusterfs/geo-replication/volume1_<ip_slave1>_volume1/gsyncd.log`:

Slave:

In `/var/log/glusterfs/geo-replication-slaves/volume1_<ip_slave1>_volume1/gsyncd.log`:

In `/var/log/glusterfs/geo-replication-slaves/volume1_<ip_slave1>_volume1/mnt-<master1>-applis-apsu-data-glusterfs-volume1-brick1-brick.log`:
If anyone has any pointers or ideas, I'd be glad to hear them.
After some more digging, I'm under the impression that the first problem is actually a consequence of the second problem.
What makes me think that is that when I run the `gsyncd` command directly on the slave as the `georep` user, I get the following error:
```
/usr/bin/python2 /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py slave volume1 georep@<ip_slave1>::volume1 --master-node <master1> --master-node-id d79016f4-d408-4ff4-b957-0039361685f0 --master-brick /applis/apsu/data/glusterfs/volume1/brick1/brick --local-node <slave1> --local-node-id 7b712811-dbff-4ff6-85f3-5980824e4bc5 --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level DEBUG --slave-gluster-command-dir /usr/sbin --master-dist-count 1
failure: cleaning up temp mountpoint /var/mountbroker-root/mb_hive/mnt19ABQh failed with status 1
failed with GsyncdError: cleaning up temp mountpoint /var/mountbroker-root/mb_hive/mnt19ABQh failed with status 1.
```
But if I run the same command as root, it runs fine and I don't see any temporary mounts appear under /var/mountbroker-root, so I suppose these temporary mounts are created and deleted at regular intervals as part of the normal operation of the process, and since the non-root user cannot unmount them, we are left with more and more stale mounts, i.e. problem 1.
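A quick manual check seems to confirm this (the mount point name is taken from the error above):

```sh
# As the georep user on a slave, a plain umount is refused by the kernel,
# since the underlying umount2(2) syscall requires CAP_SYS_ADMIN:
umount /var/mountbroker-root/mb_hive/mnt19ABQh
# expected: a permission error such as "Operation not permitted"
```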
I'm still no closer to a workaround or a resolution, so all suggestions are welcome.
Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.
Any further updates on this? We received vulnerability alerts in our system due to this. :-(
I still haven't managed to fully combine geo-replication with running as non-root.
I managed to at least have the brick processes run as non-root while the glusterd daemon and the rest run as root, by inserting a wrapper script written in Python around the `glusterfsd` binary in `/usr/sbin`. This wrapper script drops root privileges but retains the necessary capabilities before running the actual program; a sketch of the idea is below. If you want some more details, let me know.
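My script is in Python, but the idea fits in a few lines of shell (a sketch with several assumptions: the real binary has been moved aside to `/usr/sbin/glusterfsd.real`, the service user is `gluster`, and `setpriv` is recent enough for `--ambient-caps`, util-linux >= 2.31; on CentOS 7 the equivalent has to be done via prctl(2), which is what my Python script does):

```sh
#!/bin/sh
# Hypothetical wrapper installed as /usr/sbin/glusterfsd:
# drop root but keep the capabilities the brick processes need.
exec setpriv --reuid=gluster --regid=gluster --init-groups \
     --inh-caps=+chown,+dac_override \
     --ambient-caps=+chown,+dac_override \
     /usr/sbin/glusterfsd.real "$@"
```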
That's not completely satisfactory but that's all I could do. If anyone else has anything to contribute, I'm still interested.
Update:
I managed to have geo-replication working with only one process running as root. I dropped my wrapper script; instead, I configured glusterd to run as non-root like before, so the brick processes, the self-heal daemons, etc. are all running as non-root. And to avoid the problem I explained previously (stale mounts), I run a kind of "clone" of the glusterd daemon as root, which is in charge of mounting/unmounting the local mounts for the geo-replication sessions.
To do that, I simply created a copy of `/etc/glusterfs/glusterd.vol` as `/etc/glusterfs/georepd.vol`, inside which I changed the bind address to localhost and the socket file to `/var/run/gluster/georepd.socket`, then I run the command `glusterd --volfile /etc/glusterfs/georepd.vol` to start it (turning that into a service is easy; see the sketch below). Given that in my `glusterd.vol` file the bind address is the network interface of the machine and the socket file is `/var/run/gluster/glusterd.socket`, this allows both daemons to run in parallel without conflict. And since the `gsyncd` daemon uses localhost to contact the management daemon, it automatically sends the mount/unmount requests to the one running as root.

That way, the only process running as root listens only on localhost, so that should be OK with our security team. It's not perfect, but it's satisfactory.
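A minimal unit for the clone could look like this (a sketch; the unit name `georepd` is my own, and `-N` keeps glusterd in the foreground so systemd can supervise it):

```sh
cat > /etc/systemd/system/georepd.service <<'EOF'
[Unit]
Description=GlusterFS management daemon clone for geo-replication mounts
After=network.target

[Service]
ExecStart=/usr/sbin/glusterd -N --volfile /etc/glusterfs/georepd.vol

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable georepd
systemctl start georepd
```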
The only thing that bothers me is that the `gluster-mountbroker` command is hard-coded to modify `/etc/glusterfs/glusterd.vol`, so if I want to make it act upon my new `georepd.vol` file, I must edit the `GLUSTERD_VOLFILE` variable in `/usr/libexec/glusterfs/peer_mountbroker.py` directly, which is not ideal. I could simply add the relevant options for geo-replication manually, but I would like to preserve the command's functionality for the benefit of our Ops team.
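For now the edit is just this (a sketch; it assumes the assignment looks the way it does in my version, and it has to be re-applied after every package upgrade):

```sh
# Point gluster-mountbroker at the clone's volfile (assumed assignment form):
sed -i "s|^GLUSTERD_VOLFILE = .*|GLUSTERD_VOLFILE = '/etc/glusterfs/georepd.vol'|" \
    /usr/libexec/glusterfs/peer_mountbroker.py
```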
Again, if anyone has a better solution, I'm all ears.
**Description of problem:** I'm trying to make GlusterFS run as a non-root user (not just geo-replication, but also the glusterd daemon, the glusterfsd processes handling the volumes, everything). I managed to get glusterd running as non-root and to create a volume with the CLI using sudo. However, I am stuck when I try to start the volume. The error doesn't tell me much.

Is this possible at all? Or is it not supported?
**The exact command to reproduce the issue:**

```sh
sudo gluster --glusterd-sock=/run/gluster/glusterd.socket volume start volume1
```

**The full output of the command that failed:**

```
volume start: volume1: failed: Commit failed on localhost. Please check log file for details.
```

**Expected results:**

```
volume start: volume1: success
```
**Mandatory info:**

- The output of the `gluster volume info` command:
- The output of the `gluster volume status` command:
- The output of the `gluster volume heal` command:
- Provide logs present on following locations of client and server nodes - `/var/log/glusterfs`

In `glusterd.log`:

More logs can be provided if necessary.
- Is there any crash? Provide the backtrace and coredump: No crash
**Additional info:**

- The operating system / glusterfs version: CentOS 7.8, GlusterFS 9.4