sssd should run under unprivileged user

sssd-bot commented 4 years ago

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/2370

Created at 2014-06-27 17:36:06 by jhrozek
Closed as Fixed
Assigned to jhrozek
Associated bugzillas
- https://bugzilla.redhat.com/show_bug.cgi?id=1113783

Ticket was cloned from Red Hat Bugzilla (product Red Hat Enterprise Linux 7): Bug 1113783

Please note that this Bug is private and may not be accessible as it contains confidential Red Hat customer information.

Description of problem:
sssd is currently running as root and should be reimplemented to run under an
unprivileged user. 

Version-Release number of selected component (if applicable):
sssd-1.11.2-65

How reproducible:
100%

Steps to Reproduce:
1. systemctl sssd start
2. ps axZ | grep sssd

Actual results:
sssd and all children running as root

Expected results:
sssd and all children running under unprivileged user.

Additional info:

Comments

Comment from dpal at 2014-07-03 16:02:10

Fields changed

blockedby: => blocking: => changelog: => coverity: => design: => design_review: => 0 feature_milestone: => fedora_test_page: => milestone: NEEDS_TRIAGE => SSSD 1.12.1 review: True => 0 selected: => testsupdated: => 0

Comment from jhrozek at 2014-07-08 10:11:36

Fields changed

priority: major => blocker

Comment from jhrozek at 2014-07-29 17:23:19

Fields changed

design: => https://fedorahosted.org/sssd/wiki/DesignDocs/NotRootSSSD

Comment from jhrozek at 2014-07-31 13:32:11

Fields changed

owner: somebody => jhrozek status: new => assigned

Comment from jhrozek at 2014-09-08 20:08:41

Mass-moving all tickets that didn't make 1.12.1 into 1.12.2

milestone: SSSD 1.12.1 => SSSD 1.12.2

Comment from jhrozek at 2014-09-30 19:06:10

We need to do a release as requested by downstream. Moving tickets that are not fixed already or very close to acking to 1.12.3

milestone: SSSD 1.12.2 => SSSD 1.12.3

Comment from jhrozek at 2014-10-20 21:50:00

First round of patches:

2689efa614826d45cab60ea1186d44b8bdd243ad
940c94bc9a29165987cb9d3f71c4a4ec76e7a1fc
e718e60607149079e5c8ddc32bdd5c90b1c7a364
3fd66df4813d1410c1a6187c80e3a23395b14aed
4546e283498ffe2511cb566b9159714c671e326b
ac40d2f2b2b2fc35c95389f5e28febd580bd2b7a

mark: => 0

Comment from jhrozek at 2014-10-30 12:12:36

Fields changed

patch: 0 => 1

Comment from jhrozek at 2014-11-05 20:12:26

ldap_child and krb5_child changes:

f3a25949de81f80c136bb073e4a8f504b080c20c
77b13371c87702aee3f858f6b2b73826cf5a01bd
06f10b2a0ebb26f2460cd445f8040e9205de7500
936940720b1b0e701a2317abc4c2d05a78338f33
5eef3da14cb34e4cb6356f0b291c066db946f936
0348c74bad010d35f92400c749a7acc2fea8b2cb
45414c12aa933a33d9a635cc212c448c858c6bab

Comment from jhrozek at 2014-11-18 20:44:12

More Kerberos changes:

35b4b217fa2b91bfc8d58c47024faf41c95fc807
2745b0156f12df7a7eb93d57716233243658e4d9
7c5cd2e7711621af9163a41393e88896a91ac33b
45aeb924ec3ac448bb8d174a5cc061ed98b147c7
476b78b3f66abc7a0f805154ea1a29f54628224a
a60f4bb6b321298eb4d1c1c33d1897049a83d357

Comment from jhrozek at 2014-11-18 20:55:22

Build fixes:

d167039ec78f93828ba65366d86a1ee7774aed3e
0a039d574ed0e91d939efd2c0975f6a425942afd
f9ac9aa5760210b8bbd3d4e667dff5f101f99d62

Comment from jhrozek at 2014-11-18 20:58:03

sssd_be privilege drop patch:

10d571677bf11c8b2c357026f24de40cae3094f5

Comment from jhrozek at 2014-11-25 20:02:18

Most of the work is done, so I'm closing this ticket. There are some additional enhancements tracked by individual tickets. See e.g. the design page for more details.

resolution: => fixed status: assigned => closed

Comment from jhrozek at 2017-02-24 14:35:03

Metadata Update from @jhrozek:

Issue assigned to jhrozek
Issue marked as depending on: #2395
Issue set to the milestone: SSSD 1.12.3

cgwalters commented 3 years ago

Copy-pasting from https://github.com/coreos/coreos-assembler/pull/2046#issuecomment-780068511

suid binaries are really just a dangerous idea in general. They're a huge attack surface. For this use case instead, the monitor process can expose an internal API (e.g. over a socketpair) that runs something privileged and passes its output to the requesting unprivileged process.

adelton commented 3 years ago

I guess some description of the role and architecture of the monitor process would be useful.

Is it there to gather and provide some data from the individual SSSD processes (in which case, does it need to be run as root at all?) or is it there to for example watch for failed SSSD processes and restart them (in which case, systemd which already runs as root and with full capabilities maybe already provides some mechanism to do that)?

alexey-tikhonov commented 3 years ago

I guess some description of the role and architecture of the monitor process would be useful.

Is it there to gather and provide some data from the individual SSSD processes (in which case, does it need to be run as root at all?) or is it there to for example watch for failed SSSD processes and restart them (in which case, systemd which already runs as root and with full capabilities maybe already provides some mechanism to do that)?

Mostly the latter.

And yes, this is a long standing idea to get rid of monitor and rely on systemd entirely.

So from my point of view, extending monitor is not a way to get rid of suid bits form *_child helpers.

cgwalters commented 3 years ago

So from my point of view, extending monitor is not a way to get rid of suid bits form *_child helpers.

The way I think of this is: Yes you're running more code as non-root, but by adding suid binaries, you have increased the risk to the system as a whole.

A separate privileged process that communicates over e.g. private socketpair() with other children is a well-used Unix design pattern. It doesn't actually mean that process has to be the parent (or monitor) of the other processes.

In other words, you can run multiple systemd units - one of which is e.g. a socket-activated sssd-priv-helper.service that exposes a Unix domain socket that you authenticate by e.g. validating the uid of the connecting process is sssd.

alexey-tikhonov commented 3 years ago

The way I think of this is: Yes you're running more code as non-root, but by adding suid binaries, you have increased the risk to the system as a whole.

Might be. Overall `SSSD running as non-root user" isn't "production ready" yet, that's clear.

I think we will fix attrs (i.e. get rid of suid) for a case where sssd is built with --with-sssd-user=root (this is Fedora for example) shortly.

Proper solution is more involved though.

cgwalters commented 3 years ago

I started a discussion on this at https://lists.freedesktop.org/archives/systemd-devel/2021-February/046103.html

alexey-tikhonov commented 3 years ago

I think we will fix attrs (i.e. get rid of suid) for a case where sssd is built with --with-sssd-user=root (this is Fedora for example) shortly.

https://github.com/SSSD/sssd/pull/5510

alexey-tikhonov commented 7 months ago

So from my point of view, extending monitor is not a way to get rid of suid bits form *_child helpers.

The way I think of this is: Yes you're running more code as non-root, but by adding suid binaries, you have increased the risk to the system as a whole.

JFTR:

Fedora is going to be built '--with-sssd-user=sssd' starting, probably, F-41 (sssd-2.10)
SUID bit was replaced with fine grained file capabilities: https://github.com/SSSD/sssd/blob/0d5e8f11714e8e6cc0ad28e03fecf0f5732528b3/contrib/sssd.spec.in#L20

binaries in question are:

-rwxr-x---. 1 root sssd 141K Jan 21  2021 krb5_child
-rwxr-x---. 1 root sssd  52K Jan 21  2021 ldap_child
-rwxr-x---. 1 root sssd  32K Jan 21  2021 selinux_child

-- so executable only by 'root' and service user thus I don't think it "increases the risk to the system as a whole".

cgwalters commented 7 months ago

-- so executable only by 'root'

Yes, that's a significant mitigation indeed. But still, I feel pretty confident will show up in scans/audits and will cause you more pain down the line than the "privileged monitor process" approach or another similar one.

Maybe e.g. running with PrivateTmp= or so, and have a (privileged) ExecStartPre=- run code that copies the binaries into that namespace, and only makes them suid there.

BTW if you're doing new work to run unprivileged I highly recommend using DynamicUser=yes instead of having a fixed uid done via e.g. %post useradd or so. It has numerous advantages, including working much better with image-based update systems and avoiding uid/gid drift in those scenarios.

adelton commented 7 months ago

BTW if you're doing new work to run unprivileged I highly recommend using DynamicUser=yes instead of having a fixed uid done via e.g. %post useradd or so. It has numerous advantages, including working much better with image-based update systems and avoiding uid/gid drift in those scenarios.

I'm of quite the opposite opinion. By using DynamicUser=yes, what you get is uncertainty and the complexity of chowns logic possibly needed upon startup.

Can you elaborate what the benefits for the image-based systems (or services running in containers) would be exactly?

alexey-tikhonov commented 7 months ago

-- so executable only by 'root'

Yes, that's a significant mitigation indeed. But still, I feel pretty confident will show up in scans/audits and will cause you more pain down the line

Do you have an example where this could really increase risk / attack surface? It looks for me as "running privileged process all the time" vs "running privileged helper occasionally (by restricted users only)". So far I don't understand how the latter could be worse.

Additional note: a set of file capabilities (that replaced full blown SUID bit) is going to be restricted further, and other measures to constrain helpers will be considered (like selinux policies).

Maybe e.g. running with PrivateTmp= or so, and have a (privileged) ExecStartPre=- run code that copies the binaries into that namespace, and only makes them suid there.

Interesting option. But if this creates a new namespace, then will privileged helper run within this namespace have privileges in the host namespace? I'm not sure what is "file system namespace".

cgwalters commented 7 months ago

I'm of quite the opposite opinion. By using DynamicUser=yes, what you get is uncertainty and the complexity of chowns logic possibly needed upon startup.

What is uncertain?

the complexity of chowns logic possibly needed upon startup.

It's pretty well optimized.

The thing is...ah yes, I see it is a floating user/group: https://src.fedoraproject.org/rpms/sssd/blob/rawhide/f/sssd.spec#_992

For image based updates it is really, really helpful to minimize the number of dynamic/floating users that are part of lower levels in the OS.

Please please consider using DynamicUser=yes as it seems nearly ideal for this use case as sssd is just a cache.

alexey-tikhonov commented 7 months ago

The thing is...ah yes, I see it is a floating user/group:

JFTR, it will use %sysusers_create_compat starting f-41. But that's probably doesn't change much.

Please please consider using DynamicUser=yes as it seems nearly ideal for this use case as sssd is just a cache.

And where this cache should be kept / what should be ownership and ACL? So that it would be persistent between service restarts / machine reboots?

cgwalters commented 7 months ago

And where this cache should be kept / what should be ownership and ACL?

Use StateDirectory=sssd in the unit - when used in combination with DynamicUser=yes, this ensures ownership is handled, and it persists across reboots as /var does in general.

travier commented 7 months ago

JFTR, it will use %sysusers_create_compat starting f-41. But that's probably doesn't change much.

Yes please 🙂 https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/#_allocation_strategies

travier commented 7 months ago

%global child_capabilities cap_chown,cap_dac_override,cap_setuid,cap_setgid=ep

That's a bit better but that's still equivalent to full root in most cases: https://grsecurity.net/false_boundaries_and_arbitrary_code_execution

Are any users expected to be part of the sssd group or is it only used for the daemon?

alexey-tikhonov commented 7 months ago

%global child_capabilities cap_chown,cap_dac_override,cap_setuid,cap_setgid=ep

That's a bit better but that's still equivalent to full root in most cases: https://grsecurity.net/false_boundaries_and_arbitrary_code_execution

But should be better than to run entire daemon unrestricted all the time (as it's the case now). And as I mentioned, set will be restricted further.

Are any users expected to be part of the sssd group or is it only used for the daemon?

The latter.

cgwalters commented 7 months ago

I'm of quite the opposite opinion. By using DynamicUser=yes, what you get is uncertainty and the complexity of chowns logic possibly needed upon startup.

Sorry to really belabor this but given sssd's critical architectural role a central point around user management...what is done here partially sets a "tone" and example for overall system architecture around what other services should do. If you have concerns about DynamicUser=yes - please be concrete.

Tangentially related to this I just added a large blurb on this over in https://containers.github.io/bootc/building/users-and-groups.html and it touches on the fact that the "classic RPM %pre floating allocation" pattern will generally cause uid/gid drift for image based systems unless it's manually fixated. DynamicUser=yes is again just significantly better from that PoV.

adelton commented 7 months ago

SSSD is (among other functions) a name service so it may be a source of uids for processes. As such it should be very certain about its own identity and uid clashes should be ruled out. The fact that the daemon with DynamicUser=yes can run as the same uid as regular user processes compromises the whole uid resolution, and it is a nightmare from the auditing point of view. Systemd cannot prevent the daemon from starting as such uid because at the time of the daemon start, the remote uids are not yet being resolved by SSSD so systemd has no way of knowing that there might be conflicting uids.

I dread the idea of a system where the uid ranges are not reasonably separated / isolated. Ability to audit operations and their authors is one of the strict SFRs of Common Criteria certification, among others.

The DynamicUser=yes explicitly uses uids in the range 61184–65519, so above the SYS_UID_MAX logic of login.defs. To me that is a clear indication that this mechanism should not be used for system services and system daemons.

I simply fail to see what value DynamicUser=yes provides to SSSD, or in general to daemons packaged in OSes. I don't see what benefit this would bring over the soft static allocation (https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/#_soft_static_allocation). Just pick a uid a go with that. No drifts, not issues upon upgrades, no chowns. Uid from a range designated for system accounts.

For example, Apache's uid 48 has been stable for ages on rpm-based systems. I don't see a reason why the same couldn't be done for SSSD.

By the way, it seems FPC has already acked the soft static allocation for SSSD in https://pagure.io/packaging-committee/issue/570 -- we just never followed up with the actual allocation in uidgid.

cgwalters commented 7 months ago

The fact that the daemon with DynamicUser=yes can run as the same uid as regular user processes compromises the whole uid resolution,

One of two things is true: DynamicUser=yes works or it doesn't. If you're arguing that there are issues, then either those issues need to be fixed, or it can't be used in general (right?).

so systemd has no way of knowing that there might be conflicting uids. [...] The DynamicUser=yes explicitly uses uids in the range 61184–65519, so above the SYS_UID_MAX logic of login.defs. To me that is a clear indication that this mechanism should not be used for system services and system daemons.

I think the rationale for that is exactly that it's distinct from system allocations and human users. What else would it do?

I simply fail to see what value DynamicUser=yes provides to SSSD, or in general to daemons packaged in OSes. I don't see what benefit this would bring over the soft static allocation

It doesn't scale to have a centralized registry for uids, and there's only so much space under 1000. (It also doesn't scale to have a centralized registry for software, hence containers...)

For example, Apache's uid 48 has been stable for ages on rpm-based systems. I don't see a reason why the same couldn't be done for SSSD. By the way, it seems FPC has already acked the soft static allocation for SSSD

Yes, that's obviously fine, I won't argue against it. But still, there will be a big need to have something like DynamicUser=yes in the general case because of the centralization problem above. So if you have concerns about potential overlaps or other design issues, I don't see a way around fixing that in general.

adelton commented 7 months ago

One of two things is true: DynamicUser=yes works or it doesn't. If you're arguing that there are issues, then either those issues need to be fixed, or it can't be used in general (right?).

Why do you view it as black or white? The DynamicUser=yes feature likely works as described. To me it seems like a quick way to un-privilege your services. (I wanted to say quick'n'dirty but I didn't want to start another round of justifying my words -- this is not a review of the DynamicUser=yes feature. :-) )

That does not mean everything should be migrated to it en masse. Even https://fedoraproject.org/wiki/Changes/SystemdSecurityHardening does not tout it as a one-size-fits-all security mechanism:

We have to choose between DynamicUser=yes or User if either is feasible for the service to use.

Different types of workloads and services have different security ramifications. Risks that you might tolerate with some user service with no impact/side-effect on the rest of the system is vastly different from risks that you can tolerate from a daemon that is a basis of identity, authentication, and authorization in the system. Note I also have a problem with classifying SSSD as "just a cache".

I think the rationale for that is exactly that it's distinct from system allocations and human users. What else would it do?

I don't necessarily blame it. It does not mean I want to see SSSD use it.

It doesn't scale to have a centralized registry for uids, and there's only so much space under 1000. (It also doesn't scale to have a centralized registry for software, hence containers...)

I've heard the argument about 1000 for many years. And yet there are still tons of gaps in uidgid. The current soft static allocation process actually seems like a fairly reasonable gatekeeper.

Yes, that's obviously fine, I won't argue against it. But still, there will be a big need to have something like DynamicUser=yes in the general case because of the centralization problem above. So if you have concerns about potential overlaps or other design issues, I don't see a way around fixing that in general.

There likely isn't, there are inherent risks and uids are hard. (And even if you get uids in sync for example across systems, your subuids might be out of sync -- just recently we had a containerized FreeIPA user/admin hit that.)

If we are talking in general, the fact that systemd does not seem to leave any auditing trail about the uid it used for the DynamicUser=yes service is scary and should be No. 1 thing to address. I wouldn't like to be in a position to explain to evaluators why we have no idea where a given process and its uid came from or why there are chown operations from one "random" uid to another audited.

SSSD / sssd

sssd should run under unprivileged user #3412

Comments