use PID FD if available from SO_PEERPIDFD, and return it via GetConnectionCredentials()

bluca commented 1 year ago

This wires up support for PID FDs on Linux, which allow to pin a process by file descriptor. PIDs can be reused, and attackers can thus impersonate other processes. The new SO_PEERPIDFD socket option lets us get an FD that we know for sure pins the original process, so we return it as ProcessFD in GetConnectionCredentials() so that clients can use it as well. If the new call is not available, pin the process manually by pid, so that we can still improve the situation a little by resolving the PID on the fly for internal usage, but in this case avoid returning the FD to clients, as it's not deemed safe enough.

dbus-daemon and spec PR: https://gitlab.freedesktop.org/dbus/dbus/-/merge_requests/398 polkit PR to use the new methods: https://gitlab.freedesktop.org/polkit/polkit/-/merge_requests/154 kernel patches to add SO_PEERPIDFD: https://lore.kernel.org/lkml/20230327-pidfd-file-api-v1-3-5c0e9a3158e4@kernel.org/T/#m159b013707a718dadde07e76f72f04153392a9cd

bluca commented 1 year ago

Thanks a lot for implementing this! A bunch of comments inline (most of which I can also fix when applying the patches). Let me know what you think!

I guess the idea is to wait for SO_PEERPIDFD to land? It is valuable without it, due to the fallback-pidfd. However, I wonder whether we really want the fallback?

Yeah, I opened everything as draft as SO_PEERPIDFD is still in review - I think it would be valuable to start tracking by pid FD beforehand, as it makes things strictly better, there's no downsides to it. It's up to you if you want me to split this PR and merge the first parts earlier, or if you prefer to wait until the option is in net-next or an rc, let me know what you prefer.

bluca commented 1 year ago

@dvdhrm would you like me to split off some of the first commits in a separate PR?

dvdhrm commented 1 year ago

@dvdhrm would you like me to split off some of the first commits in a separate PR?

Nope. I am still on parental leave, that's why non bugfixes take some time ;) I will review in detail next few weeks

bluca commented 1 year ago

@dvdhrm would you like me to split off some of the first commits in a separate PR?

Nope. I am still on parental leave, that's why non bugfixes take some time ;) I will review in detail next few weeks

Got it, no worries!

bluca commented 1 year ago

A bunch of comments on the util-patches. I fixed all of them myself. Let me know if you disagree or see issues with my approach.

I extracted proc_read() and proc_field() from your patches, adapted them to be a bit more generic and added test-proc.c to verify their behavior.

I pushed this out to main. I will try to tackle your remaining patches next.

Looks all good, rebased and pushed on top of your changes.

Note that the D-Bus spec changes are not yet merged, the maintainer is waiting for the kernel API to be released (they are currently merged in Linus' tree so they will be included in the upcoming 6.5-rc1)

bluca commented 12 months ago

A bunch of comments on the util-patches. I fixed all of them myself. Let me know if you disagree or see issues with my approach. I extracted proc_read() and proc_field() from your patches, adapted them to be a bit more generic and added test-proc.c to verify their behavior. I pushed this out to main. I will try to tackle your remaining patches next.

Looks all good, rebased and pushed on top of your changes.

Note that the D-Bus spec changes are not yet merged, the maintainer is waiting for the kernel API to be released (they are currently merged in Linus' tree so they will be included in the upcoming 6.5-rc1)

The spec change for GetConnectionCredentials was approved in https://gitlab.freedesktop.org/dbus/dbus/-/merge_requests/420 so I've undrafted this, I think we are good to go. I've removed GetConnectionUnixProcessFD as Simon doesn't like it so it will not be added to the spec, at least for now.

bluca commented 12 months ago

Spec change is now merged: https://gitlab.freedesktop.org/dbus/dbus/-/merge_requests/420

bluca commented 12 months ago

@dvdhrm anything else you'd like me to do here?

teg commented 12 months ago

I like this a lot! I'm late to the party but have some questions about the intended behavior.

My assumption is that you intend to support three cases:

kernels that support SCM_PEERPIDFD, this is what's now tracked as "trusted"
kernels that only support pidfd_open(), which is tracked by pidfd >= 0
kernels that support neither, which will fall back to using the pinned pid.

Both SCM_PEERPIDFD and pidfd_open() can return -ESRCH. What is the intended behavior in this case?

Is there ever a scenario in one of the two first supported cases above where we have a connected peer with pidfd < 0? If so, what's the intended behavior of the GetCreds* APIs?

bluca commented 12 months ago

That means the process is already gone, and (if I got it right) it will be treated as such - return 'service not found' or so

dvdhrm commented 12 months ago

(lets move this out of the code-review into general discussion:)

You currently omit the peer from the accounting-dump just because its original PID is no longer valid. I think you should still include the peer, since it is still operational and we want to be able to debug its resource usage even if its original PID is invalid.
I honestly believe GetConnectionCredentials() should return all valid information, even if the original PID is outdated. But even if we decide to drop that information, I still think it should return success, rather than PEER_NOT_FOUND. Because the peer is still operational.
Pidfds do not prevent the PID from being recycled, right? They just pin the struct pid, but as soon as that is unhooked from the hashtables, the pidnr can be recycled. So this means GetConnectionCredentials() returns PEER_NOT_FOUND for peers with invalid PID, rather than returning the PIDFD? I think it is still useful to get a PIDFD, even if the original pid is gone, isn't it?

bluca commented 12 months ago

(lets move this out of the code-review into general discussion:)

You currently omit the peer from the accounting-dump just because its original PID is no longer valid. I think you should still include the peer, since it is still operational and we want to be able to debug its resource usage even if its original PID is invalid.

I honestly believe GetConnectionCredentials() should return all valid information, even if the original PID is outdated. But even if we decide to drop that information, I still think it should return success, rather than PEER_NOT_FOUND. Because the peer is still operational.

Pidfds do not prevent the PID from being recycled, right? They just pin the struct pid, but as soon as that is unhooked from the hashtables, the pidnr can be recycled. So this means GetConnectionCredentials() returns PEER_NOT_FOUND for peers with invalid PID, rather than returning the PIDFD? I think it is still useful to get a PIDFD, even if the original pid is gone, isn't it?

If the actual process is gone, then why would we consider the peer as operational? When querying for this information from polkit, I need to make a decision on whether to authorize some action or not. If I get stale data, then I cannot do that, because I have no idea who I am actually going to authorize. This is the basis of this whole effort - reliably track the process, and if it's gone, say it's gone rather than trying to hide that fact and give out stale info.

teg commented 12 months ago

All the creds are the creds at the time the peer opened the dbus socket. The socket could be passed to a different process and the original owner could exit, in which case the peer is operational even though the process that created it is gone.

Most of the creds still make sense, as it is pretty common to rely on this semantic. One could argue that a stale pid is useless as the only thing you could do with it is compare for equality, which makes no sense for pids due to recycling. I suppose you could still use it for logging purposes, though that seems tenuous.

Either way we must treat a peer without a pid as valid - unless I'm totally confused :)

In polkit you can verify that the process is still around by using waitid(), so I think there shouldn't be a problem.

teg commented 12 months ago

Hmm, if we decide to drop stale pid's, we risk breaking clients that don't handle gracefully that it is missing (say they include it in their log messages and now would crash). Pretty sure we can't drop it, but maybe we could set it to a known invalid value?

dvdhrm commented 12 months ago

(lets move this out of the code-review into general discussion:) [...] If the actual process is gone, then why would we consider the peer as operational? When querying for this information from polkit, I need to make a decision on whether to authorize some action or not. If I get stale data, then I cannot do that, because I have no idea who I am actually going to authorize. This is the basis of this whole effort - reliably track the process, and if it's gone, say it's gone rather than trying to hide that fact and give out stale info.

Yes, you are all correct on that one. But I think you need to look at this from a different perspective than polkit. We have to remember that a dbus-connection can be passed down to child processes (which actually happens with fork-style daemons). And those connections will continue to operate on the bus perfectly fine. We cannot break this setup.

Yes, polkit will not be able to serve requests of such connections based on the PID. But this is ok. There is lots of dbus traffic that does not care for polkit. So we must ensure that these work as usual. And this means if there is a dbus-client that uses GetConnectionCredentials for logging, then we must continue to serve them this information. And more importantly, we must consider such connections in our debug-interfaces (e.g., accounting-dumps), because those connections can still consume resources, and we want to be able to diagnose this.

I think we can safely try to omit pids from credentials, if the pid vanished, and see whether it breaks things (as @teg mentioned). A safer alternative (which we can also fall back to later) would be to return (u16)-1 (or whatever systemd reserves for invalid uids; I forgot). And as the last resort, we can return 0.

However, I am quite reluctant to stop sending other credentials. The credentials of dbus connections follow a capability-based model, they represent a capability the connection held at one point and is free to retain. D-bus credentials do not represent any ambient capabilities a connection might hold at the time of query. Hence, I think it is the right approach to keep sending the credentials other than PID.

Or am I missing something here? Maybe I am looking at this from the wrong angle.

bluca commented 12 months ago

For my use case I think I can make it work with just the pidfd, so I've changed the implementation to leave everything else alone, and just store the fd from SO_PEERPIDFD if available, and return nothing otherwise. I've removed the resolving of the pid on the fly and everything else.

I think it would be nicer to move to a model where only actual, accurate data is returned, and if a peer is gone, is gone. But it's outside of my use case, so I'll leave it out, and for you to decide as a potential future improvement.

dvdhrm commented 12 months ago

I cherry-picked the sockopt_get_peerpidfd() function and adapted it to use a proper error-domain. Then I pushed tests that verify the behavior.

I have tried to investigate what happens for processes that are gone, but the kernel code is rather weird. It is not clear to me whether stale tasks are left linked in the pid-hlists or not. But I wouldn't be surprised if they are, even if their task_struct is gone.

Regardless, I have not been able to compile a 6.5 kernel yet, so I want to wait merging the further patches before we can verify the behavior of SO_PEERPIDFD. I have also messaged @brauner to verify what the intented behavior is.

Btw., similarly, it is currently undocumented what fdinfo of a pidfd returns if the processes are gone. I think -1, but this needs separate verification. I hope I can do that on Monday.

bluca commented 12 months ago

I cherry-picked the sockopt_get_peerpidfd() function and adapted it to use a proper error-domain. Then I pushed tests that verify the behavior.

That's quite different from how other functions in the same file work, I find it a bit confusing, but ok, I'll rebase and push. But the comment is wrong: ESRCH is returned when the process is gone, see: https://github.com/torvalds/linux/blob/master/net/core/sock.c#L1780

I have tried to investigate what happens for processes that are gone, but the kernel code is rather weird. It is not clear to me whether stale tasks are left linked in the pid-hlists or not. But I wouldn't be surprised if they are, even if their task_struct is gone.

Regardless, I have not been able to compile a 6.5 kernel yet, so I want to wait merging the further patches before we can verify the behavior of SO_PEERPIDFD. I have also messaged @brauner to verify what the intented behavior is.

Btw., similarly, it is currently undocumented what fdinfo of a pidfd returns if the processes are gone. I think -1, but this needs separate verification. I hope I can do that on Monday.

My understanding is that if the process is gone but not reaped you get a pidfd that resolves to -1, and after it's reaped you get ESRCH from the syscall.

mihalicyn commented 12 months ago

My understanding is that if the process is gone but not reaped you get a pidfd that resolves to -1, and after it's reaped you get ESRCH from the syscall.

I've worked on this kernel feature together with Luca and Christian. Yes, this is correct behavior description with small remark, not ESRCH but EINVAL (from pidfd_prepare). ESRCH is returned when sk->sk_peer_pid == NULL and this means that this socket has no peer (not connected). pidfd_prepare returns EINVAL when !pid_has_task(pid, PIDTYPE_TGID), it means that this struct pid is no longer hashed. I.e. socket was connected with some peer but now this peer process died.

Btw., similarly, it is currently undocumented what fdinfo of a pidfd returns if the processes are gone. I think -1, but this needs separate verification. I hope I can do that on Monday.

yes, in this case -1 will be shown in fdinfo

dvdhrm commented 12 months ago

I cherry-picked the sockopt_get_peerpidfd() function and adapted it to use a proper error-domain. Then I pushed tests that verify the behavior.

That's quite different from how other functions in the same file work, I find it a bit confusing, but ok, I'll rebase and push.

That's because none of the other sockopt_* functions have any failure situations that we want to handle. Any failures in their code are fatal. See util/error.c if you are interested in details how we handle errors in dbus-broker.

But in short: Never match on negative errors, except for calls from outside dbus-broker. And those must always be converted into our own error-domain (i.e., positive errors), or fatal.

But the comment is wrong: ESRCH is returned when the process is gone, see: https://github.com/torvalds/linux/blob/master/net/core/sock.c#L1780

No. ESRCH is when get_pid() returns NULL, and this only happens if sk_peer_pid is NULL, which only happens for non-unix sockets or before the unix socket is listened/connected.

dvdhrm commented 12 months ago

My understanding is that if the process is gone but not reaped you get a pidfd that resolves to -1, and after it's reaped you get ESRCH from the syscall.

I've worked on this kernel feature together with Luca and Christian. Yes, this is correct behavior description with small remark, not ESRCH but EINVAL (from pidfd_prepare). ESRCH is returned when sk->sk_peer_pid == NULL and this means that this socket has no peer (not connected). pidfd_prepare returns EINVAL when !pid_has_task(pid, PIDTYPE_TGID), it means that this struct pid is no longer hashed. I.e. socket was connected with some peer but now this peer process died.

Yes, this is what I assumed in the review above. Can you explain why this behavior was chosen?

So now everyone needs to deal with SO_PEERPIDFD returning a pidfd which then yields -1, because the process was reaped immediately after SO_PEERPIDFD. AND everyone must be able to deal with EINVAL for the exact same scenario, but just slightly too slow (or fast) querying the socket?

Why not always return a pidfd and let the caller have only a single pid-resolution?

I have to admit I do not like that everyone now has to deal with two options, one were the pidfd cannot be obtained, and one were the pidfd yields -1, but both represent the same scenario. This propagates through dbus-apis now, because we cannot work around it, and it propagates from the client-side further to possible wrapper APIs.

Is it too late to change it? Can we just use __pidfd_prepare() instead of pidfd_prepare() in the kernel?

mihalicyn commented 12 months ago

Yes, this is what I assumed in the review above. Can you explain why this behavior was chosen?

I've tried to go this way and it was rejected. https://lore.kernel.org/all/20230317090835.hz4lbm3pmvvmt2fs@wittgenstein/

mihalicyn commented 11 months ago

@dvdhrm David, Christian says that you want to change something about this API in the kernel. Just wanted to let you know that I'm happy to help with that.

bluca commented 11 months ago

Regardless, I have not been able to compile a 6.5 kernel yet, so I want to wait merging the further patches before we can verify the behavior of SO_PEERPIDFD.

btw if you want to test this end-to-end Debian experimental has the 6.5 kernel now, so it can be installed via the package linux-image-6.5.0-0-amd64-unsigned. I just tested this and it's working as expected.

bluca commented 11 months ago

Regardless, I have not been able to compile a 6.5 kernel yet, so I want to wait merging the further patches before we can verify the behavior of SO_PEERPIDFD.

btw if you want to test this end-to-end Debian experimental has the 6.5 kernel now, so it can be installed via the package linux-image-6.5.0-0-amd64-unsigned. I just tested this and it's working as expected.

The dbus-daemon corresponding change has been merged. Can we go ahead with this as well or there's more changes to do? The pending kernel patches would change the error codes but that handling is already merged in main, so I don't think this PR would change as a result

bluca commented 11 months ago

Regardless, I have not been able to compile a 6.5 kernel yet, so I want to wait merging the further patches before we can verify the behavior of SO_PEERPIDFD.

btw if you want to test this end-to-end Debian experimental has the 6.5 kernel now, so it can be installed via the package linux-image-6.5.0-0-amd64-unsigned. I just tested this and it's working as expected.

The dbus-daemon corresponding change has been merged. Can we go ahead with this as well or there's more changes to do? The pending kernel patches would change the error codes but that handling is already merged in main, so I don't think this PR would change as a result

@dvdhrm gentle ping. I've rebased again following your latest changes for the error code. Anything else you'd like me to change here? Thanks!

dvdhrm commented 11 months ago

Pushed now. I also pushed a commit to extract credentials into a structure and make it easier to pass around.

Lastly, note that I don't want to release this until the kernel release is done. I have to make sure the release builds work, even if the kernel interface changes before the release.

Thanks a lot for posting this!

bluca commented 11 months ago

Pushed now. I also pushed a commit to extract credentials into a structure and make it easier to pass around.

Lastly, note that I don't want to release this until the kernel release is done. I have to make sure the release builds work, even if the kernel interface changes before the release.

Yes I fully agree. Should be soon anyway.

Thanks a lot for posting this!

Thanks for the review and merge!

bluca commented 7 months ago

@dvdhrm do you think the current main is in a good shape for a pre-release git snapshot upload to debian/ubuntu? I'd like to have the pid fd stuff there in time for 24.04

dvdhrm commented 7 months ago

Pushed v34 out!

bluca commented 7 months ago

Ah fabulous, thank you very much

bus1 / dbus-broker

use PID FD if available from SO_PEERPIDFD, and return it via GetConnectionCredentials() #312