authzed / zed

Official command-line tool for managing SpiceDB
https://authzed.com/docs/reference/clients
Apache License 2.0

'zed context' appears to deadlock when run over a remote terminal if `DISPLAY=` is set (e.g. a remote workstation), due to keyring integration #140

Open thoughtpolice opened 2 years ago

thoughtpolice commented 2 years ago

First off, SpiceDB is very cool and seemingly the best Zanzibar implementation there is. Thanks for that! It's a fun product that I'm still wrapping my head around. Now, a quick story.


Last night, while trying to debug a Kubernetes deployment, I realized zed was hanging on any use of the context command, which I needed to manage credentials for talking to the service. It was completely bizarre, and looked like this (after a kubectl proxy):

$ zed --no-verify-ca --insecure --log-level trace context set local grpc://127.0.0.1:50051 <xyz>
9:03AM DBG set log level new level=trace

In fact it was so bizarre because I had used zed previously, on another machine, to test a docker container. And I did that again to be sure I wasn't losing it, and it was just fine. It prompted me for a password for the JWT keys; I remembered that. So what could be the problem? I tried strace and nothing immediately popped out; it was waiting on a futex, which almost always means waiting on some IPC or lock mechanism; unfortunately a futex is too low level to work off of. I read the manual and saw that it integrates with the system keyring, but just writes JWT files according to the go keyring package. I confirmed it wrote those JWT files on my working machine. The source code (when I read it) seemed innocuous enough. But I couldn't figure it out.

So I went to sleep, and this morning while getting coffee it dawned on me: I thought of the words "GNOME Keyring", and realized the system where the hang occurs has a full GUI (desktop environment) active. More precisely, it is an Ubuntu virtual machine running a GNOME desktop (I need GNOME for some proprietary software to run its user interface on this machine). But I develop on this machine through VS Code and its SSH integration, both to avoid desktop GUI latency and because my Mac has better font rendering. On the other hand, the system where zed worked was completely headless and had no UI active at all. GNOME Keyring will prompt you for encryption passwords via the UI if, and only if, the GUI is active.

Sure enough, this morning I sat down at VS Code on my Mac, typed zed context set local ..., and the hang started. I moved over and opened my Ubuntu virtual machine, and there the culprit causing the hang revealed itself:

image


So I honestly don't know what to do here because the actual thing is, everything is "Working as expected." So the problem is:

1) If you have a desktop active, BUT
2) you are using a remote terminal without the UI available,
3) zed context, or more precisely any usage of keyring, appears to hang

But this is all by design. It's kind of expected that if the desktop was open, you'd be using it. However, it's extremely non-intuitive if this happens when you aren't using the desktop, because from the terminal there is literally no way to tell that anything is happening.
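To make the combination of conditions above concrete, here is a minimal Go sketch of a heuristic that a CLI could run before touching the keyring. It is not something zed does today: the function name `promptMayBeInvisible` is made up for illustration, and it assumes that a non-empty `DISPLAY` together with the `SSH_CONNECTION` or `SSH_TTY` variables (which OpenSSH sets for remote sessions) is a good enough signal. A keyring daemon reached over D-Bus may behave differently, so treat this as a hint only.

```go
package main

import (
	"fmt"
	"os"
)

// promptMayBeInvisible is a hypothetical helper (not part of zed). It reports
// whether a GUI keyring prompt could pop up on a display the user is not
// looking at: DISPLAY is set, but the session looks like it came in over SSH.
// This is a heuristic under the assumptions stated above, not a guarantee.
func promptMayBeInvisible(getenv func(string) string) bool {
	hasDisplay := getenv("DISPLAY") != ""
	overSSH := getenv("SSH_CONNECTION") != "" || getenv("SSH_TTY") != ""
	return hasDisplay && overSSH
}

func main() {
	if promptMayBeInvisible(os.Getenv) {
		fmt.Fprintln(os.Stderr,
			"warning: a keyring password prompt may be waiting on the desktop session")
	}
}
```

Taking the environment lookup as a parameter keeps the check easy to exercise without changing the real environment.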

Maybe it would be possible to just add more tracing messages, honestly? The fact I couldn't even tell what zed context was trying to do until I refreshed my memory (by using zed context successfully on the other machine) made it hard to figure out what might be happening.

There is also the broader question of authenticating with SpiceDB in the long run. I understand SpiceDB occupies a special infrastructural role: because it is used to build your auth infrastructure, "self bootstrapping" user authorization is tricky. Still, I suspect a simple pre-shared bearer token won't work for everyone, forever, even though it is simple. So in the future, assuming basic authentication came from some other mechanism, the keyring might be avoided entirely, making this moot. I don't know. (The simplest case would be to configure a CA and only allow clients with a properly signed TLS cert to connect; or ed25519, etc...)

One case I haven't checked is that my Desktop UI was actually logged into GNOME; it might be the case that if your user doesn't have a desktop login session available, then the keyring will be prompted remotely over the terminal instead (because there is no working display server for the user.) I'll test this just to be sure...

thoughtpolice commented 2 years ago

And actually, there's another question that might even be worth asking more prominently: is it worth integrating with the user's keyring via the keyring package at all to store these keys? Most tools just trust that the token is available and the ambient system is secure. Would it be worth simply trusting that the pre-shared key is kept safely, instead?

Basically: I find this feature kind of nice (don't get me wrong, I love Keychain on my Mac, which I find to be the best implementation of this idea), but I also find this interaction a bit surprising. It's not a deal breaker, and I'm not petitioning to remove usage of keyring, mind you. Just something to think about.

Really this bug report is just in case anyone else runs into the same problem, even if it's incredibly, weirdly specific.