if there is more than one login attempt gnupg is not unlocked

treeshateorcs commented 4 years ago

e.g. if you enter your password once, and it was wrong, then you enter again and it is correct

this may have to do with systemd-homed or that i don't use any display managers

here's an excerpt from my journalctl

cruegge commented 4 years ago

I can't reproduce that currently, and also don't have an idea why that might happen. The thing ist, pam-gnupg sends the passphrase from pam_setcred or pam_open_session, both of which are only called after successful authentication. So failed attempts should in princple never reach the agent.

I don't know anything about systemd-homed, but your log looks as if you're using an encrypted home partition. Can you maybe outline your setup, and maybe also post the relevant files from /etc/pam.d, probably login and everything inlucded from there?

Also, the line

Nov 15 03:55:13 unix gpg-agent[16766]: Failed to lookup password for key n/8C07AF9F4560150577D103DE2B50D1389B3891AF with secret service: The name org.freedesktop.secrets was not provided by any .service files

is a bit unusual. Far as I know gpg-agent does not use org.freedesktop.secret. Do you have an idea where that might come from? Do you use gnome-keyring or kwallet?

Finally, in case you haven't already done that, can you try adding the store-only option to auth, i.e.

auth optional pam_gnupg.so store-only

That disables sending the passphrase from auth at all, deferring that to the session module. Maybe the issue is related to sending the passphrase too early (wrt. your encrypted home for example).

treeshateorcs commented 4 years ago

i use systemd-homed, and it encrypts the home folder using luks by default. i'm on archlinux, (i think it's the only major distro, as of now, that supports systemd-homed)

here's my system-local-login

#%PAM-1.0

auth      include   system-login
account   include   system-login
password  include   system-login
session   include   system-login
auth      optional  pam_gnupg.so store-only
session   optional  pam_gnupg.so

and swaylock (just in case)

#%PAM-1.0

#
# PAM configuration file for the swaylock screen locker. By default, it includes
# the 'login' configuration file (see /etc/pam.d/login)
#

auth include login
auth optional pam_gnupg.so

and login

#%PAM-1.0

auth       required     pam_securetty.so
auth       requisite    pam_nologin.so
auth       include      system-local-login
account    include      system-local-login
session    include      system-local-login

these files are the only ones i changed (according to your README)

this line is at the end (before the line that is responsible for running sway on tty1) of my .zprofile

export SSH_AUTH_SOCK=$(gpgconf --list-dirs agent-ssh-socket)

(but it doesn't work, though that's a different issue)

and my keygrip in ~/.pam-gnupg

that's pretty much all of my setup (unless i forget something)

i used gnome-keyring previously, and i set some things up so that it could unlock the gpg keyring on login, but i got rid of everything of that (i think, unless i forget something)

and no store-only didn't have any effect, it still asks for my password after at least one failed login attempt

cruegge commented 4 years ago

I pushed a commit that adds some basic debug logging, activated by a new module option debug. Can you set that for all instances of pam_gnupg.so, try again and post the relevant journal output of both login attempts?

treeshateorcs commented 4 years ago

i added debug to the pam_gnupg lines, so that system-local-login looks like this

#%PAM-1.0

auth      include   system-login
account   include   system-login
password  include   system-login
session   include   system-login
auth      optional  pam_gnupg.so debug
session   optional  pam_gnupg.so debug

and swaylock like this (even though i don't have problems with it)

auth include login
auth optional pam_gnupg.so debug

and here's the log

https://0x0.st/i5Vw.txt

i rebooted, and there's a longer log, with more occurences of pam_gnupg. maybe it will be more helpful

https://0x0.st/i5Vx.txt

cruegge commented 4 years ago

The logs look ok. The only log entries from the first attempt seem to be (from your second logfile)

Nov 18 01:56:11 unix systemd-homed[407]: tho: changing state inactive → activating-for-acquire
Nov 18 01:56:11 unix systemd-homework[1311]: None of the supplied plaintext passwords unlocks the user record's hashed passwords.
Nov 18 01:56:14 unix systemd-homed[407]: Activation failed: Required key not available
Nov 18 01:56:14 unix systemd-homed[407]: tho: changing state activating-for-acquire → inactive
Nov 18 01:56:14 unix systemd-homed[407]: Got notification that all sessions of user tho ended, deactivating automatically.
Nov 18 01:56:14 unix systemd-homed[407]: Home tho already deactivated, no automatic deactivation needed.

Apparently, pam-gnupg is not even executed in this case, so I don't understand how this failed attempt can influence the outcome of the second login.

On the second login, there's an error message from setcred,

Nov 18 01:56:34 unix login[664]: pam_gnupg(login:setcred): presetting failed, retaining passphrase

but that is fine since the agent is not yet running after reboot (it's not autostarted during setcred). The key is preset later when opening the session,

Nov 18 01:56:34 unix login[664]: pam_gnupg(login:session): presetting passphrase succeeded, cleaning up

followed by another setcred that fails since the passphrase has already been cleaned up,

Nov 18 01:56:34 unix login[664]: pam_gnupg(login:setcred): unable to obtain stored passphrase

so again this is fine. You can also see the agent being socket-activated by systemd right before pam-gnupg logs the success message.

Nov 18 01:56:34 unix gpg-agent[1356]: gpg-agent (GnuPG) 2.2.23 starting in supervised mode.

What's slightly strange is that the line

Nov 18 01:56:34 unix systemd[1341]: Listening on GnuPG cryptographic agent and passphrase cache.

saying that systemd listens on the agent socket actually comes after the agent has already started and pam-gnupg is done, but maybe that is just due to log messages arriving out of order.

More importantly, there are messages from gpg-agent[1356] and gpg-agent[1392], the latter being the one that tells your spotifyd that it can't get the password. So at least briefly you seem to have multiple agents running. Do you maybe set GNUPGHOME to something other than ~/.gnupg, and if so, where do you set it?

To exclude that there are any kind of races between pam-gnupg and systemd autostarting the agent, can you maybe try adding no-autostart to the session optional pam_gnupg.so line? (This will only work if you did not set GNUPGHOME, otherwise systemd will not listen on the correct socket.)

I don't expect that to be the reason, though, because that issue would not just occur after a failed login, but I also don't have any better idea currently.

Finally, there's and error message saying

Nov 18 01:56:35 unix gpg-agent[1356]: failed to unprotect the secret key: Required environment variable not set

I don't think this is really related to this issue, but you may need to add

gpg-connect-agent --no-autostart updatestartuptty /bye

to you xinitrc for the pinentry to work.

treeshateorcs commented 4 years ago

thank you for the elaborate answer. i appended no-autostart to one line in system-local-login and also added

exec gpg-connect-agent --no-autostart updatestartuptty /bye

to my sway config

here's a log https://0x0.st/i505.txt

and no, i don't have GNUPGHOME set. my gnupg home is the standard location (~/.gnupg)

treeshateorcs commented 4 years ago

also, i'm not sure this is related. but i actually have a slight problem with my login process. it is too slow. i have to wait for about 10 seconds before i'm logged in, i tried to debug it, and what i found out was the high cpu usage of the systemd-homework process during a login. this is hardly helpful, but i felt like telling you

cruegge commented 4 years ago

Well, the environment variable not set error did not go away by updatestartuptty, but like I said, it's probably unrelated.

I just noticed that your log, and actually also the previous one, shows 3 different agent instances: the one started by socket activation (1291), one started presumably by spotifyd (1329), and one for which I can't tell the cause from the log (1589). Do you have an idea where that might come from? Do you use pass somewhere else during session startup?

Maybe spotifyd, or more precisely the pass command that it spawns, can't talk to the agent and then autostarts one of its own that is then killed with the service's entire cgroup after spotifyd fails. In fact, in the first log you posted, the one where you did not reboot, only shows one additional agent (unclear whether it's the systemd service or the “third one”), and spotifyd seems to be fine there, since in that case the agent was already running. I think you may need to add a dependency After=gpg-agent.socket to your spotifyd.service.

On the other hand, that would mean that also this thing is not necessarily related to the issue. Moreover, if spotifyd actually did work in that case, that means that at least at some point some agent was able to provide the password. Or maybe spotifyd itself was already running and simply did not fail due to that. (There are just too many scenarios :-/ )

Could you try the following?

Run fuser $(gpgconf --list-dirs agent-socket) and compare the PIDs to the log file to figure out which instance is the one pass is talking to, and maybe also check if there are multiple agents running.
For comparison, post the log and check fuser and agent instances after a reboot when you log in on the first attempt. Maybe the difference can tell us what's wrong.

If there actually are multiple instances, maybe try moving the gpg-agent binary away and replace it by a script like

#!/bin/sh
date >> ~/agent.log
echo "PID: $$" >> ~/agent.log
echo "Parent: $PPID" >> ~/agent.log
ls -l /proc/$PPID/exe >> ~/agent.log
pstree -pT >> ~/agent.log
exec /path/to/real/gpg-agent "$@"

Maybe that can tell us how the instances are started.

treeshateorcs commented 4 years ago

i will do the rest in a minute

this is before reboot. (login on first try)

treeshateorcs commented 4 years ago

okay this after reboot (wrong password on first try

and agent.log https://0x0.st/i55x.txt

cruegge commented 4 years ago

I still don't understand this. There's only one agent instance, it's listening on the socket, and it's pretty clearly the one that was started by systemd after being triggered by pam-gnupg (you can see the pam_gnupg_helper in the pstree output).

Maybe let's check what messages the agent is actually receiving. Can you edit the gpg-agent unit (systemctl --user edit gpg-agent.service) with the following override:

[Service]
ExecStart=
ExecStart=/usr/bin/gpg-agent --supervised --debug-level basic
StandardOutput=append:%h/agent.journal

This will write all agent traffic to ~/agent.journal.

This file will contain your hex-encoded password and partial plaintexts of encrypted secrets, so do not post it without editing!

I did that on my notebook and edited out all potentially secret parts, ([..keygrip..], [..passphrase..], etc): https://0x0.st/i57P.journal

All I did was restart the service, log in (once) on another VT, and then decrypt something by invoking pass (that's the PKDECRYPT block at the bottom, where I edited out ciphertext and plaintext). The interesting part is the preset_passphrase line, that's coming from pam-gnupg. The last argument in that line is your hex-encoded password. You can decode that with

echo -e "$(sed 's/../\\x\0/g')"

then paste the encoded password and hit Ctrl-D.

Can you check that the preset command arrives, contains the correct password, try decrypting something to see what turns up in the log, and maybe post an edited version of the log if you're comfortable with that?

Finally, a small remark that I never noticed until now: using pass from systemd services like spotifyd has an inherent racing condition even when you declare After=gpg-agent.socket. The reason is that pam_gnupg.so runs (and must run) after pam_systemd.so, which does not know about pam-gnupg and will therefore start spotifyd as soon as the socket is up, but potentially before the passphrase has been preset (you actually see spotifyd already running in you agent.log). There is no simple, clean way to solve that; the only option I can think of is to introduce another service that can be placed between the socket and spotifyd, waits for some appropriate signal from pam-gnupg and only then signals successful startup to systemd. Maybe I'll implement something like that if I can come up with a sufficiently simple way to do that, but for the time being, you should probably add something like

ExecStartPre=/bin/sleep 5

to spotifyd.service.

treeshateorcs commented 4 years ago

the decoded password is only partly correct. it's only the first 12 symbols of my password

https://0x0.st/i5hm.txt this is edited agent.journal if you are interested

treeshateorcs commented 4 years ago

when i successfully login after the first try the decoded passphrase is correct

treeshateorcs commented 4 years ago

i just tried a "failed" login again and i got 15 symbols of my passphrase this time

sorry for the spam, i thought this was important

cruegge commented 4 years ago

Well, I still don't understand, but at least we're getting somewhere. I added some debug print statements in the debug branch. Can you build that and try again (./configure && make && sudo make install after uninstalling the AUR package)? It should print messages directly to your console, detailing how many bytes of what tokens where received and sent by which part of the code. It will print your password in plain text. Can you tell from that at which point the password gets truncated?

treeshateorcs commented 4 years ago

okay so i think i've figured it out

do you see it?? :D

cruegge commented 4 years ago

Ohhhh, the “Sorry, try again”-Prompt is not from login, but from systemd-homed which fails with your initial password, then asks again by itself and then does not hand the passphrase down to us. We're still using the first password.

I also found it in the systemd code: https://github.com/systemd/systemd/blob/07ccf434e77b17b20d773ce8b2637083cd4fdafc/src/home/pam_systemd_home.c#L310

There's no place in the systemd codebase where the pam auth token is set, so I think it will be impossible to obtain it this way. It does set some auxiliary pam data items (systemd-home-user-record-$USER and systemd-user-record-$USER), but these JSON User Records only contain the hashed passphrase.

I'm afraid it may be impossible to solve this issue. I'll have to look how gnome-keyring handles this, they probably encounter the same problem, and definitely know better than me how homed works.

Anyway, at least I'm glad we figured this out. It was starting to drive me insane.

cruegge commented 4 years ago

Well, gnome-keyring doesn't do any better apparently: https://gitlab.gnome.org/GNOME/gnome-keyring/-/issues/59#note_914341

Edit: systemd also has an issue that seems to be related: https://github.com/systemd/systemd/issues/16814

treeshateorcs commented 4 years ago

thank you for your work!

cruegge commented 3 years ago

Systemd 247, which just reached the arch repo, contains a fix. Can you test if it works for you?

treeshateorcs commented 3 years ago

yes, i just rebooted, this problem has been fixed with the release of systemd 247

cruegge / pam-gnupg

if there is more than one login attempt gnupg is not unlocked #24