democratic-csi / democratic-csi

csi storage for container orchestration systems
MIT License

PSA/FYI: TrueNAS CORE 13 no longer supports RSA for SSH, which results in `Error: All configured authentication methods failed` in the controller container logs #265

kquinsland closed 1 year ago

kquinsland commented 1 year ago

A while back, I noticed that I had two pods in a failed state. No workloads were affected, so I brushed it off until I had more time to sit down and investigate.

I tracked the two pods down to the CSI driver that I use to manage PVs on my TrueNAS CORE system.

❯ kubectl get po -n democratic-csi
NAME                                                   READY   STATUS             RESTARTS            AGE
zfs-nfs-democratic-csi-node-5rxk6                      4/4     Running            51 (21d ago)        106d
zfs-iscsi-democratic-csi-node-tt25x                    4/4     Running            51 (21d ago)        106d
zfs-iscsi-democratic-csi-node-5d6ld                    4/4     Running            55 (21d ago)        106d
zfs-nfs-democratic-csi-node-6dwsq                      4/4     Running            55 (21d ago)        106d
zfs-nfs-democratic-csi-node-gxcxn                      4/4     Running            51 (21d ago)        106d
zfs-iscsi-democratic-csi-node-vl8tl                    4/4     Running            51 (21d ago)        106d
zfs-nfs-democratic-csi-controller-5f74955896-kvmfc     1/5     CrashLoopBackOff   25145 (3m43s ago)   21d
zfs-iscsi-democratic-csi-controller-58668f7c79-vlm8g   1/5     CrashLoopBackOff   25104 (39s ago)     21d

I'll spare you the details / dead-ends from my notes, but the solution was to re-enable support for RSA in the sshd_config file.

From the TrueNAS/CORE web UI, Services > SSH > Advanced > Auxiliary Parameters

Add the line:

PubkeyAcceptedAlgorithms=+ssh-rsa

Click Save and the SSH server will restart. Then either wait for the CrashLoopBackOff to restart the containers or delete the pods. After that, the controller pods came back up.
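For anyone who would rather not wait out the back-off timer, deleting the two controller pods can be sketched roughly as below. The namespace and pod names are copied from the `kubectl get po` output above and will differ on other clusters. The `KUBECTL` override is not part of the original fix; it is only there so the command can be previewed with `KUBECTL=echo` before running it for real.

```shell
# Assumption: kubectl is on PATH and already points at the right cluster.
# Set KUBECTL=echo to print the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Delete the two crash-looping controller pods so their Deployments
# recreate them right away. Pod names come from the listing in this post.
delete_controller_pods() {
  "$KUBECTL" delete pod -n democratic-csi \
    zfs-nfs-democratic-csi-controller-5f74955896-kvmfc \
    zfs-iscsi-democratic-csi-controller-58668f7c79-vlm8g
}
```

Calling `delete_controller_pods` and then re-running `kubectl get po -n democratic-csi` should show fresh controller pods starting up.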

It was only after I found the solution that I checked my notes, and it looks like the pods have been failing since I did the 12 -> 13 upgrade on my NAS. Since 12 is EOL, I suspect that more people will get hit by this if they have not been already.

And while drafting this post, I found a note about this exact issue (just with different symptoms) in the Known Issues for the upgrade:

TrueNAS 12 cannot replicate to or from TrueNAS 13: By default, TrueNAS 12 cannot initiate a replication to or from TrueNAS 13 due to an outdated SSH client library. Allowing replication to or from TrueNAS 13 to TrueNAS 12 requires allowing ssh.rsa algorithms. See [OpenSSH 8.2 Release](https://www.openssh.com/txt/release-8.2) for security considerations. Log into the TrueNAS 13 system and go to Services->SSH. Add the SSH Auxiliary Parameter: PubkeyAcceptedAlgorithms +ssh-rsa.

This ends the PSA


I'm not a JS expert, but after a quick skim of the docs and this code

it looks like I should be able to add a

sshConnection:
  algorithms:
    serverHostKey:  ...

to the YAML file I use to render the Helm chart?

Or, alternatively, should I create a new Ed25519 (curve25519-based) SSH key for the root user and update the rendered chart with the new key? The example configurations all use -----BEGIN RSA PRIVATE KEY-----, which is why I went with RSA keys to begin with.
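A quick way to see what kind of private key is currently in the rendered values is to look at its first line. The helper below is only a sketch: `key_header_type` is a made-up name, and it assumes the key has been saved to a file on its own. Note that the newer `-----BEGIN OPENSSH PRIVATE KEY-----` container can hold RSA keys as well, so that header alone does not rule RSA out.

```shell
# Classify a private key file by its first line only.
# Prints one of: pem-rsa, openssh, unknown.
key_header_type() {
  first_line=""
  # read returns non-zero on a missing trailing newline; the line is still set
  IFS= read -r first_line < "$1" || true
  case "$first_line" in
    "-----BEGIN RSA PRIVATE KEY-----")     echo "pem-rsa" ;;
    "-----BEGIN OPENSSH PRIVATE KEY-----") echo "openssh" ;;  # key type not visible here
    *)                                     echo "unknown" ;;
  esac
}
```

For example, `key_header_type ./id_rsa` prints `pem-rsa` for a key in the same format as the example configurations. For an `openssh` result, `ssh-keygen -l -f <keyfile>` reports the actual key type.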

travisghansen commented 1 year ago

There's a note about it here:

In TrueNAS for the custom sshd params you can add:

PubkeyAcceptedAlgorithms +ssh-rsa

There are a few issues about it here but it could use a bit more exposure in the docs etc.

kquinsland commented 1 year ago

> There are a few issues about it here but it could use a bit more exposure in the docs etc.

Yeah, now that I know what the issue is, I see a few notes/warnings about it.

At the outset, all I had was that the workloads were fine but the controllers were not. I hadn't added any new workloads that needed a PVC, so the controller pods' failure was not evident, and I didn't put 2 and 2 together and correlate the failure with the timing of the 12 -> 13 update.

I'm assuming that RSA keys are still required then?

travisghansen commented 1 year ago

No, you are welcome to use any key style that ssh supports. For example I use ssh-ed25519 keys without issue.
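For anyone who wants to go the ed25519 route instead of re-enabling `ssh-rsa`, generating a key pair could look roughly like this. The function name, file name, and key comment are illustrative choices, not anything the project prescribes, and the key is created without a passphrase on the assumption that the driver reads it non-interactively.

```shell
# Generate a passphrase-less Ed25519 key pair inside the given directory
# and print the path of the private key.
# Assumption: the file name and the "democratic-csi" comment are arbitrary.
new_ed25519_key() {
  key_file="$1/democratic-csi_ed25519"
  # -q keeps ssh-keygen quiet so only the path is printed on stdout
  ssh-keygen -q -t ed25519 -N "" -C "democratic-csi" -f "$key_file" || return 1
  echo "$key_file"
}
```

Usage would be something like `key="$(new_ed25519_key "$(mktemp -d)")"` followed by `cat "$key.pub"`. The public key then goes into the authorized keys of the SSH user on the TrueNAS side, and the private key (which starts with `-----BEGIN OPENSSH PRIVATE KEY-----`) replaces the RSA key in the `sshConnection` section of the driver config.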

kquinsland commented 1 year ago

Thanks for confirming RSA isn't required.