NixOS / nixops

NixOps is a tool for deploying to NixOS machines in a network or cloud.
https://nixos.org/nixops
GNU Lesser General Public License v3.0

nixops closes SSH connection when running `nixops check` #1571

Open datafoo opened 7 months ago

datafoo commented 7 months ago

When running `nixops check`, I sometimes see the message `Connection closed by authenticating user root` in the logs on the target machines. When this happens, `nixops check` displays `Up = No` for the corresponding machines. The real problem is that, if I retry multiple times, fail2ban bans my IP address because of the `Connection closed by authenticating user root` entries:

Nov 14 09:15:12 mymachine sshd[1320]: Connection from 192.168.1.102 port 54986 on 192.168.1.50 port 22 rdomain ""
Nov 14 09:15:13 mymachine sshd[1320]: Accepted key ED25519 SHA256:******************************************* found at /etc/ssh/authorized_keys.d/root:1
Nov 14 09:15:13 mymachine sshd[1320]: Postponed publickey for root from 192.168.1.102 port 54986 ssh2 [preauth]
Nov 14 09:15:13 mymachine sshd[1320]: Accepted key ED25519 SHA256:******************************************* found at /etc/ssh/authorized_keys.d/root:1
Nov 14 09:15:13 mymachine sshd[1320]: Accepted publickey for root from 192.168.1.102 port 54986 ssh2: ED25519 SHA256:*******************************************
Nov 14 09:15:13 mymachine sshd[1320]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 14 09:15:13 mymachine sshd[1327]: Connection from 192.168.1.102 port 54992 on 192.168.1.50 port 22 rdomain ""
Nov 14 09:15:14 mymachine sshd[1327]: Connection closed by authenticating user root 192.168.1.102 port 54992 [preauth]
Nov 14 09:15:14 mymachine fail2ban.filter[989]: INFO [sshd] Found 192.168.1.102 - 2023-11-14 09:15:14

This happened while running NixOps 2.0.0-pre-fc9b55c.

Why is NixOps closing the SSH connection?

I have not been able to reproduce the problem consistently, but it happens very often when the target machine is a Hetzner Cloud CX11 virtual machine.
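To get a rough idea of how often this occurs, the sshd log on the target machine can be scanned for the message quoted above. The following is only a minimal sketch: the function name `count_preauth_closes` and the regular expression are my own assumptions based on the log excerpt in this report, not fail2ban's actual filter, and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Hypothetical pattern, written to match the sshd message quoted in this
# report. It is NOT the regular expression that fail2ban itself uses.
PREAUTH_CLOSE = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Connection closed by authenticating user "
    r"(?P<user>\S+) (?P<ip>\S+) port (?P<port>\d+) \[preauth\]"
)


def count_preauth_closes(lines):
    """Count 'closed by authenticating user' events per source IP address."""
    counts = Counter()
    for line in lines:
        match = PREAUTH_CLOSE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Sample lines taken from the log excerpt in this report.
sample = [
    "Nov 14 09:15:13 mymachine sshd[1320]: pam_unix(sshd:session): "
    "session opened for user root(uid=0) by (uid=0)",
    'Nov 14 09:15:13 mymachine sshd[1327]: Connection from 192.168.1.102 '
    'port 54992 on 192.168.1.50 port 22 rdomain ""',
    "Nov 14 09:15:14 mymachine sshd[1327]: Connection closed by "
    "authenticating user root 192.168.1.102 port 54992 [preauth]",
]

counts = count_preauth_closes(sample)
for ip, n in counts.items():
    print(f"{ip}: {n} pre-auth close(s)")
```

Feeding it the real log lines instead of `sample` (for example, the sshd journal output saved to a file) would show how many of these events accumulate per address before fail2ban reacts.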

datafoo commented 7 months ago

It is probably worth including the output of `nixops check`, which shows something unusual:

[me@laptop:~/dev/mydeployments/deployment1.example.com]$ nixops check -d deployment1.example.com --include mymachine
Machines state:
+-----------+--------+----+-----------+----------+-----------+-------+-------+
| Name      | Exists | Up | Reachable | Disks OK | Load avg. | Units | Notes |
+-----------+--------+----+-----------+----------+-----------+-------+-------+
| mymachine | Yes    | No | N/A       | N/A      |           |       |       |
+-----------+--------+----+-----------+----------+-----------+-------+-------+
Non machines resources state:
+------+--------+
| Name | Exists |
+------+--------+
+------+--------+
mux_client_request_session: read from master failed: Broken pipe

[me@laptop:~/dev/mydeployments/deployment1.example.com]$ no such identity: /tmp/nix-shell.hgG5u9/nixops-tmp8cuxc17o/id_nixops-mymachine: No such file or directory
root@mymachine.deployment1.example.com: Permission denied (publickey).

The `no such identity: /tmp/nix-shell.hgG5u9/nixops-tmp8cuxc17o/id_nixops-mymachine: No such file or directory` and `root@mymachine.deployment1.example.com: Permission denied (publickey).` lines are not something I typed myself; they appear in the console a second or two after the prompt returns.