cloyne / servers

Salt configuration for Cloyne servers.

server2 requires manual config after power cycle #9

Open ecawthon opened 2 years ago

ecawthon commented 2 years ago

server2 was responsive to ping but unresponsive to SSH or HTTP. We went to Fremont to manually power cycle it. This fixed SSH, sympa, phpmyadmin, and phppgadmin, but not postfix or the blog.

Running salt-ssh 'server2' state.highstate showed no output until I forced it to quit with Ctrl-C. If I let it run long enough before interrupting, it then printed the normal output. After I fixed the routes, it ran normally and printed its output on its own.

Mailserver

Nullmailer was reporting a temporary error in name resolution when connecting to mail.cloyne.org. This suggested either a DNS issue or an issue with postfix. First, I tested mail.cloyne.org using telnet. This worked from my laptop but failed from server2, so the problem was on server2's side rather than with mail.cloyne.org itself. Next, I checked /etc/resolv.conf to find the IP of our DNS server and tried pinging it. The ping failed.
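The DNS check described above can be sketched as a small script. This is only an illustration, assuming a Linux host with awk and the iputils version of ping; the function names list_nameservers and check_nameservers are made up for this sketch and are not part of the repo.

```shell
#!/bin/sh
# Print the nameserver addresses listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf. (Hypothetical helper, not from the repo.)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ping each nameserver once (2 second timeout) and report the result.
# Assumes Linux iputils ping, where -c is the count and -W the timeout.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 1 -W 2 "$ns" >/dev/null 2>&1; then
            echo "$ns reachable"
        else
            echo "$ns UNREACHABLE"
        fi
    done
}
```

Running check_nameservers with no arguments on server2 would have reported the DNS server as unreachable before the routes were restored.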

I re-added the routes we added when we first moved the servers:

ip route add 64.62.133.41 dev [eth0 for server2, p1p1 for server3]
ip route replace default via 64.62.133.41 dev [eth0 for server2, p1p1 for server3] src 64.62.133.[44 for server2, 45 for server3]
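To avoid filling in the bracketed placeholders by hand, the two commands can be generated per host. The sketch below only prints the commands so they can be reviewed first; the interface names and addresses come from the lines above, while the function name print_route_fix and the dry-run approach are assumptions of this example.

```shell
#!/bin/sh
# Print (do not run) the two route commands for the given host.
# Interface names and addresses are the ones listed in this comment;
# print_route_fix itself is a hypothetical helper.
print_route_fix() {
    gateway=64.62.133.41
    case "$1" in
        server2) dev=eth0; src=64.62.133.44 ;;
        server3) dev=p1p1; src=64.62.133.45 ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
    echo "ip route add $gateway dev $dev"
    echo "ip route replace default via $gateway dev $dev src $src"
}
```

For example, print_route_fix server2 prints the two server2 commands, which can then be run as root once they look right.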

This fixed the mailserver but not the blog.

Blog

After fixing the route, I ran salt-ssh server2 state.highstate again with just mysql and its dependents, and this fixed it. Possibly this was because it upgraded the tozd images, which had newer versions available? I'm not sure what was actually wrong.

ecawthon commented 2 years ago

Update: after this, cloyne.org was still inaccessible from inside the house. I fixed this by running

ip route change 64.62.133.40/29 dev eth0 via 64.62.133.41 src 64.62.133.44

This replaced the following incorrect line, which had previously been in the ip route output:

64.62.133.40/29 dev eth0 proto kernel scope link src 64.62.133.44

The old route didn't work because the server didn't know how to actually get to the /29, whereas now it knows it's supposed to get there via 64.62.133.41 (the HE router).

The equivalent command for server3 would be

ip route change 64.62.133.40/29 dev p1p1 via 64.62.133.41 src 64.62.133.45
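For reference, the /29 in these commands covers eight addresses. A minimal sketch of the arithmetic, assuming a prefix of /24 or longer so that only the last octet changes (the function name last_octet_range is made up for this example):

```shell
#!/bin/sh
# Print the first and last address of an IPv4 block given as a.b.c.d/prefix.
# Only handles prefixes of /24 or longer, where the range stays within
# the last octet. (Hypothetical helper for illustration.)
last_octet_range() {
    base=${1%/*}                          # address part, e.g. 64.62.133.40
    prefix=${1#*/}                        # prefix length, e.g. 29
    head=${base%.*}                       # first three octets
    octet=${base##*.}                     # last octet
    size=$(( 1 << (32 - prefix) ))        # number of addresses in the block
    first=$(( octet / size * size ))      # round down to the block start
    last=$(( first + size - 1 ))          # last address in the block
    echo "$head.$first - $head.$last"
}
```

last_octet_range 64.62.133.40/29 gives 64.62.133.40 - 64.62.133.47, so the HE router (.41), server2 (.44), and server3 (.45) all fall inside the block.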