frazer-lab / cluster

Repo for cluster issues.

fl1 problem #187

Closed by hirokomatsui 7 years ago

hirokomatsui commented 7 years ago

Hi Paul, this one isn't in the new cluster, but can you look at the server? You can log on as the root user; the password is the same. It looks like it has been under attack for a while. The log file says the drive is full, but the "df" command never returns. The server is a NIS client of flc and mounts the same drives as flc does: raid, raid2, raid3, and frazer-share as nas. We don't do much on the server, but the lab's HP runs on it, so we cannot take it down.

tatarsky commented 7 years ago

Sure. Give me a moment to get to it.

hirokomatsui commented 7 years ago

Thanks, please let me know if you need any more information.

tatarsky commented 7 years ago

df is hanging on something but I will find what. /var is full. Can I clear a little space?

hirokomatsui commented 7 years ago

Sure, we don't put any data there except system files.

tatarsky commented 7 years ago

Looks like logs stopped rotating about a year ago. Taking a look to see why.

tatarsky commented 7 years ago

It is hanging on the mount /nas from frazer-share. Is that machine OK?

/nas1 from the same machine however still works.
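A hung hard-mounted NFS share makes `df` block, which is why one mount can stall the whole report while its neighbor answers. A minimal sketch of probing mount points one at a time without hanging the caller, assuming Python 3 is available on the box; the function name `mount_responds` and the timeout value are illustrative, not something used in this thread:

```python
import subprocess
import sys


def mount_responds(path, timeout=5.0):
    """Return True if statvfs(path) completes within `timeout` seconds."""
    # Run the check in a child process so that a hung NFS mount blocks
    # the child rather than this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means statvfs returned normally.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # No wait() after kill(): a process stuck on a hard NFS mount may
        # not exit until the server answers, and waiting would hang us too.
        child.kill()
        return False
```

Calling `mount_responds("/nas")` and `mount_responds("/nas1")` in turn would show which mount is the one `df` is stuck on.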

hirokomatsui commented 7 years ago

Yea, frazer-share's working fine.

hirokomatsui commented 7 years ago

The mount settings for nas and nas1 are the same as on flc, which is working fine.

tatarsky commented 7 years ago

OK. Will try a force umount. That could result in a panic/reboot. Is that OK?

tatarsky commented 7 years ago

Yes, BTW, this system is being ssh brute-forced. We could limit its ssh access to campus IP addresses if you want. I've made some /var space.
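To get a rough picture of who is doing the brute forcing, failed logins can be tallied per source address. A minimal sketch, assuming the usual OpenSSH "Failed password for ... from ... port ..." log message; the sample lines, host name, user names, and addresses below are made up for illustration and are not taken from this server:

```python
import re
from collections import Counter

# Matches the standard sshd failure message, with or without "invalid user".
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)

# Hypothetical log lines (documentation-range addresses).
SAMPLE_LINES = [
    "Jan 10 03:14:07 host sshd[2211]: Failed password for root from 203.0.113.7 port 51514 ssh2",
    "Jan 10 03:14:09 host sshd[2213]: Failed password for invalid user admin from 203.0.113.7 port 51520 ssh2",
    "Jan 10 03:15:01 host sshd[2290]: Accepted password for alice from 192.0.2.15 port 40022 ssh2",
    "Jan 10 03:15:30 host sshd[2301]: Failed password for invalid user test from 198.51.100.23 port 33810 ssh2",
]


def count_failed_logins(lines):
    """Count failed password attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    for ip, attempts in count_failed_logins(SAMPLE_LINES).most_common():
        print(ip, attempts)
```

On a real system the lines would come from the sshd log file, whose location varies by distribution.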

tatarsky commented 7 years ago

Rotating logs now.

hirokomatsui commented 7 years ago

Yes, I see the brute forcing all the time, on other servers as well. fl1 especially gets a lot; maybe someone knows about it. We can limit ssh access to campus only.

tatarsky commented 7 years ago

OK. And to confirm: if this force umount causes a kernel panic, that's not a big deal. It can happen.

tatarsky commented 7 years ago

I believe the problem dates from when the frazer-share system changed its IP.

frazer-share.ucsd.edu:/shares/Public on /nas type nfs (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.163.54,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=172.16.163.54)

I believe that 172.16.163.54 address is old. So it's hanging because it's got a mount to a system that is no longer there.
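The stale address is visible in the `addr=` option of the mount line. A minimal sketch of pulling that option out and comparing it with the address the server is expected to have; the function names are illustrative, and the parser assumes the "source on target type fstype (options)" layout shown above with no spaces in the paths:

```python
import re

MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>.*)\)$"
)

# The mount line quoted in this thread.
THREAD_MOUNT_LINE = (
    "frazer-share.ucsd.edu:/shares/Public on /nas type nfs "
    "(rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,"
    "proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.163.54,"
    "mountvers=3,mountport=892,mountproto=udp,local_lock=none,"
    "addr=172.16.163.54)"
)


def parse_mount_line(line):
    """Split one line of `mount` output into (source, target, options)."""
    match = MOUNT_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized mount line: %r" % line)
    options = {}
    for item in match.group("options").split(","):
        # Flags such as "rw" or "hard" have no value; store them as "".
        key, _, value = item.partition("=")
        options[key] = value
    return match.group("source"), match.group("target"), options


def mount_is_stale(line, current_ip):
    """True if the mount still points at an address other than current_ip."""
    _, _, options = parse_mount_line(line)
    return options.get("addr") != current_ip
```

In practice `current_ip` could come from a fresh lookup such as `socket.gethostbyname("frazer-share.ucsd.edu")`, so the check compares what the kernel mounted against what DNS says now.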

hirokomatsui commented 7 years ago

The correct one is 172.16.200.40

tatarsky commented 7 years ago

Yep. I've done a lazy umount....df is back. Will remount shortly.

tatarsky commented 7 years ago

OK. Please check /nas.

tatarsky commented 7 years ago

A lazy umount, BTW, is: umount -l /mountpoint. That's an "l" (the letter, not the digit), and it is useful in this situation.

hirokomatsui commented 7 years ago

Looks good!

tatarsky commented 7 years ago

OK. So let me poke around a notch at ssh and iptables and I'll recommend some bruting reduction after I look at what is on this box.

tatarsky commented 7 years ago

But that gets you back in business. I'll make sure logs rotate as well.

hirokomatsui commented 7 years ago

OK, thanks.

tatarsky commented 7 years ago

No prb.

tatarsky commented 7 years ago

The other generic /var issue is that mysql is doing binary logging and there hasn't been a purge for a while.

  1. Is this an active mysql server doing perhaps replication? (Which means those bin logs are needed)
  2. If no to 1 we can limit or clear those bin logs.

I show a Bacula database on this system's mysql.

hirokomatsui commented 7 years ago

The mysql server is active pretty much only for bacula, which takes care of our ancient tape drive.

tatarsky commented 7 years ago

Gotcha. I got /var down to 64% full with log rotation. I'll review the purge command for those bin logs, or we can shut down/disable them later.
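For reference, MySQL can drop old binary logs with a `PURGE BINARY LOGS BEFORE '<datetime>'` statement. A minimal sketch of building such a statement with a retention window; the helper name and the choice of window are illustrative, and it is only appropriate when nothing (such as a replica) still needs the older logs:

```python
from datetime import datetime, timedelta


def purge_binlogs_statement(keep_days, now=None):
    """Build a PURGE BINARY LOGS statement that keeps the last `keep_days` days."""
    if keep_days < 0:
        raise ValueError("keep_days must not be negative")
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    # MySQL accepts a 'YYYY-MM-DD hh:mm:ss' datetime literal here.
    return "PURGE BINARY LOGS BEFORE '{}';".format(
        cutoff.strftime("%Y-%m-%d %H:%M:%S")
    )


if __name__ == "__main__":
    print(purge_binlogs_statement(7))
```

The statement would then be run from a mysql client session by an account with sufficient privileges. A server-side expiry setting (for example `expire_logs_days` on older MySQL versions) is another way to keep the logs from piling up again.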

tatarsky commented 7 years ago

If you would like, I can block ssh access from off campus today with /etc/hosts.allow, using the same method as on fl-hn1. You can adjust/edit that if we block somebody we like.

tatarsky commented 7 years ago

Noting clearly, BTW, that UNLIKE fl-hn1 we end the list of allows with a DENY. We've enabled the dynamic blocker "denyhosts" on fl-hn1 to end bruting. I could do the same on fl1, but I suspect this is good enough. I show logs rotating fine now, BTW. I think this is good to go, but confirm you can still log in ;)
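The policy described here is an allow-list that ends in a deny: a client is let in only if it matches one of the listed networks, and everything else is refused. A minimal sketch of that decision logic; the networks below are documentation ranges standing in for the real campus subnets, which are not given in this thread:

```python
import ipaddress

# Hypothetical stand-ins for the campus networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ssh_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    for network in allowed:
        if address in network:
            return True
    # The closing DENY: anything not explicitly allowed is refused.
    return False
```

This only models the decision; the actual enforcement would live in hosts.allow, a firewall rule, or similar, depending on what the system supports.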

tatarsky commented 7 years ago

OpenSUSE turns out to not support tcpwrappers with their SSH. For "reasons".

So it would be an iptables mod. I'll look at that in a moment.

tatarsky commented 7 years ago

I will get back to this, BTW, as the iptables setup on the unit is a bit baroque and I'm not very familiar with the Suse way of doing things.

tatarsky commented 7 years ago

BTW I just ssh'd to flc by mistake and its /var is also full. Do you want that reviewed?

hirokomatsui commented 7 years ago

Sure, thanks

hirokomatsui commented 7 years ago

Did you install a script to set hosts.deny to deny the attackers on flh1 and flh2? If it works for fl1 and flc and is easier than setting up iptables, we can install that instead.

tatarsky commented 7 years ago

That won't work because Suse compiles ssh without tcpwrappers support.

hirokomatsui commented 7 years ago

Oh, I see.

tatarsky commented 7 years ago

CentOS/RHEL still support tcpwrappers in their SSHD. So the easier hosts.allow/deny method can be used there, and already is.

tatarsky commented 7 years ago

Suse uses the config defined in the SuSEfirewall2 (SFW2) area, which I don't deal with often. When I have a moment to plow through its ways of controlling SSH (22/tcp), I will restrict it to UCSD subnets.

Light reading in the docs ;)

https://www.suse.com/documentation/sles-12/book_security/data/sec_security_firewall_suse.html

tatarsky commented 7 years ago

So I had a moment to come back to this @hirokomatsui and I think the gist is that you can set more precise external port allows using these variables in the /etc/sysconfig/SuSEfirewall2 file.

FW_SERVICES_ACCEPT_EXT

Possibly another way involves:

FW_TRUSTED_NETS

As discussed in: https://lists.opensuse.org/opensuse-security/2002-01/msg00345.html
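As a starting point for testing on a non-production box, here is a sketch of generating a candidate value for FW_SERVICES_ACCEPT_EXT. It assumes the value is a space-separated list of `net,protocol,dport` entries, which is an assumption that should be checked against the comments in /etc/sysconfig/SuSEfirewall2 itself; the helper name and the networks are hypothetical placeholders, not the real UCSD subnets:

```python
import ipaddress


def accept_ext_value(networks, port=22, protocol="tcp"):
    """Build a candidate FW_SERVICES_ACCEPT_EXT value.

    ASSUMPTION: entries take the form "net,protocol,dport" separated by
    spaces. Verify against the comments in the sysconfig file before use.
    """
    entries = []
    for net in networks:
        # ip_network() validates the CIDR notation and normalizes it.
        network = ipaddress.ip_network(net)
        entries.append("{},{},{}".format(network, protocol, port))
    return " ".join(entries)


if __name__ == "__main__":
    # Documentation-range networks as placeholders.
    value = accept_ext_value(["192.0.2.0/24", "198.51.100.0/24"])
    print('FW_SERVICES_ACCEPT_EXT="{}"'.format(value))
```

If this form is right, ssh would presumably also have to be absent from the generally open services (for example FW_SERVICES_EXT_TCP), otherwise port 22 stays open to everyone. Keeping a console or second session available while testing avoids a lockout.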

I have a few guesses what the syntax might be. But if this box is in production, it's probably not overly wise to just test some ideas there. If you had an OpenSuse box that wasn't as important, I might try some ideas there. Mostly I don't want to lock myself out from ssh ;)

Basically, brute forcing continues as it does against any exposed ssh server. So if there are weak passwords on this machine, it might be wise to shore them up.

hirokomatsui commented 7 years ago

I'll leave it for now. I think there's very little chance that they can break the passwords.

tatarsky commented 7 years ago

Sounds good. Closing for now! If I happen to figure it out somewhere I'll let you know!