Cray-HPE / sat

System Admin Toolkit
https://cray-hpe.github.io/docs-sat/
MIT License
Large-scale systems have very slow SSH performance tied to reading /root/.ssh/known_hosts #123

Open dmjacobsen opened 1 year ago

dmjacobsen commented 1 year ago

During a recent full shutdown of a large-scale system, we determined that it was taking up to 20 minutes to generate paramiko SSH connection objects. This time was brought close to zero by removing .ssh/known_hosts, so it would seem that parsing that file can be very slow. I suggest simply never loading .ssh/known_hosts (which is not fully correct anyway; site keys should be in /etc/ssh/ssh_known_hosts) and then using an AutoAddPolicy instead of the current WarningPolicy, since it will be known that paramiko is unaware of the correct keys.

haasken-hpe commented 1 year ago

@dmjacobsen, can you share more information about what was in your .ssh/known_hosts file?

I'd like to try to reproduce this issue if possible, so we can validate a fix. I also heard mention that there may have been some impact due to certain names not being resolvable by DNS during the shutdown. Can you share any more information about that as well?