duncs / clusterssh

Cluster SSH - Cluster Admin Via SSH
https://github.com/duncs/clusterssh/wiki

startup hangs for a looong time when dns servers cannot resolve the current hostname (not the target names) #158

Open v4hn opened 2 months ago

v4hn commented 2 months ago

I was wondering why my sessions sometimes take a very long time to start. After some digging, it turns out that ClusterSSH supports a macro that fills in the current hostname, and it looks that name up via Net::Domain::hostfqdn() whether or not the macro is actually needed.

However, this lookup has documented issues in certain environments: it tries to obtain the domain information through DNS requests instead of resolving the name locally, and DNS servers might not reply at all, so the call only returns after long timeouts. In my current work environment this is what happens:

$ time perl -MNet::Domain -e 'print Net::Domain::hostfqdn(),"\n"'
this_machines_name

real    1m21.700s
user    0m0.021s
sys     0m0.015s

There is a Munin bug, reported almost 20 years ago, that describes the issue in detail. (I never thought I would refer to a bug report that old that is still relevant; I guess I grew old.)

My current workaround is to set macros_enabled=no in .clusterssh/config and thus avoid the logic.

I suggest replacing the Net::Domain::hostfqdn() call behind the hostname macro with Sys::Hostname::hostname(), as it seems unexpected that "hostname" should imply the FQDN anyway.

A less invasive alternative would be to at least check whether the macro is actually used before performing the substitution, and perhaps to add some debug output for the lookup.
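
The "check before substitution" idea could look roughly like the sketch below. It is written in shell only to illustrate the control flow; ClusterSSH itself is Perl, and the helper name `expand_hostname_macro`, the `%h` placeholder, and the use of `uname -n` as the lookup are assumptions for this example, not the project's actual code.

```shell
# Hypothetical helper: expand the %h placeholder only when the text contains it.
expand_hostname_macro() {
    case "$1" in
        *%h*)
            # Macro present: perform the (possibly slow) lookup now.
            name=$(uname -n)
            printf '%s\n' "$1" | sed "s/%h/$name/g"
            ;;
        *)
            # Macro absent: return the text unchanged without any lookup.
            printf '%s\n' "$1"
            ;;
    esac
}

expand_hostname_macro 'uptime'      # no %h, so no lookup is made
expand_hostname_macro 'echo %h'     # %h present, lookup happens here
```

With this shape, the cost of the lookup is only paid by users who actually put the macro into their input, and the branch that performs it is a natural place for the debug output.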