Open marinnedea opened 3 years ago
I was able to reproduce the problem on all RedHat/CentOS systems by simply disabling IPv6 and removing the IPv4 DNS nameservers from /etc/resolv.conf.
Steps to reproduce: Append the lines below to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
and then just run:
sudo sysctl -p
Remove the IPv4 entries in /etc/resolv.conf (no need to back it up; a simple "systemctl restart network" will restore the file):
sudo sh -c 'echo "" > /etc/resolv.conf'
At this point, try to run the CustomScriptForLinux extension.
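Before running the extension, it can help to confirm that the machine is really in the state described by the steps above. The following is a minimal sketch, not part of WaLinuxAgent: the function names `ipv6_disabled` and `nameservers` are hypothetical, and the default flag path assumes the standard Linux procfs location that corresponds to the `net.ipv6.conf.all.disable_ipv6` sysctl.

```python
import ipaddress
from pathlib import Path


def ipv6_disabled(flag_path="/proc/sys/net/ipv6/conf/all/disable_ipv6"):
    """Return True if the sysctl flag file contains 1 (IPv6 disabled)."""
    return Path(flag_path).read_text().strip() == "1"


def nameservers(resolv_conf_text):
    """Split the 'nameserver' entries of a resolv.conf body into (ipv4, ipv6)."""
    ipv4, ipv6 = [], []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf comments start with '#' or ';'
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            # Drop a zone suffix such as "%eth0" before parsing the address
            addr = ipaddress.ip_address(fields[1].split("%", 1)[0])
        except ValueError:
            continue  # not a valid IP address, ignore the entry
        (ipv4 if addr.version == 4 else ipv6).append(str(addr))
    return ipv4, ipv6
```

With the reproduction steps applied, `ipv6_disabled()` should return `True` and `nameservers(Path("/etc/resolv.conf").read_text())` should return two empty lists.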
The scenario is:
A Barracuda image, which requires the IP to be set to static at the appliance level and DHCP to be disabled. In the Azure Portal, just to avoid any issues, the same IP is configured as static, although, since DHCP is disabled at the OS level, this has no effect on the OS side.
IPv6 is also completely disabled.
Important note: Barracuda relies on a chrooted environment for waagent, which prevents waagent from accessing the /etc/resolv.conf file directly (this is already subject to change on the Barracuda side).
The problem:
Any extension that requires downloading a script, and therefore a DNS resolution (custom script extension, or run-command invoke if the command includes a URL for any reason), fails with the following error:
The problem is that, for some reason, when the WaLinuxAgent tries to download the file, the DNS resolver switches to IPv6, even though IPv6 is not enabled and the IPv4 resolv.conf file is missing/inaccessible (!?). See this part of the errors received:
on [::1]:53: dial udp [::1]:53: socket: address family not supported by protocol
Normally, it should just raise an error saying that the DNS name cannot be resolved, or that there is no DNS server configured, or anything else a bit more meaningful.
Found the following: https://access.redhat.com/solutions/15863 (requires a RedHat account to access it). Essentially, it says:
Applications like ssh and telnet use the getaddrinfo() function with AF_UNSPEC, and this function invokes both AAAA (IPv6) and A (IPv4) lookups one after the other. This can delay the connection time when DNS servers block or don't handle IPv6 correctly.
Most applications that are part of Red Hat Enterprise Linux offer a configuration option to disable IPv6 (or IPv4 for that matter) completely. It is advisable that any third-party application provides similar solutions. getaddrinfo() can specify if IPv4, IPv6 or both should be used, as explained in man getaddrinfo:
getaddrinfo will perform IPv4 and IPv6 lookups when using AF_UNSPEC.
This is not disabled by default in RedHat because of a conflict between the RFC and the requirement for IPv4-only lookups.
RedHat also provides a library that will unset LD_PRELOAD=libwgetaddrinfo.so, noting that this is an unsupported solution because of the RFC conflict mentioned above.
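The address-family choice described in the RedHat article can be illustrated with Python's standard `socket.getaddrinfo()`, which wraps the same libc call. This is only a sketch of the technique: the helper names `resolve` and `resolve_ipv4_only` are hypothetical and are not taken from WaLinuxAgent or any extension.

```python
import socket


def resolve(host, port, family=socket.AF_UNSPEC):
    """Return the unique addresses getaddrinfo() yields for the given family."""
    infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the address string for both AF_INET and AF_INET6
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def resolve_ipv4_only(host, port):
    """Same lookup, restricted to IPv4 results by passing AF_INET."""
    return resolve(host, port, family=socket.AF_INET)
```

Calling `resolve_ipv4_only("example.com", 443)` asks only for IPv4 results, whereas `resolve("example.com", 443)` uses AF_UNSPEC and may trigger both A and AAAA lookups.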
Considering the above, this should be implemented at the WaLinuxAgent level or in the extensions that download files, as per RedHat's advice. Also, please consider adding INFO/WARNING/ERROR messages to the extension handler logs if DNS fails on IPv4, and lowering the timeouts.
Currently, if you unset LD_PRELOAD=libwgetaddrinfo.so, the WaLinuxAgent keeps trying to query the IPv4 DNS for 90 minutes, until the extension deployment times out, which is not OK.
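To make the request concrete, here is a rough sketch of what IPv4-only resolution with log messages and a short overall deadline could look like. It is an illustration under stated assumptions, not the agent's actual code: the function name `resolve_ipv4_with_deadline`, the logger name, and the default values of 30 and 2 seconds are all hypothetical. Note that the deadline is only checked between attempts, so a single slow `getaddrinfo()` call can still exceed it.

```python
import logging
import socket
import time

logger = logging.getLogger("dns_check")  # hypothetical logger name


def resolve_ipv4_with_deadline(host, port, deadline_s=30.0, retry_delay_s=2.0,
                               lookup=socket.getaddrinfo,
                               sleep=time.sleep, clock=time.monotonic):
    """Retry IPv4-only resolution until it succeeds or deadline_s would be exceeded."""
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            infos = lookup(host, port, family=socket.AF_INET,
                           type=socket.SOCK_STREAM)
            addresses = sorted({info[4][0] for info in infos})
            logger.info("Resolved %s to %s on attempt %d", host, addresses, attempt)
            return addresses
        except socket.gaierror as exc:
            elapsed = clock() - start
            logger.warning("IPv4 DNS lookup for %s failed (attempt %d, %.1fs elapsed): %s",
                           host, attempt, elapsed, exc)
            # Give up if waiting for another retry would pass the deadline
            if elapsed + retry_delay_s > deadline_s:
                logger.error("Giving up on %s after %d attempts; "
                             "check the nameservers in /etc/resolv.conf",
                             host, attempt)
                raise TimeoutError(
                    f"could not resolve {host} over IPv4 within {deadline_s}s") from exc
            sleep(retry_delay_s)
```

The `lookup`, `sleep` and `clock` parameters exist only so the retry logic can be exercised without real DNS traffic or real waiting; a caller would normally leave them at their defaults.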