Closed oliver-sanders closed 1 month ago
Hi @oliver-sanders, thanks for reporting. We are investigating the issue.
@oliver-sanders, after poking around, nslookup doesn't seem to be the right tool for DNS lookups on macOS, which is also mentioned on the tool's man page. It also leaves me wondering whether this behavior has always been the case.
Using dscacheutil gives more stable results and honours local files (a similar hack was used with Ubuntu in the past, although the issue there was IP inconsistency). For example, the commands below should return the host IPs:
echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
dscacheutil -q host -a name $(hostname -f)
We will continue investigating to see whether anything else can be done.
@shamil-mubarakshin, thanks for looking in.
Didn't know there were issues with nslookup on Mac OS, interesting.
I also used Python's socket bindings in my tests, which show similar failures for reverse lookups that had worked previously:
socket.gethostname() : Mac-1698147376508.local
socket.getfqdn() : Mac-1698147376508.local
socket.getfqdn(socket.gethostname()) : Mac-1698147376508.local
socket.getfqdn(socket.getfqdn()) : Mac-1698147376508.local
socket.gethostbyname_ex(socket.gethostname())[0] : [Errno 8] nodename nor servname provided, or not known
socket.gethostbyname_ex(socket.getfqdn())[0] : [Errno 8] nodename nor servname provided, or not known
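For anyone wanting to repeat this check on their own runner, output in the format above could be produced by something along these lines. This is only a sketch, not the script used in the tests; the `describe`, `CHECKS` and `run_checks` names are made up for illustration.

```python
import socket


def describe(label, func):
    """Return 'label : result', or 'label : error text' if the lookup fails."""
    try:
        return f"{label} : {func()}"
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return f"{label} : {exc}"


# Each entry pairs the expression shown in the report with a callable that runs it.
CHECKS = [
    ("socket.gethostname()", lambda: socket.gethostname()),
    ("socket.getfqdn()", lambda: socket.getfqdn()),
    ("socket.getfqdn(socket.gethostname())",
     lambda: socket.getfqdn(socket.gethostname())),
    ("socket.getfqdn(socket.getfqdn())",
     lambda: socket.getfqdn(socket.getfqdn())),
    ("socket.gethostbyname_ex(socket.gethostname())[0]",
     lambda: socket.gethostbyname_ex(socket.gethostname())[0]),
    ("socket.gethostbyname_ex(socket.getfqdn())[0]",
     lambda: socket.gethostbyname_ex(socket.getfqdn())[0]),
]


def run_checks():
    """Run every check and return one formatted line per check."""
    return [describe(label, func) for label, func in CHECKS]
```

Calling `print("\n".join(run_checks()))` in a workflow step prints one line per lookup, with failures reported inline instead of aborting the step.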
I managed to dig out an example of a workflow where the Mac OS job failed the first two times and passed on the third: https://github.com/cylc/cylc-flow/actions/runs/6634707075
With this message in the failed runs:
socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1698197657674.local'
# attempt 1 - fail
Image: macos-12
Version: 20230921.1
# attempt 2 - fail
Image: macos-12
Version: 20231017.6
# attempt 3 - pass
Image: macos-12
Version: 20230921.4
Unfortunately the workaround isn't quite enough for my use case due to other interactions which require additional workarounds. We still occasionally get test runners where reverse lookup works.
Getting some funky behaviour with the Python 3.7 socket library (with @shamil-mubarakshin's patch above applied).
Runner: macOS 12.6.9:
>>> socket.gethostname()
'Mac-1702490668849.local'
>>> socket.gethostbyname_ex('Mac-1702490668849.local')
('mac-1702490668849.local', [], ['192.168.64.23'])
>>> socket.getfqdn()
'Mac-1702490668849.local'
>>> socket.gethostbyname_ex('Mac-1702490668849.local')
('Mac-1702490668849.local', ['Mac-1702490668849'], ['192.168.64.23'])
This does not happen with the macOS 12.7.1 runner (see #8642):
>>> socket.gethostname()
'Mac-1702490723337.local'
>>> socket.gethostbyname_ex('Mac-1702490723337.local')
('mac-1702490723337.local', [], ['10.213.1.225'])
>>> socket.getfqdn()
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
>>> socket.gethostbyname_ex('Mac-1702490723337.local')
('mac-1702490723337.local', [], ['10.213.1.225'])
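In the 12.7.1 transcript above, socket.getfqdn() comes back with an ip6.arpa reverse-lookup name rather than the host name. One possible guard, sketched below, is to detect that case and fall back to socket.gethostname(). The `looks_like_reverse_name` and `safe_fqdn` helpers are hypothetical names, and this is an assumption about what a caller might want, not a fix for the runner itself.

```python
import socket


def looks_like_reverse_name(name):
    """True for names in the reverse-lookup zones (in-addr.arpa / ip6.arpa)."""
    return name.rstrip(".").lower().endswith((".in-addr.arpa", ".ip6.arpa"))


def safe_fqdn():
    """Return socket.getfqdn(), unless it yields a reverse-zone name.

    In that case fall back to socket.gethostname(), which in the
    transcripts above still returns the expected Mac-....local name.
    """
    fqdn = socket.getfqdn()
    if looks_like_reverse_name(fqdn):
        return socket.gethostname()
    return fqdn
```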
Tried out macos 13 beta image and ran into the same issue (updated the OP).
Error from Python's socket interface:
socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1702983423766.local'
Runner information:
Current runner version: '2.311.0'
Operating System
macOS
13.6.1
22G313
Runner Image
Image: macos-13
Version: 20231204.4
Included Software: https://github.com/actions/runner-images/blob/macos-13/20231204.4/images/macos/macos-13-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/macos-13%2F20231204.4
The macos 11 image is fine.
I've just tried to downgrade to macos-11 but apparently we're hitting the very same issue. I've run a simple test, forking @oliver-sanders' repo, to verify whether the local DNS is working on any of the macos runners, but it seems to be failing for all the available macos runners: https://github.com/squakez/actions-dns-test/actions/runs/7559304732/job/20582826961
@shamil-mubarakshin suggested (https://github.com/actions/runner-images/issues/8649#issuecomment-1779548056) that nslookup might not be the right tool for the job on Mac OS, although I don't know the reasons why. It may be worth testing via another interface.
The Python interfaces I rely on for my use case do work reliably on the macos-11 image but are broken on all newer images. My project is sticking with the macos-11 runners for now, but this old runner will be withdrawn in due course at which point we will have to drop macos as our usage is too complex to work around with the patch in https://github.com/actions/runner-images/issues/8649#issuecomment-1779548056.
It might be worth following this issue https://github.com/actions/runner-images/issues/7508 to see whether the issue is inherited by the new image.
Yeah, I've seen that. However, in my case the problem is not the direct usage of nslookup. It is a docker process that uses the local DNS service to resolve a local name defined in /etc/hosts. What it seems to me is that the local DNS service is completely off (I've checked that the host has nothing running on port 53 as well), so any resolution of local names fails. I found a workaround by using the localhost IP, but it definitely requires some attention, as we'd expect full functional parity between the different runners. Let's see how it goes in future runners.
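For code that only needs to reach the runner itself, the localhost workaround mentioned above could look roughly like the sketch below in Python. The `resolve_or_loopback` name and the choice of 127.0.0.1 as the fallback are assumptions for illustration; this is not the docker setup described in the comment.

```python
import socket


def resolve_or_loopback(hostname, fallback="127.0.0.1"):
    """Resolve hostname to an IPv4 address, or return the loopback fallback.

    socket.gethostbyname raises socket.gaierror (an OSError subclass)
    when the name cannot be resolved, which is the failure reported here.
    """
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return fallback
```

The fallback only makes sense when the service being contacted runs on the same machine; it hides the resolution failure rather than fixing it.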
it seems to me is that the local DNS service is completely off
^ that!
The macOS 11 runner image will be removed by 6/28/24. To raise awareness of the upcoming removal, jobs using macOS 11 will temporarily fail during scheduled time periods defined below:
The workaround of falling back to macos 11 is about to expire, however the DNS of all new images remains problematic.
@shawnnapora, @shamil-mubarakshin (apologies for the poke)
The workaround of using macos 11 to avoid this DNS configuration bug is about to expire. Do you know if this issue is likely to be resolved in later macos images?
Here's a reproducer of the problem in case it's of any help: https://github.com/vieiro/gha-macos-resolve-hostname
Hi @oliver-sanders ,
Please find the update below:
1.) Successful run for macOS12, macOS13 and macOS14 : https://github.com/sarathrajsrinivasan/macos-test/actions/runs/9949103379/job/27484814461
2.) Use below to update "/etc/hosts":
for host in "$(hostname)" "$(hostname -f)"; do
echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
dscacheutil -q host -a name $(hostname -f)
done
Updated "/etc/hosts" value:
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
192.168.64.19 Mac-1721092163886.local Mac-1721092163886
192.168.64.19 Mac-1721092163886.local Mac-1721092163886
3.) To get the IP address from the hostname:
(a.) We can use dscacheutil to get the ip address of the host :
dscacheutil -q host -a name $(hostname -f)
name : mac-1721092163886.local
ip_address: 192.168.64.19
(b.) Use below powershell code:
$hostName = [System.Net.Dns]::GetHostName()
[System.Net.Dns]::GetHostEntry($hostName)
HostName Aliases AddressList
-------- ------- -----------
mac-1721092163886.local {} {192.168.64.19, fe80::1424:f824:ec93:644d%7, f…
4.) After above fix, we were able to ping the host through the hostname:
ping -c 4 Mac-1721092163886.local
PING mac-1721092163886.local (192.168.64.19): 56 data bytes
64 bytes from 192.168.64.19: icmp_seq=0 ttl=64 time=0.046 ms
64 bytes from 192.168.64.19: icmp_seq=1 ttl=64 time=0.206 ms
64 bytes from 192.168.64.19: icmp_seq=2 ttl=64 time=0.273 ms
64 bytes from 192.168.64.19: icmp_seq=3 ttl=64 time=0.250 ms
5.) Regarding Python's socket bindings:
Before fix:
socket.gethostname() : Mac-1721092163886.local
socket.getfqdn() : Mac-1721092163886.local
socket.getfqdn(socket.gethostname()) : Mac-1721092163886.local
socket.getfqdn(socket.getfqdn()) : Mac-1721092163886.local
socket.gethostbyname_ex(socket.gethostname())[0] : [Errno 8] nodename nor servname provided, or not known
socket.gethostbyname_ex(socket.getfqdn())[0] : [Errno 8] nodename nor servname provided, or not known
After fix:
socket.gethostname() : Mac-1721092163886.local
socket.getfqdn() : Mac-1721092163886.local
socket.getfqdn(socket.gethostname()) : Mac-1721092163886.local
socket.getfqdn(socket.getfqdn()) : Mac-1721092163886.local
socket.gethostbyname_ex(socket.gethostname())[0] : Mac-1721092163886.local
socket.gethostbyname_ex(socket.getfqdn())[0] : Mac-1721092163886.local
6.) Please check the above and let us know if it helps. We are working on adding the "/etc/hosts" change as part of the image. Will keep you posted.
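A note on step 2: the loop body does not use the loop variable, so the same line is appended once per iteration, which is why the entry appears twice in the updated "/etc/hosts". If the entry is added from a script, a check like the one sketched below avoids the duplicate. The `hosts_names` and `entry_to_append` helpers are hypothetical, and the sketch only works on the file's text; reading the file and writing to it (which needs sudo) are left to the caller.

```python
def hosts_names(hosts_text):
    """Collect every host name and alias mentioned in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "address name [alias ...]".
        fields = line.split("#", 1)[0].split()
        names.update(name.lower() for name in fields[1:])
    return names


def entry_to_append(hosts_text, ip, fqdn, short):
    """Return the line to append, or None if both names are already present."""
    present = hosts_names(hosts_text)
    if fqdn.lower() in present and short.lower() in present:
        return None
    return f"{ip} {fqdn} {short}"
```

With the default file contents this returns the same line the echo command builds; run against a file that already contains it, the result is None and nothing needs to be appended.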
@sarathrajsrinivasan we are successfully using the patch
echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
dscacheutil -q host -a name $(hostname -f)
but ideally this would be fixed in the image
@MetRonnie Yes we are working on adding it as part of the image itself. Will update once the change is rolled out.
Thanks for the update.
Hi @oliver-sanders @MetRonnie,
We have added the above change to "/etc/hosts" as part of the image itself. Please check. Closing the issue now. Please let us know in case of any questions.
I have tested this and still got the DNS problems on
Runner Image Provisioner
2.0.374.1+4097a9592d27ce71de414581a65bffbda888dd1b
But I ran again a few times and everything worked on
Runner Image Provisioner
2.0.382.1+d27903c82fd0a98a6c4ff2ea9e193b4413f3d608
In both cases, the other runner version information was identical
Current runner version: '2.319.1'
Operating System
macOS
14.6.1
Runner Image
Image: macos-14-arm64
Version: 20240811.1
Hi @MetRonnie ,
Could you please check now. This should be resolved 👍🏼
Description
Reverse lookup of the host name is not working on the Mac OS runner.
ubuntu-latest:
macos-latest:
For an example, see the nslookup and python.socket steps of this workflow run: https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243
First spotted a couple of weeks ago.
For context, see these two similar instances where reverse DNS stopped working on the Linux images:
Platforms affected
Runner images affected
Image version and build link
Image: macos-12 Version: 20230921.1
Image: macos-13 Version: 20231204.4
Is it regression?
Yes, seen with runners with macos version 12.7.1 or above.
Expected behavior
Reverse lookup should return the hostname.
Actual behavior
Reverse lookup results in error.
Repro steps
To reproduce, see this workflow:
https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243