actions / runner-images

GitHub Actions runner images

reverse lookup broken on Mac OS runners #8649

Closed oliver-sanders closed 1 month ago

oliver-sanders commented 11 months ago

Description

Reverse lookup of the host name is not working on the Mac OS runner.

ubuntu-latest:

$ nslookup fv-az955-853
...
Name:   fv-az955-853.mlkcatuscfmejm4ctfapoghrmg.cx.internal.cloudapp.net

macos-latest:

$ nslookup $(hostname -f)
...
** server can't find Mac-1698147376508.local: NXDOMAIN

For an example, see the nslookup and python.socket steps of this workflow run:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

First spotted a couple of weeks ago.

For context, see these two similar instances where reverse DNS stopped working on the Linux images:

Platforms affected

Runner images affected

Image version and build link

Image: macos-12 Version: 20230921.1

Image: macos-13 Version: 20231204.4

Is it regression?

Yes, seen on runners with macOS version 12.7.1 or above.

Expected behavior

Reverse lookup should return the hostname.

Actual behavior

Reverse lookup results in error.

Repro steps

To reproduce, see this workflow:

https://github.com/oliver-sanders/actions-dns-test/actions/runs/6626432376/job/17999359243

shamil-mubarakshin commented 11 months ago

Hi @oliver-sanders, thanks for reporting. We are investigating the issue.

shamil-mubarakshin commented 11 months ago

@oliver-sanders, after poking around, nslookup doesn't seem to be the right tool for DNS lookups on macOS, which is also mentioned on the tool's man page. It also leaves me wondering whether this behavior has always been the case. Using dscacheutil gives more stable results, honouring local files (a similar hack was used with Ubuntu in the past, though the issue there was IP inconsistency). E.g. the commands below should return the host IPs:

echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts 
dscacheutil -q host -a name $(hostname -f)

We will continue investigating and see if something else could be done
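
For workflows that might run this step more than once, here is a minimal Python sketch of the same idea. It builds the line that the `echo ... | sudo tee -a /etc/hosts` command appends, and returns nothing to append when the name is already mapped. The function names `hosts_line` and `missing_entry` are hypothetical, and the sketch only works on text passed in; it does not read or write /etc/hosts itself.

```python
from typing import Optional


def hosts_line(ip: str, fqdn: str, short: str) -> str:
    """Format one hosts-file entry: address, canonical name, alias."""
    return f"{ip} {fqdn} {short}"


def missing_entry(hosts_text: str, ip: str, fqdn: str, short: str) -> Optional[str]:
    """Return the line to append, or None if fqdn is already mapped."""
    wanted = fqdn.lower()
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = raw.split("#", 1)[0].split()
        # fields[0] is the address; the rest are names (compared case-insensitively).
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return None
    return hosts_line(ip, fqdn, short)
```

In a workflow, the IP and names would come from `ipconfig getifaddr en0`, `hostname -f` and `hostname -s`, as in the shell commands above.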

oliver-sanders commented 11 months ago

@shamil-mubarakshin, thanks for looking in.

Didn't know there were issues with nslookup on Mac OS, interesting.

I also used Python's socket bindings in my tests, which show similar failures for reverse lookups that had worked previously:

socket.gethostname()                              : Mac-1698147376508.local
socket.getfqdn()                                  : Mac-1698147376508.local
socket.getfqdn(socket.gethostname())              : Mac-1698147376508.local
socket.getfqdn(socket.getfqdn())                  : Mac-1698147376508.local
socket.gethostbyname_ex(socket.gethostname())[0]  : [Errno 8] nodename nor servname provided, or not known
socket.gethostbyname_ex(socket.getfqdn())[0]      : [Errno 8] nodename nor servname provided, or not known
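
A table like the one above can be produced with a short script along these lines. This is a sketch, assuming each label corresponds to the call it names; `probe` is a hypothetical helper, not code taken from the linked workflow.

```python
import socket


def probe():
    """Run each lookup and record either its result or the error it raised."""
    checks = {
        "socket.gethostname()": lambda: socket.gethostname(),
        "socket.getfqdn()": lambda: socket.getfqdn(),
        "socket.getfqdn(socket.gethostname())": lambda: socket.getfqdn(socket.gethostname()),
        "socket.getfqdn(socket.getfqdn())": lambda: socket.getfqdn(socket.getfqdn()),
        "socket.gethostbyname_ex(socket.gethostname())[0]": lambda: socket.gethostbyname_ex(socket.gethostname())[0],
        "socket.gethostbyname_ex(socket.getfqdn())[0]": lambda: socket.gethostbyname_ex(socket.getfqdn())[0],
    }
    results = {}
    for label, call in checks.items():
        try:
            results[label] = call()
        except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
            results[label] = str(exc)
    return results


if __name__ == "__main__":
    for label, value in probe().items():
        print(f"{label:<50}: {value}")
```

Catching the error per call rather than once around the whole script means a failing lookup does not hide the results of the others.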

I managed to dig out an example of a workflow where the Mac OS job failed the first two times and passed on the third: https://github.com/cylc/cylc-flow/actions/runs/6634707075

With this message in the failed runs:

socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1698197657674.local'

# attempt 1 - fail
  Image: macos-12
  Version: 20230921.1

# attempt 2 - fail
  Image: macos-12
  Version: 20231017.6

# attempt 3 - pass
  Image: macos-12
  Version: 20230921.4

oliver-sanders commented 10 months ago

Unfortunately the workaround isn't quite enough for my use case due to other interactions which require additional workarounds. We still occasionally get test runners where reverse lookup works.

MetRonnie commented 9 months ago

Getting some funky behaviour with Python 3.7 socket library (with @shamil-mubarakshin's above patch applied).

Runner: macOS 12.6.9:

>>> socket.gethostname()                           
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('mac-1702490668849.local', [], ['192.168.64.23'])

>>> socket.getfqdn()                               
'Mac-1702490668849.local'

>>> socket.gethostbyname_ex('Mac-1702490668849.local')                
('Mac-1702490668849.local', ['Mac-1702490668849'], ['192.168.64.23'])

This does not happen with the macOS 12.7.1 runner (see #8642):

>>> socket.gethostname()                           
'Mac-1702490723337.local'

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])

>>> socket.getfqdn()                               
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa

>>> socket.gethostbyname_ex('Mac-1702490723337.local')                
('mac-1702490723337.local', [], ['10.213.1.225'])

oliver-sanders commented 9 months ago

Tried out the macOS 13 beta image and ran into the same issue (updated the OP).

Error from Python's socket interface:

socket.gaierror: [Errno 8] nodename nor servname provided, or not known: 'Mac-1702983423766.local'

Runner information:

Current runner version: '2.311.0'
Operating System
  macOS
  13.6.1
  22G313
Runner Image
  Image: macos-13
  Version: 20231204.4
  Included Software: https://github.com/actions/runner-images/blob/macos-13/20231204.4/images/macos/macos-13-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/macos-13%2F20231204.4

The macos 11 image is fine.

squakez commented 8 months ago

I've just tried to downgrade to macos-11, but apparently we're hitting the very same issue. I forked @oliver-sanders' repo and ran a simple test to check whether local DNS works on any of the macOS images, and it seems to fail on all the available macOS runners: https://github.com/squakez/actions-dns-test/actions/runs/7559304732/job/20582826961

oliver-sanders commented 8 months ago

@shamil-mubarakshin suggested (https://github.com/actions/runner-images/issues/8649#issuecomment-1779548056) that nslookup might not be the right tool for the job on Mac OS although I don't know the reasons why. Maybe worth testing via another interface.

The Python interfaces I rely on for my use case do work reliably on the macos-11 image but are broken on all newer images. My project is sticking with the macos-11 runners for now, but this old runner will be withdrawn in due course at which point we will have to drop macos as our usage is too complex to work around with the patch in https://github.com/actions/runner-images/issues/8649#issuecomment-1779548056.

It might be worth following this issue https://github.com/actions/runner-images/issues/7508 to see whether the issue is inherited by the new image.

squakez commented 8 months ago

Yeah, I've seen that. However, in my case the problem is not the direct usage of nslookup. It is a Docker process that uses the local DNS service to resolve a local name defined in /etc/hosts. It seems to me that the local DNS service is completely off (I've also checked that the host has nothing running on port 53), so any resolution of local names fails. I found a workaround by using the localhost IP, but this definitely needs some attention, as we'd expect full feature parity between the different runners. Let's see how it goes in future runners.
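
For anyone wanting to repeat the port 53 check from a workflow step, a minimal Python sketch follows. It assumes a TCP connection attempt is an acceptable stand-in for "something is listening"; DNS servers commonly answer on UDP as well, which this does not test. `tcp_port_open` is a hypothetical helper name.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print("127.0.0.1:53 open:", tcp_port_open("127.0.0.1", 53))
```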

oliver-sanders commented 8 months ago

it seems to me is that the local DNS service is completely off

^ that!

oliver-sanders commented 4 months ago

The macOS 11 runner image will be removed by 6/28/24. To raise awareness of the upcoming removal, jobs using macOS 11 will temporarily fail during scheduled time periods defined below:

The workaround of falling back to macos 11 is about to expire, however the DNS of all new images remains problematic.

oliver-sanders commented 3 months ago

@shawnnapora, @shamil-mubarakshin (apologies for the poke)

The workaround of using macos 11 to avoid this DNS configuration bug is about to expire. Do you know if this issue is likely to be resolved in later macos images?

vieiro commented 2 months ago

Here's a reproducer of the problem in case it's of any help: https://github.com/vieiro/gha-macos-resolve-hostname

sarathrajsrinivasan commented 2 months ago

Hi @oliver-sanders ,

Please find the update below:

1.) Successful run for macOS 12, macOS 13 and macOS 14: https://github.com/sarathrajsrinivasan/macos-test/actions/runs/9949103379/job/27484814461

2.) Use the following to update "/etc/hosts":

  # Note: $host is not used in the loop body, so the same entry is appended twice.
  for host in "$(hostname)" "$(hostname -f)"; do
      echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
      dscacheutil -q host -a name $(hostname -f)
  done

Updated "/etc/hosts" value:

  127.0.0.1      localhost
  255.255.255.255    broadcasthost
  ::1                localhost
  192.168.64.19      Mac-1721092163886.local     Mac-1721092163886
  192.168.64.19      Mac-1721092163886.local     Mac-1721092163886

3.) To get the IP address from the hostname:

  (a.) We can use dscacheutil to get the IP address of the host:

       dscacheutil -q host -a name $(hostname -f)

       name      : mac-1721092163886.local
       ip_address: 192.168.64.19

  (b.) Use the PowerShell code below:

      $hostName = [System.Net.Dns]::GetHostName()
      [System.Net.Dns]::GetHostEntry($hostName)

      HostName                  Aliases   AddressList
      --------                  -------   -----------
      mac-1721092163886.local   {}        {192.168.64.19, fe80::1424:f824:ec93:644d%7, f…

4.) After the above fix, we were able to ping the host by its hostname:

 ping -c 4 Mac-1721092163886.local

    PING mac-1721092163886.local (192.168.64.19): 56 data bytes
    64 bytes from 192.168.64.19: icmp_seq=0 ttl=64 time=0.046 ms
    64 bytes from 192.168.64.19: icmp_seq=1 ttl=64 time=0.206 ms
    64 bytes from 192.168.64.19: icmp_seq=2 ttl=64 time=0.273 ms
    64 bytes from 192.168.64.19: icmp_seq=3 ttl=64 time=0.250 ms

5.) Regarding Python's socket bindings:

  Before fix:
  socket.gethostname()                              : Mac-1721092163886.local
  socket.getfqdn()                                  : Mac-1721092163886.local
  socket.getfqdn(socket.gethostname())              : Mac-1721092163886.local
  socket.getfqdn(socket.getfqdn())                  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.gethostname())[0]  : [Errno 8] nodename nor servname provided, or not known
  socket.gethostbyname_ex(socket.getfqdn())[0]      : [Errno 8] nodename nor servname provided, or not known

  After fix:
  socket.gethostname()                              : Mac-1721092163886.local
  socket.getfqdn()                                  : Mac-1721092163886.local
  socket.getfqdn(socket.gethostname())              : Mac-1721092163886.local
  socket.getfqdn(socket.getfqdn())                  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.gethostname())[0]  : Mac-1721092163886.local
  socket.gethostbyname_ex(socket.getfqdn())[0]      : Mac-1721092163886.local

6.) Please check the above and let us know if it helps. We are working on adding the "/etc/hosts" change as part of the image. Will keep you posted.
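
To check the result of step 3 (a.) from a script rather than by eye, the `dscacheutil -q host` output can be parsed into records. This is a minimal sketch, assuming the `key: value` layout shown in the output above with blank lines between records; `parse_dscacheutil` is a hypothetical helper name and the sample string is copied from that output.

```python
def parse_dscacheutil(output: str) -> list:
    """Split `dscacheutil -q host` output into one dict per blank-line-separated record."""
    records, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so IPv6 values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()  # a repeated key overwrites the earlier value
    if current:
        records.append(current)
    return records


sample = "name      : mac-1721092163886.local\nip_address: 192.168.64.19\n"
print(parse_dscacheutil(sample))
# prints [{'name': 'mac-1721092163886.local', 'ip_address': '192.168.64.19'}]
```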

MetRonnie commented 2 months ago

@sarathrajsrinivasan we are successfully using the patch

echo -e "$(ipconfig getifaddr en0) $(hostname -f) $(hostname -s)" | sudo tee -a /etc/hosts
dscacheutil -q host -a name $(hostname -f)

but ideally this would be fixed in the image

sarathrajsrinivasan commented 2 months ago

@MetRonnie Yes we are working on adding it as part of the image itself. Will update once the change is rolled out.

oliver-sanders commented 2 months ago

Thanks for the update.

sarathrajsrinivasan commented 1 month ago

Hi @oliver-sanders @MetRonnie,

We have added the above change to "/etc/hosts" as part of the image itself. Please check. Closing the issue now. Please let us know in case of any questions.

MetRonnie commented 1 month ago

I have tested this and still got the DNS problems on

Runner Image Provisioner
  2.0.374.1+4097a9592d27ce71de414581a65bffbda888dd1b

But I ran again a few times and everything worked on

Runner Image Provisioner
  2.0.382.1+d27903c82fd0a98a6c4ff2ea9e193b4413f3d608

In both cases, the other runner version information was identical

Current runner version: '2.319.1'
Operating System
  macOS
  14.6.1
Runner Image
  Image: macos-14-arm64
  Version: 20240811.1

sarathrajsrinivasan commented 1 month ago

Hi @MetRonnie ,

Could you please check now? This should be resolved 👍🏼