alajmo / sake

:robot: sake is a task runner for local and remote hosts
https://sakecli.com
MIT License

Can't access libvirt hosts via dns name #43

Open rktjmp opened 1 year ago

rktjmp commented 1 year ago

Problem / Steps to reproduce

I have some VMs running under libvirt, which are normally accessible via their host name, but sake can't resolve them. Resolution does work in other tools (ping, pyinfra, ansible, ssh, etc) so I don't think it's a general configuration error.

servers:
  localhost:
    host: 0.0.0.0
    local: true
  vm:
    host: host1vm

tasks:
  ping:
    desc: Pong
    cmd: echo "pong"

λ sake run ping --all

 Unreachable Hosts

 server | host    | user | port | error
--------+---------+------+------+---------------------------------------------------------------
 vm     | host1vm | soup | 22   | dial tcp: lookup host1vm on 127.0.0.53:53: server misbehaving

λ ping host1vm
PING host1vm (192.168.122.244) 56(84) bytes of data.
64 bytes from 192.168.122.244 (192.168.122.244): icmp_seq=1 ttl=64 time=0.141 ms
^C64 bytes from 192.168.122.244: icmp_seq=2 ttl=64 time=0.132 ms

You can use https://github.com/rktjmp/virt-up to bring up named hosts, but it does require some setup.

rktjmp commented 1 year ago

A raw Go test does work:

package main

import (
        "fmt"
        "net"
        "os"
)

func main() {
        ips, err := net.LookupIP("host1vm")
        if err != nil {
                fmt.Fprintf(os.Stderr, "Could not get IPs: %v\n", err)
                os.Exit(1)
        }
        for _, ip := range ips {
                fmt.Printf("host1vm. IN A %s\n", ip.String())
        }
}

λ go run main.go
host1vm. IN A 192.168.122.244

alajmo commented 1 year ago

Great find. I need to implement the LookupIP method you provided. Just to be safe, does it work as intended when you paste the IP directly?

rktjmp commented 1 year ago

Direct IP works.

servers:
  ip:
    host: 192.168.122.244

λ sake run ping --servers ip

TASK [ping: Pong] ******************************************************************************

192.168.122.244 | pong

alajmo commented 1 year ago

What's your local resolver? It works for me without any changes. I have my nameserver set to my Pi-hole:

/etc/resolv.conf

local resolver
domain lan
search lan
nameserver 192.168.1.209

It seems sake is sending the lookup to 127.0.0.53:53 in your case. I just want to make sure I can replicate your environment and fix it correctly.

rktjmp commented 1 year ago

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search .

λ resolvectl
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
Fallback DNS Servers: 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net 8.8.8.8#dns.google
                      2606:4700:4700::1111#cloudflare-dns.com 2620:fe::9#dns.quad9.net
                      2001:4860:4860::8888#dns.google

Link 2 (eno1)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.20.1
       DNS Servers: 192.168.20.1

Link 4 (docker0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 5 (virbr0)
Current Scopes: LLMNR/IPv4
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 45 (br-77f2899f0eb4)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 71 (tap0)
Current Scopes: LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 72 (tap1)
Current Scopes: LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 73 (tap2)
Current Scopes: LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

You might also have to set up libvirt-nss: https://libvirt.org/nss.html

rktjmp commented 1 year ago

FWIW, adjusting the Makefile like so:

@@ -53,9 +53,9 @@ mock-performance-ssh:
    cd ./test && docker-compose -f docker-compose-performance.yaml up

 build:
-   CGO_ENABLED=0 go build \
+   CGO_ENABLED=1 go build \
    -ldflags "-s -w -X '${PACKAGE}/cmd.version=${VERSION}' -X '${PACKAGE}/cmd.commit=${GIT}' -X '${PACKAGE}/cmd.date=${DATE}'" \
-   -a -tags netgo -o dist/${NAME} main.go
+   -a -tags netcgo -o dist/${NAME} main.go

 build-all:
    goreleaser release --skip-publish --rm-dist --snapshot
@@ -63,7 +63,7 @@ build-all:
 build-and-link:
    go build \
        -ldflags "-w -X '${PACKAGE}/cmd.version=${VERSION}' -X '${PACKAGE}/cmd.commit=${GIT}' -X '${PACKAGE}/cmd.date=${DATE}'" \
-       -a -tags netgo -o dist/${NAME} main.go
+       -a -tags netcgo -o dist/${NAME} main.go
    cp ./dist/sake ~/.local/bin/sake

 gen-man:

Then running

λ GODEBUG=netdns='cgo+9' ./sake run ping --servers vm
go package net: confVal.netCgo = true  netGo = false
go package net: using cgo DNS resolver
go package net: hostLookupOrder(host1vm) = cgo

TASK [ping: Pong] ***************************************************************

host1vm | pong

Related but unhelpful (I'm not running alpine, just regular archlinux-amd64): https://groups.google.com/g/golang-nuts/c/G-faJ0bthz0

I'm not a Go dev. My very cursory understanding is that netcgo calls the system resolver instead of Go's own, and perhaps Go's own implementation won't recurse by default or whatever.

Not sure if it's reasonable to just perform the net.LookupIP before calling ssh.Dial. You'd think dial would just use that internally but perhaps not.
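The "look up first, then dial" idea can be sketched roughly as below. This is a minimal illustration, not sake's code: resolveHostPort is a hypothetical helper, and the address it returns would be handed to whatever dial call follows. Note that in a binary built with the netgo tag, net.LookupIP goes through the same pure-Go resolver, so this only helps when the lookup reaches a resolver that honours the system configuration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// resolveHostPort is a hypothetical helper (not part of sake). It resolves
// host with net.LookupIP and returns "ip:port" for the first address found.
func resolveHostPort(host string, port int) (string, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return "", fmt.Errorf("lookup %s: %w", host, err)
	}
	if len(ips) == 0 {
		return "", fmt.Errorf("lookup %s: no addresses returned", host)
	}
	return net.JoinHostPort(ips[0].String(), strconv.Itoa(port)), nil
}

func main() {
	// "host1vm" and port 22 are taken from the report above.
	addr, err := resolveHostPort("host1vm", 22)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// addr would be passed to the dial call in place of "host1vm:22".
	fmt.Println(addr)
}
```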

Actually, it's possible the working test above defaults to netcgo?

rktjmp commented 1 year ago

In fact, it seems it does:

λ GODEBUG=netdns=9 go run main.go
go package net: confVal.netCgo = false  netGo = false
go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(host1vm) = cgo
host1vm. IN A 192.168.122.60

rktjmp commented 1 year ago

May not be solvable without cgo?

https://news.ycombinator.com/item?id=17799874 (2018, but the code is largely unchanged in master)

Here's the code that determines if it needs to fall back to cgo: https://github.com/golang/go/blob/161874da2ab6d5372043a1f393...

Notably, it can only handle the following sources: files, dns, myhostname, mdns* (but only a subset of those if you read the gory details).

It doesn't handle the fairly uncommon "mymachines" or "resolve" (sometimes used in the systemd world nowadays)

λ cat /etc/nsswitch.conf
# Name Service Switch configuration file.
# See nsswitch.conf(5) for details.

passwd: files systemd
group: files [SUCCESS=merge] systemd
shadow: files systemd
gshadow: files systemd

publickey: files

hosts: mymachines libvirt libvirt_guest resolve [!UNAVAIL=return] files myhostname dns
networks: files

protocols: files
services: files
ethers: files
rpc: files

netgroup: files
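To make the mismatch concrete, here is a small sketch that checks a hosts line against the sources the quoted comment says Go's resolver can handle (files, dns, myhostname, mdns*). The list is taken from that comment rather than from Go's source, and unsupportedSources is a hypothetical helper, so treat it as an illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// unsupportedSources returns the sources on an nsswitch.conf "hosts:" line
// that fall outside the set named in the quoted comment. It illustrates the
// idea and is not a copy of Go's own decision logic.
func unsupportedSources(hostsLine string) []string {
	line := strings.TrimPrefix(strings.TrimSpace(hostsLine), "hosts:")
	var out []string
	inAction := false
	for _, f := range strings.Fields(line) {
		// Skip action items such as [!UNAVAIL=return], which may span
		// several whitespace-separated tokens.
		if strings.HasPrefix(f, "[") {
			inAction = true
		}
		if inAction {
			if strings.HasSuffix(f, "]") {
				inAction = false
			}
			continue
		}
		switch {
		case f == "files", f == "dns", f == "myhostname", strings.HasPrefix(f, "mdns"):
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	line := "hosts: mymachines libvirt libvirt_guest resolve [!UNAVAIL=return] files myhostname dns"
	fmt.Println(unsupportedSources(line))
	// prints: [mymachines libvirt libvirt_guest resolve]
}
```

For the file above, the four leading sources, including the two libvirt ones, are outside that set.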

It's reasonable to close this as an unsupported use case if you feel so.

alajmo commented 1 year ago

It seems you can set the resolver in the Dial function (https://stackoverflow.com/questions/30043248/why-golang-lookup-function-cant-provide-a-server-parameter). I will investigate this a bit further (cheers for all the investigation you did). It would be a shame not to be able to use sake when you have virtual machines running locally, especially since this already works with established software (ssh, pyinfra, ansible, etc.). Making it work automatically would obviously be best, but a user option could serve as a last resort.
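A rough sketch of what "setting the resolver in Dial" could look like with the standard library follows. newDialerWithDNS is a hypothetical helper, and 192.168.122.1:53 is an assumption: it is where libvirt's default network usually runs dnsmasq, which may or may not know the guest names on a given setup.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newDialerWithDNS is a hypothetical helper. It returns a net.Dialer whose
// name lookups are sent by Go's built-in resolver to dnsAddr ("ip:port")
// instead of the servers listed in /etc/resolv.conf.
func newDialerWithDNS(dnsAddr string) *net.Dialer {
	resolver := &net.Resolver{
		PreferGo: true, // the Dial hook below is used by the Go resolver
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, dnsAddr)
		},
	}
	return &net.Dialer{Timeout: 5 * time.Second, Resolver: resolver}
}

func main() {
	// Assumed address of the DNS server that knows the VM names.
	dialer := newDialerWithDNS("192.168.122.1:53")

	conn, err := dialer.Dial("tcp", "host1vm:22")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```

The resulting connection could then be wrapped by an SSH client (golang.org/x/crypto/ssh provides ssh.NewClientConn for an existing net.Conn). The drawback is that the DNS server address has to come from somewhere, which is why this would probably end up as a user option.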

Ideally, I'd want to avoid cgo. You could make different builds (with/without cgo), but it isn't pretty IMO.

rktjmp commented 1 year ago

Yeah, I did wonder if pulling in a more complete DNS library would be the only real fix. Happy to test a branch if you decide to go that way.

I can just use the IP addresses to hit the virtual machines, but it's a bit of a bore changing them each time.

I do wonder if you'd end up having the same issue pop up with the default DNS in other cases, since its focus is allegedly narrow, but any non-cgo/glibc resolver won't be able to use the funny things people might put in nsswitch, so :shrug:

alajmo commented 1 year ago

I've read a bit more and it seems:

Anyway, I will set CGO_ENABLED=1, which is the default. That way sake behaves the same as other established software, there is still a workaround if you want to use the Go resolver (creating an alias: alias sake='GODEBUG=netdns=go sake'), and the binary gets about 0.5 MB smaller.
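For anyone who wants to compare the two resolvers without rebuilding, the sketch below runs the same lookup twice from one program. It relies on net.Resolver's PreferGo field, which the Go documentation describes as equivalent to GODEBUG=netdns=go scoped to that resolver. lookupWith is a hypothetical helper, and the default resolver only uses cgo when the binary was built with cgo enabled.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
)

// lookupWith is a hypothetical helper. It resolves host with either the
// default resolver selection (preferGo=false) or Go's built-in resolver
// (preferGo=true).
func lookupWith(preferGo bool, host string) ([]string, error) {
	r := &net.Resolver{PreferGo: preferGo}
	return r.LookupHost(context.Background(), host)
}

func main() {
	host := "host1vm" // host name from the report; override with an argument
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	for _, preferGo := range []bool{false, true} {
		addrs, err := lookupWith(preferGo, host)
		fmt.Printf("PreferGo=%v addrs=%v err=%v\n", preferGo, addrs, err)
	}
}
```

On a setup like the one reported here, the expectation is that the first lookup succeeds and the second fails, but that depends on the build flags and the local nsswitch configuration.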