Closed ab77 closed 2 years ago
This is only somewhat related to the diagnostic script.
When testing DNS resolution we read the upstream servers directly from dnsmasq on the host, and by default in balenaOS we include 8.8.8.8 in the list of IPv4 DNS servers. https://www.balena.io/docs/reference/OS/configuration/#dnsservers
So in my mind the correct solution is to remove 8.8.8.8 from the server list on balenaOS if IPv4 networking is not available. This can be done by setting dnsServers="null
in config.json.
OR, we change the diagnostics to report both successful and failed DNS servers so it's clear that the IPv6 servers from DHCP are working.
OR, we change the diagnostics to only report a failure if none of the servers can resolve, but this is removing some helpful context IMO.
Agreed on an arch call brainstorm to change the wording to indicate that some of the servers worked, and only some of the servers failed. Maybe raise a warning instead of an error in this case.
I tried printing both successful and failed DNS endpoints, and I think it makes the result look cluttered.
Some networking issues detected:
test_upstream_dns: DNS lookup succeeded for 0.resinio.pool.ntp.org via upstream: 192.168.1.55
test_upstream_dns: DNS lookup succeeded for api.6fd7efae1b8505c1a8b47da01934f844.bob.local via upstream: 192.168.1.55
test_upstream_dns: DNS lookup failed for 0.resinio.pool.ntp.org via upstream: 8.8.8.8
test_upstream_dns: DNS lookup succeeded for api.6fd7efae1b8505c1a8b47da01934f844.bob.local via upstream: 8.8.8.8
It's not immediately obvious to me that 3/4 of those results are good.
@myarmolinsky @chrisys Any suggestions here or should I bring it to a product call? We want to indicate in a clean way that some of the configured upstream DNS servers failed (but not necessarily all of them).
The current test only prints the failed ones and is unclear that the other servers are working fine. The above test prints both successful and failed endpoints but it pretty cluttered.
I might try to break each DNS endpoint into it's own pass/fail test like we do for user app healthchecks?
@klutchell Generally when I see an error I don't assume everything is broken, I just start investigating that individual error. So unless you split up the health check into multiple health checks to display successes and errors, I don't think it is necessary to tell the user which ones were successful. Perhaps that's just me though, definitely interested in @chrisys 's thoughts
@myarmolinsky I think I agree with you. In fact, clicking on the failed test will link you to the documentation describing the test, where I can make it more clear that successful servers are not shown. https://www.balena.io/docs/reference/diagnostics/#test_upstream_dns
Expected Behaviour
Diagnostics
check_networking
passes on IPv6 only networks by attempting to resolve using public IPv6 DNS (e.g. https://developers.google.com/speed/public-dns/docs/using) instead of hard-coding IPv4 IPs.Actual Behaviour
Diagnostics
check_networking
test fails on IPv6 only networks.Steps to Reproduce the Problem
References
https://jel.ly.fish/brainstorm-topic-c3f8670a-8981-42ae-bcbf-1f57e959122d