algorand / go-algorand

Algorand's official implementation in Go.
https://developer.algorand.org/

Telemetry SRV lookup doesn't honor DNSSecurityFlags #1283

Closed tsachiherman closed 3 years ago

tsachiherman commented 4 years ago

Subject of the issue

When DNSSecurityFlags is set to 0, algod still tries to use DNSSEC when retrieving the telemetry SRV records.

Log entries

{"file":"telemetryURIUpdateService.go","function":"github.com/algorand/go-algorand/tools/network.(*telemetryURIUpdater).lookupTelemetryURL","level":"info","line":90,"msg":"An issue occurred reading telemetry entry for '_telemetry._tls.mainnet.algorand.network': ReadFromBootstrap: Failed to obtain SRV with DNSSEC: no answer for (_telemetry._tls.mainnet.algorand.network., 33) from DNS servers [1.1.1.1:53 8.8.8.8:53 77.88.8.8:53 8.26.56.26:53]","time":"2020-07-21T16:00:52.370954+02:00"}

Local DNS server lookup

dig +dnssec _telemetry._tls.mainnet.algorand.network. @193.205.160.3
; <<>> DiG 9.9.4-RedHat-9.9.4-72.el7 <<>> +dnssec _telemetry._tls.mainnet.algorand.network. @193.205.160.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61830
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;_telemetry._tls.mainnet.algorand.network. IN A
;; AUTHORITY SECTION:
algorand.network.    3493    IN    SOA    scott.ns.cloudflare.com. dns.cloudflare.com. 2034649244 10000 2400 604800 3600
algorand.network.    3493    IN    RRSIG    SOA 13 2 3600 20200723134947 20200721114947 34505 algorand.network. jjJzgn7b/6pBQmR9Qvjw6w37HiDGGzCeer+FrFUdjMLRc7pXxruqvE8F zHF3OKdhBhs4bytg3CYtEWteMB5x0w==
_telemetry._tls.mainnet.algorand.network. 3493 IN RRSIG    NSEC 13 5 3600 20200723134947 20200721114947 34505 algorand.network. AYrQwccRBDFILE/Yo9Bgjd3H8JbQ0zdQSrj0nCjcgVv4yyeK/i8/4WVu 3xzmSi8LtGYVRr2h1snLBqVnGQgH4A==
_telemetry._tls.mainnet.algorand.network. 3493 IN NSEC \000._telemetry._tls.mainnet.algorand.network. HINFO MX TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC TLSA HIP TYPE61 SPF CAA
;; Query time: 0 msec
;; SERVER: 193.205.160.3#53(193.205.160.3)
;; WHEN: Wed Jul 22 14:51:50 CEST 2020
;; MSG SIZE  rcvd: 430

Additional Information

algorandskiy commented 4 years ago

Could this be a config file migration issue? If the config file has no version, it is assumed to be version zero and the migration runs. Since DNSSecurityFlags has a default value of 1, DNSSecurityFlags=0 would be treated as unset and migrated to 1.
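To make the hypothesis concrete, here is a minimal sketch of how such a migration could overwrite an explicit zero. It is not go-algorand's migration code: the `Config` struct, `latestVersion`, and `migrateNaive` are hypothetical and assumed only for illustration. The default of 1 for DNSSecurityFlags is the one mentioned above.

```go
package main

import "fmt"

// Config is a hypothetical two-field config, not go-algorand's real struct.
type Config struct {
	Version          uint32
	DNSSecurityFlags uint32
}

const (
	latestVersion           = 1 // hypothetical latest config version
	defaultDNSSecurityFlags = 1 // default value mentioned in the discussion
)

// migrateNaive upgrades an older config. It cannot tell "explicitly set to 0"
// apart from "field missing", so a zero is replaced by the default.
func migrateNaive(c Config) Config {
	if c.Version < latestVersion {
		if c.DNSSecurityFlags == 0 {
			c.DNSSecurityFlags = defaultDNSSecurityFlags
		}
		c.Version = latestVersion
	}
	return c
}

func main() {
	// A config file without a Version field decodes as Version 0.
	unversioned := migrateNaive(Config{Version: 0, DNSSecurityFlags: 0})
	// A config file that already declares the latest version is left alone.
	versioned := migrateNaive(Config{Version: latestVersion, DNSSecurityFlags: 0})

	fmt.Println("unversioned:", unversioned.DNSSecurityFlags)
	fmt.Println("versioned:  ", versioned.DNSSecurityFlags)
}
```

Under these assumptions the unversioned config ends up with DNSSecurityFlags = 1 while the versioned one keeps 0, so checking whether the failing node's config.json carries a Version field would confirm or rule out this explanation.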

tsachiherman commented 4 years ago

Sure. But why would it work correctly for the gossip network and incorrectly for telemetry?

algorandskiy commented 3 years ago

I tried to recreate the issue on a mainnet node by clearing out the list of DNSSEC servers. The only way I managed to get an error message (see below) was by setting DNSSecurityFlags to 1. Setting it to 0 works as expected.

{"file":"telemetryURIUpdateService.go","function":"github.com/algorand/go-algorand/tools/network.(*telemetryURIUpdater).lookupTelemetryURL","level":"info","line":90,"msg":"An issue occurred reading telemetry entry for '_telemetry._tls.mainnet.algorand.network': ReadFromBootstrap: Failed to obtain SRV with DNSSEC: no answer for (_telemetry._tls.mainnet.algorand.network., 33) from DNS servers []","time":"2020-12-02T17:21:43.106091-05:00"}

Can you provide config.json and logging.config from the failing instance?

I also checked the migration path with DNSSecurityFlags = 0 and Version = 0. In this case node.log has messages from both telemetry ("Failed to obtain SRV with DNSSEC") and wsNetwork ("got no DNS addrs for network"). wsNetwork does not log a warning or error on mainnet for this case. It looks like it works as designed.

tsachizehub commented 3 years ago

If you recall, there was an error that was fixed (related to static code analysis). I don't know whether this was it or not. I'd like to wait for the next release before approaching them again. In the next release we're going to have the DNSSEC code refactored, and it should be in a state where everything works for them. If that isn't the case, I'll try to get the above files from them.

algorandskiy commented 3 years ago

That was in the test harness, not in prod code. I think I'll close this now, since I can't recreate it and no new data can be obtained at this point.