Closed daleki closed 1 year ago
Can you help me reproduce this? I have a similar setup but I'm not seeing that problem.
Is this output with --diff2? Do you get different results without --diff2?
Could you try an earlier release like https://github.com/StackExchange/dnscontrol/releases/tag/v3.29.1 ?
Thanks for the reply! The output above is without --diff2. With --diff2 I get a different result:
+preview | --> RUN --no-cache dnscontrol --diff2 preview
+preview | [INFO: Diff2 algorithm in use.]
+preview | ******************** Domain: cloudhelix.com
+preview | ******************** Domain: kentik.com
+preview | 1 correction (gcloud)
..skip some lines here
+preview | #1: - DELETE NS kentik.net dns1.p01.nsone.net. ttl=3600
+preview | - DELETE NS kentik.net dns2.p01.nsone.net. ttl=3600
+preview | - DELETE NS kentik.net dns3.p01.nsone.net. ttl=3600
+preview | - DELETE NS kentik.net dns4.p01.nsone.net. ttl=3600
v3.29.1 yields the same results with and without --diff2.
CC @costasd and @riyadhalnur for assistance
This is odd because nothing was intended to change regarding NS handling recently. That's not to say things didn't change, or that some other change didn't have an unexpected side effect. I'm just saying that at this point I haven't identified the problem.
Next step: Which was the last release that didn't have this problem? Binaries are available here: https://github.com/StackExchange/dnscontrol/tags
(also: If anyone else has a similar issue, please speak up. Does it involve NS1?)
We were on the 3.19.0 release for a long time with no issues. Then we started getting the errors below which prompted the upgrade to v3.31.2:
+preview | ----- Getting nameservers from: ns1
+preview | provider code leaves trailing dot on nameserver
The errors happened even with 3.19.0? i.e. there's a possibility that NS1's API changed?
Give the tlim_b2293_ns1_nameservers branch a try.
git clone https://github.com/StackExchange/dnscontrol.git
cd dnscontrol
git checkout tlim_b2293_ns1_nameservers
go install
This will install a new binary in ~/bin. Give that a try.
The 3.19.0 error is a different one: provider code leaves trailing dot on nameserver, then it just exits. I'll try tlim_b2293_ns1_nameservers.
Hi,
From @daleki's output (thanks!), it looks like dnscontrol picks 4 nameservers instead of 4+4, so it ends up changing them every time. My feeling is that's roughly where the bug lies, but I haven't verified anything yet.
Unfortunately, the trailing-dot bug that was fixed recently won't allow for much bisecting here, at least on NS1's side.
I'm on a trip so it'll take a bit, but I'll try to replicate the setup (got access to gcloud and ns1) and see if I can debug it further.
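To make the "4 instead of 4+4" suspicion concrete, here is a minimal sketch in plain JavaScript. It is not dnscontrol's actual diff code: `diffNs` is a hypothetical helper, and the assumption is that the desired NS set contains only one provider's nameservers while the zone currently holds both. Under that assumption a set comparison produces exactly four deletions, which matches the shape of the preview output above.

```javascript
// Hypothetical illustration only; this is NOT dnscontrol's diff implementation.

// NS records the zone currently holds: 4 from NS1 plus 4 from Google Cloud DNS.
const existing = [
  'dns1.p01.nsone.net.',
  'dns2.p01.nsone.net.',
  'dns3.p01.nsone.net.',
  'dns4.p01.nsone.net.',
  'ns-cloud-c1.googledomains.com.',
  'ns-cloud-c2.googledomains.com.',
  'ns-cloud-c3.googledomains.com.',
  'ns-cloud-c4.googledomains.com.',
];

// Assumption: only one provider's nameservers make it into the desired set.
const desired = existing.filter((ns) => ns.endsWith('.googledomains.com.'));

// Compare the two sets and report what would be deleted and created.
function diffNs(existingNs, desiredNs) {
  const want = new Set(desiredNs);
  const have = new Set(existingNs);
  return {
    toDelete: existingNs.filter((ns) => !want.has(ns)),
    toCreate: desiredNs.filter((ns) => !have.has(ns)),
  };
}

const plan = diffNs(existing, desired);
console.log(plan.toDelete); // the four nsone.net nameservers
console.log(plan.toCreate); // empty
```

If the desired set held all eight names, both lists would be empty and the preview would report 0 corrections.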
So, I created the following setup, with the relevant accounts and zones set up in NS1 and GCLOUD for example.com.
var REG_NONE = NewRegistrar('none');
var DNS_NS1 = NewDnsProvider('ns1');
var DNS_GCLOUD = NewDnsProvider('gcloud');

D("example.com", REG_NONE,
    DefaultTTL('1h'), // default ttl for records
    NAMESERVER_TTL('1h'), // default ttl for nameservers
    DnsProvider(DNS_GCLOUD, 4), // grab 4 nameservers
    DnsProvider(DNS_NS1, 4), // grab 4 nameservers
    A('@', '1.2.3.4')
);
And I can't seem to reproduce it with the latest master:
$ ../../oss/dnscontrol/dnscontrol --diff2 preview --domains example.com
[INFO: Diff2 algorithm in use.]
******************** Domain: example.com
Done. 0 corrections.
$ ../../oss/dnscontrol/dnscontrol preview --domains example.com
[INFO: Old diff algorithm in use. Please test --diff2 as it will be the default in releases after 2023-05-07. See https://github.com/StackExchange/dnscontrol/issues/2262]
******************** Domain: example.com
Done. 0 corrections.
Is there anything missing in this setup that is needed to trigger the behavior?
Same result (no changes) with a quick build from the v3.31.2 tag.
Hi @costasd! Great to see you here, and thanks for taking a look. I can't think of anything missing other than the state of DNS records already pushed to the providers' backends. I ran a few more tests with the 3.31.2 release and --diff2, but at the moment I can't isolate the bug beyond saying it seems related to NS1. When I set DnsProvider(DNS_NS1, 0) and try adding other providers like R53, I see the expected output.
@daleki Maybe it is a problem at NS1? Try removing the records and re-adding them, i.e. use DnsProvider(DNS_NS1, 0) and do a "push" to clear things out. Then use DnsProvider(DNS_NS1, 4) and push again.
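As a rough sketch, the two-step reset could look like this in dnsconfig.js. It reuses the variable names from the test setup posted earlier in this thread; the domain and the A record are placeholders, not the reporter's real config.

```javascript
// dnsconfig.js fragment (a sketch; names follow the example.com test setup above).

// Step 1: set the NS1 nameserver count to 0 so its nameservers are not added,
// then run `dnscontrol push` to clear things out.
D("example.com", REG_NONE,
    DnsProvider(DNS_GCLOUD, 4),
    DnsProvider(DNS_NS1, 0), // temporarily 0
    A('@', '1.2.3.4')
);

// Step 2: change the line above back to
//     DnsProvider(DNS_NS1, 4),
// and run `dnscontrol push` again.
```

Running `dnscontrol preview` before each push shows what would change.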
We fixed it by manually updating the state in NS1 via the UI. We deleted all Google NS records in all zones on NS1 and manually added the NS1 NS records for all zones. Then we ran a normal dnscontrol push with the new --diff2 option and got the desired result.
➜ dig +short kentik.com ns @dns1.p01.nsone.net.
dns1.p01.nsone.net.
dns2.p01.nsone.net.
dns3.p01.nsone.net.
dns4.p01.nsone.net.
ns-cloud-c1.googledomains.com.
ns-cloud-c2.googledomains.com.
ns-cloud-c3.googledomains.com.
ns-cloud-c4.googledomains.com.
Thanks for the help @costasd and @tlimoncelli !
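For anyone who wants to script the same check across many zones, here is a small sketch that counts nameservers per provider from `dig +short <zone> ns` output. `countByProvider` is a hypothetical helper written for this comment, and the suffixes are simply the ones visible in the dig output above.

```javascript
// Hypothetical helper: count nameservers per provider suffix
// in the text printed by `dig +short <zone> ns`.
function countByProvider(digOutput, suffixes) {
  const counts = Object.fromEntries(suffixes.map((s) => [s, 0]));
  for (const line of digOutput.split('\n')) {
    const ns = line.trim().toLowerCase();
    if (!ns) continue; // skip blank lines
    const match = suffixes.find((s) => ns.endsWith(s));
    if (match) counts[match] += 1;
  }
  return counts;
}

// Output copied from the dig command above.
const digOutput = `dns1.p01.nsone.net.
dns2.p01.nsone.net.
dns3.p01.nsone.net.
dns4.p01.nsone.net.
ns-cloud-c1.googledomains.com.
ns-cloud-c2.googledomains.com.
ns-cloud-c3.googledomains.com.
ns-cloud-c4.googledomains.com.`;

const counts = countByProvider(digOutput, ['.nsone.net.', '.googledomains.com.']);
console.log(counts); // { '.nsone.net.': 4, '.googledomains.com.': 4 }
```

A count other than 4 for either suffix would suggest that one provider's NS records are missing again.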
In previous versions of dnscontrol we used a config like this for most of our domains:
In recent versions (including latest v3.31.2) this now tries to replace NS records of another provider for some reason. Any idea why these records might be affected now?