Open bhoriuchi opened 3 years ago
Removing gssapi.ContextFlagReplay is turning off replay protection, which includes an auto-incrementing sequence number in the MIC token used to sign the requests and responses. I don't have the source handy to check, but I'm fairly certain this flag is also set by nsupdate.
This code also gets used against an AD server, and I've never had this problem.
I think the gokrb5 error is not terribly helpful. I have a feeling if the server rejects your request for whatever reason, it returns your signature back to you which is why you get that message. In GSSAPI/Kerberos parlance, you the client are the initiator, the server is the acceptor.
Basically, that message is the server rejecting the request for some reason; you get the same error if you try to create a wildcard DNS record.
well what is strange is that even though it sends the rejected response it still creates the record and there seems to be a window where subsequent requests work fine after the initial rejection. i can try adding some of the flags back in and see which one(s) reproduce the issue.
it does seem to be the ContextFlagReplay
// reproduces issue
apreq, err := spnego.NewKRB5TokenAPREQ(cl, tkt, key, []int{gssapi.ContextFlagInteg, gssapi.ContextFlagMutual, gssapi.ContextFlagReplay}, []int{gssapi.ContextFlagMutual})
// works
apreq, err := spnego.NewKRB5TokenAPREQ(cl, tkt, key, []int{gssapi.ContextFlagInteg, gssapi.ContextFlagMutual}, []int{flags.APOptionMutualRequired})
well what is strange is that even though it sends the rejected response it still creates the record and there seems to be a window where subsequent requests work fine after the initial rejection.
Terraform negotiates a new Kerberos context for each and every request. They're not reused, even though they could be, because Windows rejects requests using them after 5 minutes despite indicating that they are valid for an hour, so it was less hassle to just negotiate a fresh context for each request. The fact that the request is actioned but still rejected points to something screwy on the server side; that doesn't even begin to make sense.
Here are the flags that nsupdate uses, the same ones: https://github.com/isc-projects/bind9/blob/4ce5f94333e814aca1faf518e3f9a8c1dfb7caa8/lib/dns/gssapictx.c#L603-L607
There are tests that both this library and the Terraform provider run against a real DNS server which are consistently passing. I'm also running this code against Windows and I've not seen this problem ever.
The only thing I can think of that might screw things up is if you're sending requests to different DNS servers, i.e. negotiating a context with one and sending the signed request to another. That shouldn't be possible, though, as from experience the SPN needs to match the hostname of the DNS server.
I am using test AD DC/DNS server spun up in GCP. Pointing to the same server for DNS and KDC. The easiest way to reproduce is by creating a brand new forward zone with nothing in it. In my test i am creating a cname entry. I can try some other record types and see if i can narrow the issue down. I agree it makes no sense like many things in windows. I'll also try to reproduce with nsupdate.
Can you get a pcap for the traffic when it works and when it fails? It needs to be whole packets rather than truncated and I specifically need the TKEY exchange that happens before the actual DNS updates.
I'd like to see if the remote end is actually negotiating different context flags. I'm requesting the Integrity, Mutual and Replay flags but I think the remote end can respond and accept a subset of that, however the first two flags are mandatory and have to be present. I'm currently assuming the remote side honours all three.
In the CGo version of the code, the InitSecContext() call returns the flags that the remote side chose, as you can see here: https://github.com/bodgit/tsig/blob/5045f82c873fb520fa08726c8c2b7bc3906137cc/gss/apcera.go#L172-L180
The fourth return argument would contain the negotiated flags. I'm not sure how to retrieve those with the gokrb5 implementation but I'm guessing they're buried in one of the structs somewhere, however I think from memory the flags are visible in a pcap so I'd like to check that if possible.
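As a rough illustration of the check described above, this sketch compares a set of negotiated flags against the requested and mandatory sets. The negotiated value is hypothetical (it is not taken from a real capture), and the names are local to the sketch rather than part of gokrb5 or the CGo bindings.

```go
package main

import "fmt"

// GSS-API context flag bits (RFC 4121 section 4.1.1.1), named locally.
const (
	flagMutual uint32 = 2
	flagReplay uint32 = 4
	flagInteg  uint32 = 32
)

// missingFlags returns the bits that are set in want but absent from got.
func missingFlags(got, want uint32) uint32 {
	return want &^ got
}

func main() {
	requested := flagInteg | flagMutual | flagReplay
	mandatory := flagInteg | flagMutual

	// Hypothetical: pretend the remote side dropped the replay flag.
	negotiated := flagInteg | flagMutual

	if m := missingFlags(negotiated, mandatory); m != 0 {
		fmt.Printf("mandatory flags missing: 0x%02x\n", m)
		return
	}
	if m := missingFlags(negotiated, requested); m != 0 {
		fmt.Printf("optional flags not honoured: 0x%02x\n", m)
		return
	}
	fmt.Println("remote side honoured all requested flags")
}
```

With the hypothetical value above, the mandatory check passes and only the replay bit is reported as not honoured.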
For me, it was resolved after I implemented two things:
1. Nonsecure dynamic updates for the managed hosted zone. Kept Secured only.
2. Create all child permissions in the managed hosted zone. This permission might be too permissive, depending on your use case, but the point is that the user should have the right to do what it needs to.
If either or both of these were not implemented, I got the same error: Error updating DNS record: unexpected acceptor flag is not set: expecting a token from the acceptor, not in the initiator
Interesting. Thanks for the info!
Hello There,
I'm getting this issue out of nowhere; it was working fine before. My setup uses AD and was working successfully last month.
I got the following logs from the DNS server:
22/12/2022 11:28:50 07E4 PACKET 00000087F28D0260 UDP Snd X.Y.X.Y 1a60 R U [05a8 REFUSED] SOA (3)toto(5)titi(0)
UDP response info at 00000087F28D0260
Socket = 516
Remote addr X.Y.X.C, port 52700
Time Query=527328, Queued=0, Expire=0
Buf length = 0x0fa0 (4000)
Msg length = 0x00a6 (166)
Message:
XID 0x1a60
Flags 0xa805
QR 1 (RESPONSE)
OPCODE 5 (UPDATE)
AA 0
TC 0
RD 0
RA 0
Z 0
CD 0
AD 0
RCODE 5 (REFUSED)
ZCOUNT 1
PRECOUNT 0
UPCOUNT 1
ARCOUNT 1
ZONE SECTION:
Offset = 0x000c, RR count = 0
Name "(3)toto(5)titi(0)"
ZTYPE SOA (6)
ZCLASS 1
PREREQUISITE SECTION:
empty
UPDATE SECTION:
Offset = 0x001b, RR count = 0
Name "(10)terra02(3)toto(5)titi(0)"
TYPE A (1)
CLASS 1
TTL 3600
DLEN 4
DATA X.Y.Z.14
ADDITIONAL SECTION:
Offset = 0x003f, RR count = 0
Name "(9)615736564(13)sig-DCO10(3)toto(5)titi(0)"
TYPE TSIG (250)
CLASS 255
TTL 0
DLEN 58
DATA
Algorithm: (8)gss-tsig(0)
Signed time = 1671704930
Fudge time = 300
Sig Length = 32
Signature:
04 04 04 ff ff ff ff ff 00 00 00 00 8c 8a fa e3
f9 3b 15 e1 2e d1 69 47 d1 6c 5a 13 08 a1 43 9b
Original XID = 1a60
Extended RCODE = 0
Other Length = 0
Other Data:
Any hints to help me figure it out?
Thanks,
RCODE 5 (REFUSED)
The server refused the update for some reason. There was a Windows Server update recently that broke some aspects of Kerberos; I don't know if it would have any effect on this or not.
It appears that the terraform dns provider is throwing the error "Error updating DNS record: unexpected acceptor flag is not set: expecting a token from the acceptor, not in the initiator" from this package. See issue https://github.com/hashicorp/terraform-provider-dns/issues/160
I have traced the issue to the parameters passed to https://github.com/bodgit/tsig/blob/v1.1.1/gss/gokrb5.go#L243
When changing the parameters to match those passed in the ns1 fork (https://github.com/ns1/tsig/blob/master/gss/gokrb5.go#L150), the issue does not present itself. I am not sure why this resolves the issue, and I have no real insight into what the parameters do.
The issue is reproducible on an Active Directory DNS server.