coredns / rrl

Response Rate Limiting Plugin for CoreDNS
Apache License 2.0
23 stars 21 forks source link

handle edge case when there was error, but response was written #40

Closed Tantalor93 closed 11 months ago

Tantalor93 commented 11 months ago

Hello, when rrl plugin is combined with rewrite plugin and upstream resolution fails, then rrl does not propagate rewritten response to the other plugins, for example prometheus plugin.

Actual impact can be observed using this example Corefile

.:53 {
  rewrite edns0 local set 0xffee {client_ip}
  log
  prometheus
  forward . 1.2.3.4
  rrl . {
      ipv4-prefix-length 32
      ipv6-prefix-length 128
      requests-per-second 1000
    }
}

then send a DNS query

dig @127.0.0.1 google.com

this returns

;; communications error to 127.0.0.1#53: timed out
;; communications error to 127.0.0.1#53: timed out

then observe metrics

curl localhost:9153/metrics | grep coredns_dns_responses_total

you will see, that failures are measured as NOERRORs instead of failures (SERVFAIL)

# HELP coredns_dns_responses_total Counter of response status codes.
# TYPE coredns_dns_responses_total counter
coredns_dns_responses_total{plugin="",rcode="NOERROR",server="dns://:53",view="",zone="."} 9

This PR adjusts the condition for checking whether the response was written in order to propagate error responses through plugin chain properly

Tantalor93 commented 11 months ago

@chrisohaver if you could take a look at this PR when you have time 🙏

chrisohaver commented 11 months ago

I'm trying to figure out why this fixes the issue.

Is it because rewrite causes plugin.NextOrFailure to return an error or an empty message? If so, why is that the case?

Tantalor93 commented 11 months ago

I'm trying to figure out why this fixes the issue.

Is it because rewrite causes plugin.NextOrFailure to return an error or an empty message? If so, why is that the case?

I'll try to describe what causes the issue:

let plugins be chained in a way that it forms this chain: prometheus <-> rrl <-> rewrite <-> forward (maybe the issue could be mitigated by making rrl plugin first?)

forward fails the upstream resolution and returns dns.RcodeServerFailure, upstreamErr

rewrite plugin returns dns.RcodeSuccess and error and writes rewritten response to the response writer

rrl plugin passes nonWriter to the next plugin (which is rewrite) and then checks err != nil || nw.Msg == nil from next plugin, which is true in this case (because err is not nil) and immediately returns without writing message to writer using w.WriteMsg(nw.Msg), therefore previous plugin (prometheus) is not able to get rcode from message (rrl throws away the message)

prometheus plugin checks rcode and gets default value, because correct message was not written by rrl

why this fixes the issue: by only checking rcode from call plugin.NextOrFailure in rrl using plugin.ClientWrite, we check that the response was written by the next plugin. Whether the next plugin written the response should be indicated by status code returned by the next plugin, rrl should not care whether there was an error up the chain, only whether some response was written

chrisohaver commented 11 months ago

(maybe the issue could be mitigated by making rrl plugin first?)

perhaps.

rrl should not care whether there was an error up the chain, only whether some response was written

I'm skeptical of the ClientWrite function, because all it does IIRC is check the response code - it makes assumptions based on the response code, which is super kludgy and I'm not certain those assumptions are always correct.

I think the issue may be between forward and rewrite. if forward fails, it should not be writing the failure to the response writer, and should return a servfail code.

What is rewrite configured to do in your example?

chrisohaver commented 11 months ago

Assuming rewrite is configured to rewrite the response code, then It should probably also clear the error.

(edit: specifically when rewriting an error code to a non-error code)

chrisohaver commented 11 months ago

Ah ... it's due to this bit from rewrite...

                // The next plugins didn't write a response, so write one now with the ResponseReverter.
                // If server.ServeDNS does this then it will create an answer mismatch.

So, when rewrite is used, it always writes the error responses and doesn't leave it to the server.