PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

DDoS attack with random A requests causes SQL backend overload #11784

Open bnchdan opened 2 years ago

bnchdan commented 2 years ago

Hi,

Short description

DDoS attack with random A requests causes SQL backend overload

The zone cache feature only caches the "domains" table; it does not cache the individual records in the backend (https://github.com/PowerDNS/pdns/pull/9464).

This attack works against any domain that exists on the server, since every random name under it still has to be looked up in the backend.

Can we consider caching all records?

Zerg2000 commented 1 year ago

Hi, my pdns servers were under DDoS attacks directed at nonexistent records, which caused high load on the SQL backend, the main bottleneck. After some tweaking I was able to optimize everything enough to withstand the attacks, but the setup is still far from perfect and an additional cache would help greatly. There should also be an option to not fall back to the SQL backend for nonexistent records if all records have been read into the cache.
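The "no fallback once everything is cached" idea can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not PowerDNS code: the class `FullZoneCache`, its `complete` flag, and the `backend_lookup` callable are all hypothetical names, and the sketch ignores wildcards, delegations, and empty non-terminals, which a real authoritative server has to handle.

```python
class FullZoneCache:
    """Hypothetical in-memory copy of every record (not a PowerDNS API)."""

    def __init__(self, backend_lookup):
        # backend_lookup(name, rtype) -> list of record contents (e.g. an SQL query)
        self._backend_lookup = backend_lookup
        self._records = {}
        self.complete = False      # True once *all* records have been loaded
        self.backend_queries = 0   # counts how often the slow backend is hit

    @staticmethod
    def _key(name, rtype):
        # Normalise case and the trailing dot so lookups are consistent.
        return (name.lower().rstrip("."), rtype.upper())

    def load_all(self, rows):
        """rows: iterable of (name, rtype, content) covering every record."""
        self._records = {}
        for name, rtype, content in rows:
            self._records.setdefault(self._key(name, rtype), []).append(content)
        self.complete = True

    def lookup(self, name, rtype):
        key = self._key(name, rtype)
        if key in self._records:
            return self._records[key]
        if self.complete:
            # The cache holds everything, so a miss is a definitive "no such
            # record" and the backend is never consulted.
            return []
        self.backend_queries += 1
        return self._backend_lookup(name, rtype)
```

With the cache marked complete, a flood of `testN.www.example.com` lookups is answered from memory and `backend_queries` stops growing, which is the effect the option above is asking for.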

Let's say we have the zone example.com and the record www.example.com. The attack requested A records for RANDOM_NAME.www.example.com. Such an attack can be recreated using dnsgen (https://github.com/isc-projects/dnsgen):

for i in $(seq 1 10000000); do echo "test${i}.www.example.com A" >> dnsgen.txt; done
dnsgen -i INTERFACE -a LOCAL_IP -m PDNS_SERVER_MAC -s PDNS_SERVER_IP -p 53 -d dnsgen.txt -T 8

You might need to run more than one instance of dnsgen for testing.

ctheune commented 9 months ago

We've seen multiple instances of an attack like this as well, also with the effect that the database became the bottleneck. This was at a rate of around 10k UDP packets/s (and apparently around 20mib/s of bandwidth).

ctheune commented 9 months ago

So, this seems like a task for dnsdist. We just had multiple attacks today and were able to whip up a first reasonable defense using this:

          addAction(MaxQPSIPRule(25, 24, 48, 100), DropAction())

We saw the attacks being distributed but easily aggregatable by this rule and I think this should help for now. Not sure whether 25 per /24 with a 100 burst is reasonable, but our considerations for data center operators and eyeball networks suggested this should be fine for us.
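To make the numbers in that rule concrete, here is a rough Python model of a per-subnet token bucket: 25 queries per second per IPv4 /24 (or IPv6 /48), with a burst of 100. It is an assumption-laden sketch of the general technique, not dnsdist's implementation; `SubnetRateLimiter` and `allow` are hypothetical names, and the timestamp is passed in explicitly so the behaviour is easy to follow.

```python
import ipaddress


class SubnetRateLimiter:
    """Token bucket per client subnet (illustrative model, not dnsdist code)."""

    def __init__(self, qps, v4_prefix, v6_prefix, burst):
        self.qps = qps
        self.v4_prefix = v4_prefix
        self.v6_prefix = v6_prefix
        self.burst = burst
        self._buckets = {}  # network -> (tokens left, time of last update)

    def _subnet(self, client_ip):
        # Aggregate the client address into its /24 (IPv4) or /48 (IPv6).
        addr = ipaddress.ip_address(client_ip)
        prefix = self.v4_prefix if addr.version == 4 else self.v6_prefix
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    def allow(self, client_ip, now):
        """Return True if the query passes, False if it would be dropped."""
        key = self._subnet(client_ip)
        tokens, last = self._buckets.get(key, (float(self.burst), now))
        # Refill at `qps` tokens per second, never above the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.qps)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


limiter = SubnetRateLimiter(qps=25, v4_prefix=24, v6_prefix=48, burst=100)
# Two hosts in the same /24 share one bucket.
sources = ["198.51.100.1", "198.51.100.200"]
passed = sum(limiter.allow(sources[i % 2], now=0.0) for i in range(150))
```

In this model all addresses inside one /24 draw from the same bucket, so the 150 simultaneous queries above exhaust the burst of 100 and the rest are dropped, while a client in another /24 is unaffected.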

bnchdan commented 9 months ago

@ctheune the rule will not work if the DDoS attack uses spoofed source IP addresses.

rgacogne commented 9 months ago

Related to #9326 and https://github.com/PowerDNS/pdns/issues/3888

hlindqvist commented 9 months ago

In addition to the spoofing problem @bnchdan mentions, this type of drop rule is not generally recommended on an authoritative server, as it opens up whole new ways of causing DoS. For example, one malicious client can make your domain unreachable for every client using the same resolver, because it is the resolver's address that gets blocked, not the specific client.
This can lead to scenarios where all users in one enterprise, all customers of one ISP, and similar groups get blocked together.

It can still be a useful tool in desperate situations even for authoritative servers, but does not seem like a good "set it and forget it" config, as it introduces these kinds of new risks.

ctheune commented 9 months ago

@hlindqvist sure. It's a cat and mouse game anyway. But yes, for posterity and anyone stumbling over this post in the future, that's great advice, thanks!

When establishing that rule in response to the acute attacks, we considered both eyeball clients and DC clients. We decided to make a general rule (for now) that is oriented towards the smallest unit of routes announced publicly. DNS has become quite complex and challenging, and I'm aware that one rule won't fit all cases forever. I'm happy to have dnsdist in the loop now so we can respond more constructively in the future.

I experimented with triggering it and noticed that it blocks quickly but also lets go of the block quickly enough.

So for now, it's one more layer that is a definite improvement over before and won't end up choking the backend as much, and I expect to have to dial it in further.

klaus-nicat commented 9 months ago

          addAction(MaxQPSIPRule(25, 24, 48, 100), DropAction())

We did this years ago and it caused many more problems with our customers because it blocked legitimate requests. In our case, for example, most of these random subdomain attacks come from open resolvers and public resolvers (8.8.8.8), so blocking these resolvers also blocks legitimate users. To withstand such attacks, currently the only option is a faster nameserver: either pdns with a faster backend, or dnsdist in front of a slow pdns backend server plus a fast backend server (NSD, Knot, PDNS-LMDB) for zones under attack. The latter option implies that you need some kind of logic to detect the attacked zone and move it to the other backend.
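The "logic to detect the attacked zone" could start out as simple as counting distinct nonexistent names per zone within an observation window. The following Python sketch assumes you already have a stream of `(qname, rcode)` pairs from query logs or similar; the function names, the input format, and the threshold are hypothetical, and moving a flagged zone to the faster backend is deliberately left out.

```python
from collections import defaultdict


def zone_of(qname, zones):
    """Return the longest configured zone that qname falls under, or None."""
    labels = qname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def zones_under_attack(queries, zones, threshold):
    """Flag zones with more than `threshold` distinct NXDOMAIN names.

    queries: iterable of (qname, rcode) seen in one observation window.
    zones:   set of zone names served, without trailing dots.
    """
    distinct_misses = defaultdict(set)
    for qname, rcode in queries:
        if rcode != "NXDOMAIN":
            continue
        zone = zone_of(qname, zones)
        if zone is not None:
            distinct_misses[zone].add(qname.lower().rstrip("."))
    return {zone for zone, names in distinct_misses.items()
            if len(names) > threshold}


zones = {"example.com", "example.org"}
window = [(f"test{i}.www.example.com", "NXDOMAIN") for i in range(1000)]
window += [("www.example.org", "NOERROR"), ("typo.example.org", "NXDOMAIN")]
flagged = zones_under_attack(window, zones, threshold=500)
```

Counting distinct names rather than raw queries keeps a single misconfigured client that repeats one bad name from tripping the detector; a sensible threshold and window length depend on your normal traffic.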

ctheune commented 9 months ago

Yeah. I've gotten the message that this is just a stopgap. ;)

As a proper solution we're looking into moving our public-facing name servers to our registrar's infrastructure and switching to a "hidden master" setup. Beefing up individual servers to handle DDoS isn't really our business, so I guess we'll have to stop being on the front line at this point.

However, PDNS-LMDB (with AXFR, I guess?) did sound interesting. Knot is on the horizon as a possible solution here, and NSD I'll have to research.