PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.49k stars 890 forks source link

Dynamic DNS (rfc2136) performance with PostgreSQL backend #12361

Open v1shnya opened 1 year ago

v1shnya commented 1 year ago

Short description

Lower than expected (yes, it's very subjective) performance of Dynamic DNC updates. Got a sustained rate of 100 updates per second when usual DNS request numbers are 7K per sec with all caching disabled and 30K per sec with caches enabled for a zone with 1M records.

Environment

Steps to reproduce

  1. Configured PowerDNS with the below parameters
    [/home/dev/]$ grep ^[a-z].* ../authoritative-pgsql/templates/pdns/pdns.conf
    allow-dnsupdate-from=0.0.0.0/0
    cache-ttl=30
    distributor-threads=1
    dnsupdate=yes
    launch=gpgsql
    gpgsql-host= <PDNS_IP>
    gpgsql-port=5433
    gpgsql-dbname=powerdns
    gpgsql-user=powerdns
    gpgsql-password= ....
    gpgsql-prepared-statements=yes
    max-cache-entries=2000000
    max-packet-cache-entries=2000000
    primary=yes
    query-cache-ttl=30
    receiver-threads=16
    retrieval-threads=16
    reuseport=yes
  2. Run dnsperf updates via dnsperf -S 2 -u -d ~/DNSperf/DNS_RRs_updates_list.txt -s <PDNS_IP> Where DNS_RRs_updates_list.txt file contains one million of update queries that looks like
    performance.test
    add test_1 3600 A 1.1.1.1
    send

Expected behaviour

As a Christmas gift I would expect a few hundreds updates per sec, that's great for enterprise environment in rush hours. Probably, multithreaded execution could be an answer here ;-)

Actual behaviour

Dnsperf and PowerDNS results are well correlated, they both show Question/answer latency ~9.5ms that gives one hundred updates per second assuming single threaded execution. Tried to change distributor-threads parameter value, but no effect.

Please, keep in mind that ping RTT time between PowerDNS server and PostgreSQL database in my lab is nearly 0.3 ms.

Statistics:
  Updates sent:         56020
  Updates completed:    55920 (99.82%)
  Updates lost:         0 (0.00%)
  Updates interrupted:  100 (0.18%)
  Response codes:       NOERROR 55920 (100.00%)
  Average packet size:  request 60, response 34
  Run time (s):         535.715141
  Updates per second:   104.383833
  Average Latency (s):  0.957076 (min 0.020789, max 1.145623)
  Latency StdDev (s):   0.039599
PowerDNS 4.7.2
Uptime: 9.9 minutes
Queries/second, 1, 5, 10 minute averages: 78, 82, 59. Max queries/second: 105
Backend query load, 1, 5, 10 minute averages: 0, 0, 0. Max queries/second: 0
Total queries: 56020. Question/answer latency: 9.21ms

Variables
backend-queries     112040  Number of queries sent to the backend(s)
dnsupdate-answers            56020  DNS update packets successfully answered.
dnsupdate-changes   112040  DNS update changes to records in total.
dnsupdate-queries           56020   DNS update packets received.
udp-answers         56020   Number of answers sent out over UDP
udp-answers-bytes           1904680 Total size of answers sent out over UDP
udp-queries         56020   Number of UDP queries received
udp4-answers                56020   Number of IPv4 answers sent out over UDP
udp4-answers-bytes  1904680 Total size of answers sent out over UDPv4
udp4-queries                 56020  Number of IPv4 UDP queries received
cpu-iowait          4242    Time spent waiting for I/O to complete by the whole system, in units of USER_HZ
fd-usage                     61 Number of open filedescriptors
latency             9212    Average number of microseconds needed to answer a question
real-memory-usage   118165504   Actual unique use of memory in bytes (approx)
receive-latency     917025  Average number of microseconds needed to receive a query
send-latency                40  Average number of microseconds needed to send the answer
sys-msec                     29507  Number of msec spent in system time

Other information

Would be great to understand if something could be changed to improve DNS update rate.

zeha commented 1 year ago

Please report how long the SQL queries for each update (under load) take for you on the postgresql side.

v1shnya commented 1 year ago

Hi Chris, Merry Christmas! thank you for coming back to me

Postgres DB is not loaded at all, 16 vCPU with 32 GB MEM

powerdns=> \timing on
Timing is on.
powerdns=> select * from domains;
 id |       name       | master | last_check |  type  | notified_serial | account | options | catalog
----+------------------+--------+------------+--------+-----------------+---------+---------+---------
 14 | performance.test |        |            | NATIVE |                 |         |         |
(1 row)

Time: 0.727 ms
powerdns=> select count(*) from records;
 count
-------
 56021
(1 row)

Time: 9.672 ms
rgacogne commented 1 year ago

I guess adding a new entry would also involve at least one insertion?

v1shnya commented 1 year ago

Hi Remy, I would say that at least two insertions (if we don't consider DNSsec)

Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Processing started.
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: starting transaction.
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Adding record test_5858.performance.test|A
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Increasing SOA serial (2022129448 -> 2022129449)
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Update completed, 2 changed records committed.

Looking at the code https://github.com/PowerDNS/pdns/blob/master/pdns/rfc2136handler.cc I see a number of sequential database requests, it's not just two inserts - it's a lot of prep work done.

rgacogne commented 1 year ago

Right, so perhaps it would be useful to look at how long the insertions take on the PG side as well?

v1shnya commented 1 year ago

Under the load the insertion time is looking normal, see below

powerdns=> \timing on
Timing is on.
powerdns=> select count(*) from records;
 count
-------
 86328
(1 row)

Time: 9.795 ms
powerdns=> select * from records;
   id    | domain_id |            name             | type |                                             content
                          | ttl  | prio | disabled | ordername | auth
---------+-----------+-----------------------------+------+------------------------------------------------------------------------
--------------------------+------+------+----------+-----------+------
 3010726 |        17 | test_1.performance.test     | A    | 1.1.1.1
                          | 3600 |    0 | f        |           | t
 3036784 |        17 | test_13030.performance.test | A    | 1.1.1.1
                          | 3600 |    0 | f        |           | t
 ........
Time: 89.051 ms

powerdns=> insert into records (content,ttl,prio,type,domain_id,disabled,name,ordername,auth) values ('1.1.1.1',3600,0,'A',17,false,'test',null,true);
INSERT 0 1
Time: 4.775 ms
v1shnya commented 1 year ago

Remi, I got your point! If one SQL insert statement takes 4.75 ms to complete than two inserts per one DNS update will take 9.5 ms that we’ve seen above.

Still, did I miss some configuration trick to make those updates in parallel?

rgacogne commented 1 year ago

Increasing distributor-threads does increase the number of parallel connections to the database for regular queries, but I'm not sure it applies to UPDATE queries which are special in many ways. In particular they need to have a consistent view of the records in the backend database to avoid races. It also depends on whether PG can actually do concurrent insertions in your setup, of course.

zeha commented 1 year ago

My question was really about getting a list of queries and their execution times for a single UPDATE, as it happens in your environment.

The update machinery is quite involved, and maybe queries are being run that can be optimized, but its only helpful to do so starting at some baseline.

v1shnya commented 1 year ago

Chris, I would propose to start from architecture as Remi suggested. If you could confirm how DDNS updates are managed - in a sequence, in parallel, how we could control it - it would be much clear what we need to tune.

I would leave SQL optimization to the later stage. By the way, which tool do you use for the analysis? In Internet there are some tricks for Postgres, will try it.

Have you done internal DDNS tests? If yes, what numbers have you seen before?

Habbie commented 1 year ago

Hello @v1shnya, if you run a benchmarking suite against your Postgres, what kind of number do you get for write transactions per second?

v1shnya commented 1 year ago

Hi Peter, Merry Christmas! used 'pg_stat_statements' extention for PostgreSQL.

Run DNSperf tool with dynamic updates

Statistics:
  Updates sent:         7187
  Updates completed:    7087 (98.61%)
  Updates lost:         0 (0.00%)
  Updates interrupted:  100 (1.39%)

  Response codes:       NOERROR 7087 (100.00%)
  Average packet size:  request 54, response 29
  Run time (s):         130.096691
  Updates per second:   54.474867

  Average Latency (s):  1.820872 (min 0.145479, max 2.494103)
  Latency StdDev (s):   0.317027

I believe that Updates per sec numbers above are lower due to increased RTT time between PowerDNS VM and PostgreSQL (ping gives me 0.5 ms today, yesterday it was 0.3 ms).

Below you could find 'pg_stat_statements' aggregated results for 7187 dns dynamic update requests

powerdns=# select query, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows from pg_stat_statements where userid = 16385 and dbid = 16400;
-[ RECORD 1 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select domains.id, domains.name, records.content, domains.type, domains.master, domains.notified_serial, domains
.last_check, domains.account from domains LEFT JOIN records ON records.domain_id=domains.id AND records.type=$2 AND records.name=do
mains.name WHERE records.disabled=$3 OR $1
calls            | 2
total_exec_time  | 0.217054
min_exec_time    | 0.08111399999999999
max_exec_time    | 0.13594
mean_exec_time   | 0.108527
stddev_exec_time | 0.027413000000000007
rows             | 4
-[ RECORD 2 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select domains.id, domains.name, domains.type, domains.master, domains.last_check, records.content from domains
LEFT JOIN records ON records.domain_id=domains.id AND records.type=$1 AND records.name=domains.name where domains.type in ($2, $3)
calls            | 4
total_exec_time  | 0.04084
min_exec_time    | 0.009063
max_exec_time    | 0.011798
mean_exec_time   | 0.01021
stddev_exec_time | 0.0010028975520959252
rows             | 0
-[ RECORD 3 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select domains.id, domains.name, domains.type, domains.notified_serial, domains.options, domains.catalog, record
s.content from records join domains on records.domain_id=domains.id and records.name=domains.name where records.type=$1 and records
.disabled=$2 and domains.type in ($3, $4)
calls            | 4
total_exec_time  | 0.031763
min_exec_time    | 0.00754
max_exec_time    | 0.008631
mean_exec_time   | 0.00794075
stddev_exec_time | 0.00044638793386470477
rows             | 0
-[ RECORD 4 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$3 and type=$1 a
nd name=$2
calls            | 14374
total_exec_time  | 340.84624400000087
min_exec_time    | 0.012518
max_exec_time    | 0.63608
mean_exec_time   | 0.02371269263948803
stddev_exec_time | 0.009148093920737866
rows             | 14374
-[ RECORD 5 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$3 and name=$1 a
nd domain_id=$2
calls            | 7187
total_exec_time  | 6664.0139130000025
min_exec_time    | 0.011294
max_exec_time    | 3.363671
mean_exec_time   | 0.9272316561847818
stddev_exec_time | 0.5066961052845157
rows             | 7187
-[ RECORD 6 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | delete from records where domain_id=$1 and name=$2 and type=$3
calls            | 7187
total_exec_time  | 5551.925428000003
min_exec_time    | 0.012487999999999999
max_exec_time    | 2.9219649999999997
mean_exec_time   | 0.7724955374982629
stddev_exec_time | 0.4098099475159046
rows             | 7187
-[ RECORD 7 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select kind,content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=$1
calls            | 7187
total_exec_time  | 74.40643500000004
min_exec_time    | 0.008514
max_exec_time    | 0.331546
mean_exec_time   | 0.010352919855294277
stddev_exec_time | 0.004509243279275985
rows             | 0
-[ RECORD 8 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select $1
calls            | 1
total_exec_time  | 0.005887
min_exec_time    | 0.005887
max_exec_time    | 0.005887
mean_exec_time   | 0.005887
stddev_exec_time | 0
rows             | 1
-[ RECORD 9 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | begin
calls            | 7187
total_exec_time  | 5.0743099999999774
min_exec_time    | 0.000379
max_exec_time    | 0.010511
mean_exec_time   | 0.0007060400723528597
stddev_exec_time | 0.00041882961227168264
rows             | 0
-[ RECORD 10 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | insert into records (content,ttl,prio,type,domain_id,disabled,name,ordername,auth) values ($1,$2,$3,$4,$5,$6,$7,
$8,$9)
calls            | 14374
total_exec_time  | 1628.5856740000027
min_exec_time    | 0.052046999999999996
max_exec_time    | 0.7095680000000001
mean_exec_time   | 0.11330079824683442
stddev_exec_time | 0.04159129781546047
rows             | 14374
-[ RECORD 11 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | delete from records where domain_id=$1 and name=$2 and type is null
calls            | 7187
total_exec_time  | 5158.393028999998
min_exec_time    | 0.006804
max_exec_time    | 2.86177
mean_exec_time   | 0.7177393946013654
stddev_exec_time | 0.380159914890431
rows             | 0
-[ RECORD 12 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$4 and type=$1 a
nd name=$2 and domain_id=$3
calls            | 14374
total_exec_time  | 14326.224386999973
min_exec_time    | 0.0070929999999999995
max_exec_time    | 4.965016
mean_exec_time   | 0.9966762478781147
stddev_exec_time | 0.5581248219904835
rows             | 0
-[ RECORD 13 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select id,name,master,last_check,notified_serial,type,options,catalog,account from domains where name=$1
calls            | 7187
total_exec_time  | 69.50807400000004
min_exec_time    | 0.007625
max_exec_time    | 0.13239
mean_exec_time   | 0.009671361346876305
stddev_exec_time | 0.0026756020695306035
rows             | 7187
-[ RECORD 14 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | select content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=$1 and domainmeta
data.kind=$2
calls            | 21561
total_exec_time  | 243.827968
min_exec_time    | 0.008226
max_exec_time    | 0.396304
mean_exec_time   | 0.011308750429015337
stddev_exec_time | 0.004896940368968186
rows             | 0
-[ RECORD 15 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query            | commit
calls            | 7187
total_exec_time  | 6.088857999999995
min_exec_time    | 0.000368
max_exec_time    | 0.009078
mean_exec_time   | 0.0008472043968276024
stddev_exec_time | 0.0004659483550670446
rows             | 0
Habbie commented 1 year ago

If you run a Postgres (so non-DNS) benchmarking suite against your Postgres, what kind of number do you get for write transactions per second?

v1shnya commented 1 year ago

Postgres DB VM is dedicated to PowerDNS, I'm not using it for anything else. Looking at Postgres numbers I don't see any contention there, 'mean_exec_time', i.e. average, is well below 1 ms for all queries

Habbie commented 1 year ago

I'm still waiting for Postgres benchmark numbers, to see how many write transactions your database is capable of. Until we have that number, any testing we do with pdns+postgres is meaningless.

v1shnya commented 1 year ago

Hi Peter! Please, see DB baseline performance tested via standard tool pgbench -c 4 -j 5 -t 5000 -d powerdns

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 4
number of threads: 4
number of transactions per client: 5000
number of transactions actually processed: 20000/20000
latency average = 4.834 ms
tps = 827.446852 (including connections establishing)
tps = 837.155129 (excluding connections establishing)

Where in TPC-B scenario each transaction includes 1xSELECT + 3x UPDATE + 1x INSERT statements within. Details via
https://www.postgresql.org/docs/current/pgbench.html Note "What Is the “Transaction” Actually Performed in pgbench?"

Habbie commented 1 year ago

ok, and with one client?

v1shnya commented 1 year ago

with one client getting the numbers

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 5000
number of transactions actually processed: 5000/5000
latency average = 4.060 ms
tps = 246.312160 (including connections establishing)
tps = 247.395992 (excluding connections establishing)
Habbie commented 1 year ago

Thank you. So the upper bound for powerdns would be 250 or 800, depending on how well we parallelise (this I do not know). If the bound is 250, and we do 100, I am not unhappy, but this asks for some more investigation into how we do things.

v1shnya commented 1 year ago

Thank you, Peter! The use case is to register DHCP leases to PowerDNS via DDNS mechanism. 50 DDNS TPS is too low to be on a safe side in any reasinably large network. Here I'm talking about DDNS + regular DNS queries, in a mix I've seen some contention. Filed another request

klaus-nicat commented 1 year ago

Will PowerDNS open a new DB connection for each DDNS packet? (Like it does for every AXFR) If YES, then the DB-Connection time is problematic.