Open v1shnya opened 1 year ago
Please report how long the SQL queries for each update (under load) take for you on the postgresql side.
Hi Chris, Merry Christmas! thank you for coming back to me
Postgres DB is not loaded at all, 16 vCPU with 32 GB MEM
powerdns=> \timing on
Timing is on.
powerdns=> select * from domains;
id | name | master | last_check | type | notified_serial | account | options | catalog
----+------------------+--------+------------+--------+-----------------+---------+---------+---------
14 | performance.test | | | NATIVE | | | |
(1 row)
Time: 0.727 ms
powerdns=> select count(*) from records;
count
-------
56021
(1 row)
Time: 9.672 ms
I guess adding a new entry would also involve at least one insertion?
Hi Remy, I would say that at least two insertions (if we don't consider DNSsec)
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Processing started.
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: starting transaction.
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Adding record test_5858.performance.test|A
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Increasing SOA serial (2022129448 -> 2022129449)
Dec 21 12:55:21 DNS-Authority-writer-1 pdns_server[3178970]: UPDATE (57878) from 10.10.10.10 for performance.test: Update completed, 2 changed records committed.
Looking at the code https://github.com/PowerDNS/pdns/blob/master/pdns/rfc2136handler.cc I see a number of sequential database requests, it's not just two inserts - it's a lot of prep work done.
Right, so perhaps it would be useful to look at how long the insertions take on the PG side as well?
Under the load the insertion time is looking normal, see below
powerdns=> \timing on
Timing is on.
powerdns=> select count(*) from records;
count
-------
86328
(1 row)
Time: 9.795 ms
powerdns=> select * from records;
id | domain_id | name | type | content
| ttl | prio | disabled | ordername | auth
---------+-----------+-----------------------------+------+------------------------------------------------------------------------
--------------------------+------+------+----------+-----------+------
3010726 | 17 | test_1.performance.test | A | 1.1.1.1
| 3600 | 0 | f | | t
3036784 | 17 | test_13030.performance.test | A | 1.1.1.1
| 3600 | 0 | f | | t
........
Time: 89.051 ms
powerdns=> insert into records (content,ttl,prio,type,domain_id,disabled,name,ordername,auth) values ('1.1.1.1',3600,0,'A',17,false,'test',null,true);
INSERT 0 1
Time: 4.775 ms
Remi, I got your point! If one SQL insert statement takes 4.75 ms to complete than two inserts per one DNS update will take 9.5 ms that we’ve seen above.
Still, did I miss some configuration trick to make those updates in parallel?
Increasing distributor-threads
does increase the number of parallel connections to the database for regular queries, but I'm not sure it applies to UPDATE
queries which are special in many ways. In particular they need to have a consistent view of the records in the backend database to avoid races.
It also depends on whether PG can actually do concurrent insertions in your setup, of course.
My question was really about getting a list of queries and their execution times for a single UPDATE, as it happens in your environment.
The update machinery is quite involved, and maybe queries are being run that can be optimized, but its only helpful to do so starting at some baseline.
Chris, I would propose to start from architecture as Remi suggested. If you could confirm how DDNS updates are managed - in a sequence, in parallel, how we could control it - it would be much clear what we need to tune.
I would leave SQL optimization to the later stage. By the way, which tool do you use for the analysis? In Internet there are some tricks for Postgres, will try it.
Have you done internal DDNS tests? If yes, what numbers have you seen before?
Hello @v1shnya, if you run a benchmarking suite against your Postgres, what kind of number do you get for write transactions per second?
Hi Peter, Merry Christmas! used 'pg_stat_statements' extention for PostgreSQL.
Run DNSperf tool with dynamic updates
Statistics:
Updates sent: 7187
Updates completed: 7087 (98.61%)
Updates lost: 0 (0.00%)
Updates interrupted: 100 (1.39%)
Response codes: NOERROR 7087 (100.00%)
Average packet size: request 54, response 29
Run time (s): 130.096691
Updates per second: 54.474867
Average Latency (s): 1.820872 (min 0.145479, max 2.494103)
Latency StdDev (s): 0.317027
I believe that Updates per sec numbers above are lower due to increased RTT time between PowerDNS VM and PostgreSQL (ping gives me 0.5 ms today, yesterday it was 0.3 ms).
Below you could find 'pg_stat_statements' aggregated results for 7187 dns dynamic update requests
powerdns=# select query, calls, total_exec_time, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time, rows from pg_stat_statements where userid = 16385 and dbid = 16400;
-[ RECORD 1 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select domains.id, domains.name, records.content, domains.type, domains.master, domains.notified_serial, domains
.last_check, domains.account from domains LEFT JOIN records ON records.domain_id=domains.id AND records.type=$2 AND records.name=do
mains.name WHERE records.disabled=$3 OR $1
calls | 2
total_exec_time | 0.217054
min_exec_time | 0.08111399999999999
max_exec_time | 0.13594
mean_exec_time | 0.108527
stddev_exec_time | 0.027413000000000007
rows | 4
-[ RECORD 2 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select domains.id, domains.name, domains.type, domains.master, domains.last_check, records.content from domains
LEFT JOIN records ON records.domain_id=domains.id AND records.type=$1 AND records.name=domains.name where domains.type in ($2, $3)
calls | 4
total_exec_time | 0.04084
min_exec_time | 0.009063
max_exec_time | 0.011798
mean_exec_time | 0.01021
stddev_exec_time | 0.0010028975520959252
rows | 0
-[ RECORD 3 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select domains.id, domains.name, domains.type, domains.notified_serial, domains.options, domains.catalog, record
s.content from records join domains on records.domain_id=domains.id and records.name=domains.name where records.type=$1 and records
.disabled=$2 and domains.type in ($3, $4)
calls | 4
total_exec_time | 0.031763
min_exec_time | 0.00754
max_exec_time | 0.008631
mean_exec_time | 0.00794075
stddev_exec_time | 0.00044638793386470477
rows | 0
-[ RECORD 4 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$3 and type=$1 a
nd name=$2
calls | 14374
total_exec_time | 340.84624400000087
min_exec_time | 0.012518
max_exec_time | 0.63608
mean_exec_time | 0.02371269263948803
stddev_exec_time | 0.009148093920737866
rows | 14374
-[ RECORD 5 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$3 and name=$1 a
nd domain_id=$2
calls | 7187
total_exec_time | 6664.0139130000025
min_exec_time | 0.011294
max_exec_time | 3.363671
mean_exec_time | 0.9272316561847818
stddev_exec_time | 0.5066961052845157
rows | 7187
-[ RECORD 6 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | delete from records where domain_id=$1 and name=$2 and type=$3
calls | 7187
total_exec_time | 5551.925428000003
min_exec_time | 0.012487999999999999
max_exec_time | 2.9219649999999997
mean_exec_time | 0.7724955374982629
stddev_exec_time | 0.4098099475159046
rows | 7187
-[ RECORD 7 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select kind,content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=$1
calls | 7187
total_exec_time | 74.40643500000004
min_exec_time | 0.008514
max_exec_time | 0.331546
mean_exec_time | 0.010352919855294277
stddev_exec_time | 0.004509243279275985
rows | 0
-[ RECORD 8 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select $1
calls | 1
total_exec_time | 0.005887
min_exec_time | 0.005887
max_exec_time | 0.005887
mean_exec_time | 0.005887
stddev_exec_time | 0
rows | 1
-[ RECORD 9 ]----+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | begin
calls | 7187
total_exec_time | 5.0743099999999774
min_exec_time | 0.000379
max_exec_time | 0.010511
mean_exec_time | 0.0007060400723528597
stddev_exec_time | 0.00041882961227168264
rows | 0
-[ RECORD 10 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | insert into records (content,ttl,prio,type,domain_id,disabled,name,ordername,auth) values ($1,$2,$3,$4,$5,$6,$7,
$8,$9)
calls | 14374
total_exec_time | 1628.5856740000027
min_exec_time | 0.052046999999999996
max_exec_time | 0.7095680000000001
mean_exec_time | 0.11330079824683442
stddev_exec_time | 0.04159129781546047
rows | 14374
-[ RECORD 11 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | delete from records where domain_id=$1 and name=$2 and type is null
calls | 7187
total_exec_time | 5158.393028999998
min_exec_time | 0.006804
max_exec_time | 2.86177
mean_exec_time | 0.7177393946013654
stddev_exec_time | 0.380159914890431
rows | 0
-[ RECORD 12 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | SELECT content,ttl,prio,type,domain_id,disabled::int,name,auth::int FROM records WHERE disabled=$4 and type=$1 a
nd name=$2 and domain_id=$3
calls | 14374
total_exec_time | 14326.224386999973
min_exec_time | 0.0070929999999999995
max_exec_time | 4.965016
mean_exec_time | 0.9966762478781147
stddev_exec_time | 0.5581248219904835
rows | 0
-[ RECORD 13 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select id,name,master,last_check,notified_serial,type,options,catalog,account from domains where name=$1
calls | 7187
total_exec_time | 69.50807400000004
min_exec_time | 0.007625
max_exec_time | 0.13239
mean_exec_time | 0.009671361346876305
stddev_exec_time | 0.0026756020695306035
rows | 7187
-[ RECORD 14 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | select content from domains, domainmetadata where domainmetadata.domain_id=domains.id and name=$1 and domainmeta
data.kind=$2
calls | 21561
total_exec_time | 243.827968
min_exec_time | 0.008226
max_exec_time | 0.396304
mean_exec_time | 0.011308750429015337
stddev_exec_time | 0.004896940368968186
rows | 0
-[ RECORD 15 ]---+-----------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------
------------------------------------------
query | commit
calls | 7187
total_exec_time | 6.088857999999995
min_exec_time | 0.000368
max_exec_time | 0.009078
mean_exec_time | 0.0008472043968276024
stddev_exec_time | 0.0004659483550670446
rows | 0
If you run a Postgres (so non-DNS) benchmarking suite against your Postgres, what kind of number do you get for write transactions per second?
Postgres DB VM is dedicated to PowerDNS, I'm not using it for anything else. Looking at Postgres numbers I don't see any contention there, 'mean_exec_time', i.e. average, is well below 1 ms for all queries
I'm still waiting for Postgres benchmark numbers, to see how many write transactions your database is capable of. Until we have that number, any testing we do with pdns+postgres is meaningless.
Hi Peter!
Please, see DB baseline performance tested via standard tool pgbench -c 4 -j 5 -t 5000 -d powerdns
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 4
number of threads: 4
number of transactions per client: 5000
number of transactions actually processed: 20000/20000
latency average = 4.834 ms
tps = 827.446852 (including connections establishing)
tps = 837.155129 (excluding connections establishing)
Where in TPC-B scenario each transaction includes 1xSELECT + 3x UPDATE + 1x INSERT statements within. Details via
https://www.postgresql.org/docs/current/pgbench.html Note "What Is the “Transaction” Actually Performed in pgbench?"
ok, and with one client?
with one client getting the numbers
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 5000
number of transactions actually processed: 5000/5000
latency average = 4.060 ms
tps = 246.312160 (including connections establishing)
tps = 247.395992 (excluding connections establishing)
Thank you. So the upper bound for powerdns would be 250 or 800, depending on how well we parallelise (this I do not know). If the bound is 250, and we do 100, I am not unhappy, but this asks for some more investigation into how we do things.
Thank you, Peter! The use case is to register DHCP leases to PowerDNS via DDNS mechanism. 50 DDNS TPS is too low to be on a safe side in any reasinably large network. Here I'm talking about DDNS + regular DNS queries, in a mix I've seen some contention. Filed another request
Will PowerDNS open a new DB connection for each DDNS packet? (Like it does for every AXFR) If YES, then the DB-Connection time is problematic.
Short description
Lower than expected (yes, it's very subjective) performance of Dynamic DNC updates. Got a sustained rate of 100 updates per second when usual DNS request numbers are 7K per sec with all caching disabled and 30K per sec with caches enabled for a zone with 1M records.
Environment
Steps to reproduce
dnsperf -S 2 -u -d ~/DNSperf/DNS_RRs_updates_list.txt -s <PDNS_IP>
WhereDNS_RRs_updates_list.txt
file contains one million of update queries that looks likeExpected behaviour
As a Christmas gift I would expect a few hundreds updates per sec, that's great for enterprise environment in rush hours. Probably, multithreaded execution could be an answer here ;-)
Actual behaviour
Dnsperf and PowerDNS results are well correlated, they both show Question/answer latency ~9.5ms that gives one hundred updates per second assuming single threaded execution. Tried to change
distributor-threads
parameter value, but no effect.Please, keep in mind that ping RTT time between PowerDNS server and PostgreSQL database in my lab is nearly 0.3 ms.
Other information
Would be great to understand if something could be changed to improve DNS update rate.