PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.7k stars 909 forks source link

Records duplication when changing them concurrently via web API #7128

Open pracj3am opened 6 years ago

pracj3am commented 6 years ago

Short description

Concurrent web API requests PATCH /servers/{server_id}/zones/{zone_id}, results in records duplication. Both updated record and SOA record is duplicated.

Environment

Steps to reproduce

  1. pdns authoritative server with web API enabled and gpgsql module enabled, otherwise the default pdns configuration.
  2. postgresql server runs on the same host machine
  3. for i in {1..5}; do
    curl -X PATCH "http://127.1/api/v1/servers/localhost/zones/test.com" \
    -H "accept: application/json" -H "X-API-Key: secret" -H "Content-Type: application/json" \
    -d "{ \"rrsets\": [ { \"name\": \"a.test.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"test.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }" &
    done

Expected behaviour

Expected results of postresql queries

pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
 type  |  content   
-------+-----------
 CNAME | test.com
(1 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type |                          content                          
-----+-----------------------------------------------------------
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(1 rows)

Actual behaviour

pdns=# select type, content from records where domain_id = 1 and name = 'a.test.com' and type='CNAME';
 type  |  content   
-------+-----------
 CNAME | test.com
 CNAME | test.com
 CNAME | test.com
 CNAME | test.com
(4 rows)
pdns=# select type, content from records where domain_id = 1 and type='SOA';
type |                          content                          
-----+-----------------------------------------------------------
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
 SOA | ns.test.com admin.test.com 2018103164 10800 180 604800 60
(4 rows)

Other information

There should be a transaction in PacketHandler::performUpdate

BEGIN;
SELECT * FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
DELETE FROM records WHERE type='CNAME' AND name='a.test.com' AND domain_id=1;
INSERT ...;
COMMIT;

with repeatable read isolation level, but I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet.

pieterlexis commented 6 years ago

Thanks for this, do you have the possibility/chance to test this in 4.1?

pracj3am commented 6 years ago

Yes, I am able to reproduce it also with version 4.1.4-1.

rgacogne commented 6 years ago

I found no transaction in PacketHandler::performUpdate or GSQLBackend::replaceRRSet

In case anyone else is wondering, PacketHandler::performUpdate() is called from PacketHandler::processUpdate() after a transaction has been started and before it's committed or rollback-ed.

rgacogne commented 6 years ago

The patchZone() function also starts a transaction before doing anything else, weird..

pracj3am commented 6 years ago

For PostgeSQL you need to set repeatable read isolation level for this transaction, otherwise the delete queries are just ignored, if a concurrent transaction deletes the same rows.

zeha commented 5 years ago

Note this important bit on repeatable read: "Applications using this level must be prepared to retry transactions due to serialization failures"

Habbie commented 5 years ago

4.2.0-rc3 has much more correct transaction handling. Can you try to reproduce your issue on it?

Habbie commented 5 years ago

Closing this in favour of #8374 which seems identical and has more recent activity.

pracj3am commented 5 years ago

Sorry for the late reply. I am still able to reproduce it on 4.2.0-rc3 with postgres 11.

zeha commented 5 years ago

So, we're clearly not using enough "locking" here, even with startTransaction etc. As said by @pracj3am the gpgsqlbackend could try with repeatable read; gmysqlbackend should maybe take a lock on the domain record (not sure about the exact semantics); gsqlite might be implicitly okay by sqlite's write rules?

klaus3000 commented 5 years ago

Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS? Ie if there is a modification on a zone's RRs, create a shared lock based on zone+label+type, then delete/insert, then unlock. I think there are already plenty of locks in PDNS itself.

Habbie commented 5 years ago

Wouldn't it be better to find a solution whick works with all backends? I.e. having the locking inside PDNS?

That doesn't help pdns_server vs. pdnsutil, or two pdns_server instances talking to the same database, etc.

pracj3am commented 5 years ago

The following dockerfile reproduces the issue.

FROM debian:buster

RUN apt update && apt upgrade -y && apt install --no-install-recommends -y \
        ca-certificates \
        git \
        libpq-dev \
        build-essential \
        postgresql-11 \
        autoconf \
        automake \
        libtool \
        pkg-config \
        ragel \
        bison \
        flex \
        virtualenv \
        libboost-all-dev \
        libssl-dev \
        curl \
        jq \
        && apt clean && rm -rf /var/lib/apt/lists/*

RUN git clone -b rel/auth-4.2.x https://github.com/PowerDNS/pdns && \
    cd pdns && \
    autoreconf -vi && \
    ./configure --with-modules=gpgsql --without-lua --disable-lua-records && \
    make && make install

RUN \
    echo '\n\
api=yes\n\
api-key=1\n\
webserver=yes\n\
webserver-address=0.0.0.0\n\
webserver-port=80\n\
launch=gpgsql\n\
gpgsql-host=127.0.0.1\n\
gpgsql-dbname=dns\n\
gpgsql-user=pdns\n\
gpgsql-password=2\n' \
 >/usr/local/etc/pdns.conf 

RUN \
    echo 'curl -s -X PATCH "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d "{ \"rrsets\": [ { \"name\": \"a.example.com.\", \"type\": \"CNAME\", \"ttl\": 3600, \"changetype\": \"REPLACE\", \"records\": [ { \"content\": \"example.com.\", \"disabled\": false, \"set-ptr\": false} ] } ] }"' >/patch && \
    chmod +x /patch && \
    echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.name==\"a.example.com.\"))[0].records"' >/test && \
    chmod +x /test && \
    echo 'curl -s "http://127.1/api/v1/servers/localhost/zones/example.com" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" | jq ".rrsets | map(select(.type==\"SOA\"))[0].records"' >/test2 && \
    chmod +x /test2

RUN \
    /etc/init.d/postgresql start && \
    su postgres -c 'psql <<<"create database dns"' && \
    su postgres -c "psql <<<\"create role pdns with login password '2'\"" && \
    su postgres -c "psql dns </pdns/modules/gpgsqlbackend/schema.pgsql.sql" && \
    su postgres -c "psql dns <<<\"grant all on all tables in schema public to pdns\"" && \
    su postgres -c "psql dns <<<\"grant all on all sequences in schema public to pdns\"" && \
    /usr/local/sbin/pdns_server --daemon && \
    curl -s -X POST "http://127.1/api/v1/servers/localhost/zones" -H "accept: application/json" -H "X-API-Key: 1" -H "Content-Type: application/json" -d '{ "name": "example.com.", "kind": "Native", "nameservers": ["ns.example.com."]}' && \
    /patch && \
    pkill pdns_server && \
    /etc/init.d/postgresql stop

ENTRYPOINT /etc/init.d/postgresql start && \
    /usr/local/sbin/pdns_server >/var/log/pdns.log 2>&1 & sleep 5 && \ 
    /bin/bash -c "for i in {1..5}; do for j in {1..10}; do /patch & done; sleep 1; done" && \
    /test && /test2