desec-io / desec-stack

Backbone of the deSEC Free Secure DNS Hosting Service
https://desec.io/
MIT License

Support RFC2136/TSIG #357

renne opened 4 years ago

renne commented 4 years ago

The API is nice, but integrating deSEC into custom infrastructure via the API requires programming skills; it is of no use to the average John Doe. RFC2136/TSIG, on the other hand, is widely supported. It would even be possible to interface ISC DHCPd with deSEC. Certbot also supports RFC2136/TSIG.
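
For illustration, a minimal client-side sketch of such an RFC2136/TSIG update with dnspython (server, zone, key name, and secret are placeholders):

```python
import dns.query
import dns.tsigkeyring
import dns.update

# TSIG key shared with the server (placeholder name and secret)
keyring = dns.tsigkeyring.from_text({"mykey.": "c2VjcmV0c2VjcmV0c2VjcmV0"})

# Replace the A record of dyn.example.com with a new address
update = dns.update.Update("example.com", keyring=keyring,
                           keyname="mykey.", keyalgorithm="hmac-sha256")
update.replace("dyn", 300, "A", "203.0.113.5")

response = dns.query.tcp(update, "ns1.example.com", timeout=5)
print(response.rcode())  # 0 (NOERROR) on success
```

This is the same mechanism that ISC DHCPd, nsupdate, and Certbot's dns-rfc2136 plugin speak.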

peterthomassen commented 4 years ago

Nice idea! Unfortunately, it cannot be implemented in a straightforward fashion in our architecture ...

We currently do not expose any nameserver to the public (except on the slaves). As a consequence, we would either have to do that (to have our signing server accept DNS updates directly), or add some sort of RFC2136 parser to our API. The first approach comes with significant security implications (currently, the nameserver has no notion of users, for example, and we would also like to avoid having anything connect to our signing server), while for the second approach, I'm not sure what size of a project that would be in Python. Taking a quick look around, I was not able to find an implementation.

I am leaning towards the second approach, if any. Do you have more insight into how it could practically be done?

andrewtj commented 4 years ago

Did you take a look at PowerDNS's forward-update support? It sounds like it may be suitable for forwarding updates from the frontend servers to whatever ends up handling the update messages themselves.

I'm not sure if PowerDNS's RFC 2136 implementation is standards compliant (it probably is, but I know of a couple of broken implementations out there), or whether its update-policy support is suitable for limiting client privileges in line with #347.

If PowerDNS's implementation isn't suitable, it might make sense to extend the HTTP API to support everything RFC 2136 does, and then either let the community fill the gap, or build an RFC 2136 to HTTP API proxy. That probably sounds a little convoluted, but I think there's some value in having only one interface for mutating zones. I've only glanced at the API, but I think prerequisites are the only thing missing for implementing RFC 2136 on top of the HTTP API.

dnspython is pretty well regarded for building custom DNS software in Python if you choose to go down that path. I'd also be willing to write an RFC 2136 to HTTP API proxy in Rust if the HTTP API gets to a point where it's capable of everything RFC 2136 can do.
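
To make the proxy idea concrete, here is a rough sketch of the translation step with dnspython and requests. The rrsets endpoint follows the public deSEC API; TSIG key management, prerequisites, and class-NONE deletes are deliberately left out, and note that RFC 2136 "add" appends to an RRset while the API's PATCH replaces it:

```python
import dns.message
import dns.rdatatype
import requests

DESEC_API = "https://desec.io/api/v1"  # public deSEC API endpoint

def forward_update(wire: bytes, token: str, keyring=None) -> None:
    """Translate one RFC 2136 UPDATE message into a deSEC bulk PATCH (sketch)."""
    # from_wire() verifies the TSIG signature if a keyring is supplied
    msg = dns.message.from_wire(wire, keyring=keyring)
    origin = msg.question[0].name   # RFC 2136 carries the zone in the question section
    zone = origin.to_text(omit_final_dot=True)

    payload = []
    for rrset in msg.authority:     # update RRs live in the authority section
        subname = rrset.name.relativize(origin).to_text()
        if subname == "@":          # dnspython's spelling of the zone apex
            subname = ""
        # "Delete RRset" entries (class ANY) carry no rdata and thus map onto
        # deSEC's convention that an empty records list deletes the RRset;
        # ordinary class-IN entries become the RRset's new contents.
        payload.append({
            "subname": subname,
            "type": dns.rdatatype.to_text(rrset.rdtype),
            "ttl": rrset.ttl or 3600,
            "records": [rdata.to_text() for rdata in rrset],
        })

    # Bulk write: PATCH creates, modifies, or (with empty records) deletes RRsets
    response = requests.patch(
        f"{DESEC_API}/domains/{zone}/rrsets/",
        json=payload,
        headers={"Authorization": f"Token {token}"},
    )
    response.raise_for_status()
```

A complete proxy would additionally have to fetch the existing RRsets first, both to evaluate prerequisites and to emulate RFC 2136's add/delete semantics on top of the API's replace semantics.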

nils-wisiol commented 4 years ago

Zone import via AXFR is something I have had on my radar for a while already; I created #373 for that. I am currently working on https://github.com/desec-io/desec-stack/projects/2#card-37221829, maybe it's easy to fit it in there.

I would consider exposing our signing server to the Internet a no-go. However, the "proxy" approach you suggested may be viable; we did a similar thing (on a much smaller scale) when implementing the dynDNS interface.

So, let's postpone this until #373 comes around.

If you are dying to code something, maybe a TSIG authentication handler in the API would be a good starting point?
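
For instance, a minimal sketch of what such a handler could build on, assuming keys would be looked up from the user's account (nothing of this exists in the codebase yet):

```python
import dns.message
import dns.tsigkeyring

# Hypothetical key store; in the API, this would come from the user's account
keyring = dns.tsigkeyring.from_text({"user-key.": "c2VjcmV0c2VjcmV0c2VjcmV0"})

def parse_authenticated_update(wire: bytes) -> dns.message.Message:
    # from_wire() verifies the TSIG signature against the keyring and raises
    # (e.g. dns.tsig.BadSignature) if the message is not properly signed
    return dns.message.from_wire(wire, keyring=keyring)
```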

renne commented 4 years ago

I agree exposing the signing server is a no-go. It should be as encapsulated as possible.

The easiest way is to use the forward option in PowerDNS to forward RFC2136 requests to the hidden signing primary nameserver.
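
In PowerDNS, that would be configuration along these lines (a sketch; check the option details against the PowerDNS documentation for the version in use):

```
# pdns.conf on the server receiving the updates
dnsupdate=yes                        # enable RFC 2136 processing
allow-dnsupdate-from=192.0.2.0/24    # restrict who may send updates
forward-dnsupdate=yes                # slaves forward updates to their master
```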

The question is whether your database schema is compatible (custom changes, views, triggers).

10 years ago I used RFC2136 in PowerDNS with ISC DHCPd, and nsupdate with a CGI Bash script to mimic a DynDNS API.

It worked fine, but I do not know the current state of PowerDNS.

peterthomassen commented 4 years ago

Unfortunately, it's a bit more complicated: the slave pdns servers are not exposed on the frontend machines either (they are fronted by a dnsdist instance, of which I'm not sure whether it can also forward updates), and the slave pdns instances can only see our "nsmaster" pdns instance through the VPN, which functions as a slave to "nslord", the signing server.

There are two reasons for this: 1.) "nslord" signs on outgoing AXFR, and this way it only has to sign once for the AXFR to "nsmaster" (and not 15 times for the 15 frontends); 2.) a compromised slave cannot connect to the signing server.

As a consequence, we would need dnsupdate forwarding over multiple layers. Also, I'm not sure if we want to reinforce that sort of vendor lock-in. Under these considerations, an RFC 2136 to HTTP API proxy seems more convincing to me.

nils-wisiol commented 4 years ago

Applying changes to the DNS without API interaction causes problems, for example when we try to apply rate limiting, maintain atomicity, or retrieve records (imagine they changed on the name server, but not in the API), and so on. For these reasons, I agree with @peterthomassen: the API shall remain(*) the single source of truth for record contents; side-lining the API for RFC 2136 is not an option.

(*) For the sake of completeness: as things are now, it is not the source of truth for DNSSEC-related record contents. This behaviour may change in the future, though.

renne commented 4 years ago

According to this issue, dnsdist can forward RFC2136 updates.
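
If so, routing only UPDATE messages to a dedicated backend could presumably be done with a rule like this (a sketch; addresses and pool name are placeholders):

```lua
-- dnsdist: send RFC 2136 UPDATE messages to a dedicated backend pool
newServer({address = "10.1.0.2:53", pool = "updates"})
addAction(OpcodeRule(DNSOpcode.Update), PoolAction("updates"))
```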

How is the API connected to nslord? Is there any diagram showing the relations between machines, services, databases and protocols/APIs?

peterthomassen commented 4 years ago

Here's a simplified diagram of our architecture. Some components are missing (VPN server, monitoring components, Celery task runners etc.). The replication mechanism duplicates the "Public DNS database" from the left part of the diagram to the right part; the mechanism itself is not part of the diagram. (It's not just database replication, but essentially DNS AXFRs with a controlling service that identifies missed updates, as well as newly created and deleted zones. The latter can't be done with AXFR alone.) Also, we have a dnsdist instance running in front of the "Frontend Server".

[Diagram: simplified deSEC architecture]

What you can't see here is that "Frontend Server" and "Signing Server" can see each other through the replication VPN, so with dnsdist supporting forwarding of RFC2136 updates, such updates could eventually arrive at the "Signing Server".

However, as Nils said, this path would not be subject to the rate limiting implemented in the API and the Gateway. More critically, we keep a copy of unsigned DNS content in the "API Database" to reduce the interaction surface with the signing server. For example, when you simply want to GET your rrsets/ via the API, you will get them from "API Database", and no connection is made to the "Signing Server". Initially, we did not have this duplication of data, but we encountered several issues that necessitated that architectural change.
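
To illustrate that read path: fetching records is a single API call, answered entirely from the "API Database" (token and domain are placeholders):

```python
import requests

response = requests.get(
    "https://desec.io/api/v1/domains/example.com/rrsets/",
    headers={"Authorization": "Token your-token-here"},
)
print(response.json())  # served from the API database; no signing server involved
```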

As a consequence, DNS updates which enter a "Frontend Server" via RFC2136 and then finally arrive at the "Signing Server" via forwarding, instead of coming into the API, would go unnoticed as far as the "API Database" is concerned. This inconsistency demands a solution.

Our impression is that the most convincing solution would be one where all input goes through the API (potentially piped through an RFC2136 proxy). This way, responsibilities would be assigned very clearly, and we would reduce entanglement between components. Imagine, for example, a situation where we want to replace the "Signing Server" with some other software for some reason. This will be much easier to do if data flows are simple and the requirements for each component are kept minimal, including not requiring the "Signing Server" to handle RFC 2136 updates.

Does that make sense?

andrewtj commented 4 years ago

That seems reasonable. To correctly translate from RFC 2136 to the bulk operation interface, it will be necessary to retrieve some or all of a zone's content. So if #372 results in a way to fail retrieval and bulk operations when the zone changes after the first retrieval, there should be enough to work with to create a standards-compliant proxy.
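
In code, that proxy step might look like this (hypothetical: the ETag/If-Match mechanism shown does not exist in the API today and merely stands in for whatever #372 ends up providing):

```python
import requests

API = "https://desec.io/api/v1"

def conditional_update(zone: str, payload: list, token: str) -> None:
    session = requests.Session()
    session.headers["Authorization"] = f"Token {token}"

    # Retrieve the zone contents needed to evaluate RFC 2136 prerequisites
    response = session.get(f"{API}/domains/{zone}/rrsets/")
    response.raise_for_status()
    snapshot = response.json()
    etag = response.headers.get("ETag")  # hypothetical zone-version marker

    # ... evaluate the update's prerequisite section against `snapshot` ...

    # Hypothetical conditional write: the server would reject the bulk
    # operation if the zone changed after the GET above
    response = session.patch(f"{API}/domains/{zone}/rrsets/",
                             json=payload,
                             headers={"If-Match": etag} if etag else {})
    response.raise_for_status()
```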

MrRulf commented 1 year ago

Hi, I saw this feature made it onto the to-do list, but that was almost three years ago, so I wanted to ask: did something happen in the meantime? A workaround, or something that makes the feature irrelevant? Or did you find a reason not to implement it because it's "not compatible"? Or is it just not important?

nils-wisiol commented 1 year ago

I think it's doable and a nice feature, but as it requires a larger engineering effort, we currently just do not have the resources to implement it.

MrRulf commented 1 year ago

I understand. Are there any ETAs or something similar for when this might be implemented? The issue has not seen any changes in 2.5 years, so it would be nice to know whether something similar can be expected for the next 2.5 years or whether this will be worked on.

With this, issue #707 would also be solved, since no more webhooks would be needed for cert-manager.

nils-wisiol commented 1 year ago

There is currently no ETA, and no work is planned on this. But we welcome contributions from the community, and will support efforts that address this (and other) issues.

MrRulf commented 1 year ago

I'll try to look into this in the next weeks, but I'm no expert in this and feel like I'm barely scratching the surface. So I wouldn't expect any PRs from me any time soon, but if I find a way to help with this in a meaningful way, I'll try.