StackExchange / dnscontrol

Infrastructure as code for DNS!
https://dnscontrol.org/
MIT License
3.14k stars 400 forks source link

diff2 should order NS/MX/SRV updates more carefully #2253

Closed tlimoncelli closed 1 year ago

tlimoncelli commented 1 year ago

Some systems are strict about NS updates. They must be created after the target exists. The target can't be deleted if NS records point to them.

diff2 should order updates properly so that providers don't have to worry about this.

We should do this for NS/MX/SRV records (and anything else).

This needs integration tests.

Tom

tlimoncelli commented 1 year ago

Here's a thought:

diff2 will make this easier to implement. In pkg/diff2/analyze.go the functions mkAdd/mkChange/mkDelete could store dependency "hints". The analyzeByRecordSet/analyzeByLabel/analyzeByRecord functions could call a re-ordering function.

Note to self: These links might be useful:

blackshadev commented 1 year ago

If I have time on my hands I will try and implement this in diff2. I will keep you up to date on this issue if I start / have something meaningful.

blackshadev commented 1 year ago

I started working on this on a seperate branch there is still allot to be done though.

I observe still some challenges

tlimoncelli commented 1 year ago

Interesting points!

Wildcards: I wouldn't treat them special for now. They aren't used frequently and usually not as targets.

This is rather complex, so discussing it here before committing ideas to code will probably be beneficial.

Two strategies come to mind:

  1. Use a dependency sort library like https://github.com/stevenle/topsort to pick the order. (Suppose the record is `x IN CNAME y". mkAdd() would graph.AddEdge("x", "y") and mkDelete() would graph.AddEdge("y", "x") (only adding edges for items with targets that are hostnames). Run the toposort, and then re-order the changes based on the result. Items that aren't listed just get put at the end.

  2. Simple tagging. When creating a change, add it to one of 6 categories: first, early deletes, early inserts, everything else, late inserts, late deletes. Then user a stable sort to sort by tag.

    • I think we could write a proof that if CNAME-record deletes are always on the "late deletes" list, and if A-record adds are always on the "early inserts" list, then all CNAME deletes will order correctly. Similarly if CNAME-record adds are always on the "late inserts" and A-record adds are always on the "early inserts", then CNAME adds will always order correctly.
    • We can generalize those rules for all record types (well... at least ones with hostnames as targets).
    • This won't support chaining, but those are rare. That is, if "x CNAME y", "y CNAME z", "z A 1.2.3.4", it might not order them correctly. However that's so rare I wouldn't care about it.

I'm not saying these are the only ways to do this. I'm just brainstorming. Please share your ideas too!

Tom

PS. By the way... while we're at it... diff2.REPORT items should go to the front of the list. I think they should always be output first.

blackshadev commented 1 year ago

Wildcards: I wouldn't treat them special for now. They aren't used frequently and usually not as targets.

I want to find a generic "ordering" mechanism which works for all use cases in the future. We can still leave some cases as is for now, but at least we need to have a clear implementation path for these cases in the future. Just treating domain names as text simply won't do.

I already thought of a solution: creating a simple search tree based on the FQDN in reverse. This is how DNS resolves all parent addresses and thus should be future proof. It's implementation is pretty straight forward, lookups are a traverse and should be fast enough. And wildcards can be supported!

  1. Use a dependency sort library like https://github.com/stevenle/topsort to pick the order. (Suppose the record is `x IN CNAME y". mkAdd() would graph.AddEdge("x", "y") and mkDelete() would graph.AddEdge("y", "x") (only adding edges for items with targets that are hostnames). Run the toposort, and then re-order the changes based on the result. Items that aren't listed just get put at the end.

I opted to not use a lib and "try it myself" first because I already made topsort with kahn's algoritm and it is easy enough to do without any external dependencies. If needed, we can always add an external lib for it, but it wouldn't affect the rough implementation. And doing it by hand you will find the pain points earlier. to conclude: I am not against a library, but it is easy enough to implement on our own and I would solve all other problems first before choosing to use a library. I think you will need some custom logic which does not fit in a standard graph sorting library that easily.

  1. Simple tagging. When creating a change, add it to one of 6 categories: first, early deletes, early inserts, everything else, late inserts, late deletes. Then user a stable sort to sort by tag. I though of 2 as a "brute-force" method, which will work for most cases. But it isn't a general solution for this problem and thus I think we should aim for better.
tlimoncelli commented 1 year ago

Ok! I look forward to seeing what you come up with.

I'm intrigued by the idea of using a search tree!

tlimoncelli commented 1 year ago

We had an interesting situation today. Someone was turning an "inlined subdomain" into a delegated subdomain:

OLD:

lab1.foo.example.com A 1.1.1.1
lab2.foo.example.com A 1.1.1.1

NEW:

foo.example.com NS ns1.somewhere-else.com.
foo.example.com NS ns2.somewhere-else.com.

This required manually deleting lab1 and lab2 so that dnscontrol could create foo.

It would be nice if this case could be handled too.

blackshadev commented 1 year ago

Yeah I am aware of these situations and I am already working hard on a complete solution. The lookup through a search tree is already made (and tested), now I need to integrate the sorting (write tests for it). And have "stages" to accommodate a separation between REPORTs, DELETEs and the rest.

It will take some time, as I have only a small amount of spare time. I hope this isn't an issue.

tlimoncelli commented 1 year ago

No rush! I was actually listing that as a "stretch goal". Super cool that you were considering that case already!

blackshadev commented 1 year ago

Update from my side, code can be found here.

I opted to create a separate package for the things I needed. The sorting of DNS Records could come in handy so I didn't want to tie it to any local type from diff2. Also I thought it would be a good separation which made it easier to write small tests for it.

Still todo:

tlimoncelli commented 1 year ago

Looking good so far! I agree about making it a separate pkg. I also like that you've made the interface a simple call instructions = orderByDependencies(instructions). Cool!

I know that earlier I recommended mkAdd/kChange/mkDelete add hints. However now I think it might be cleaner to add those hints at the start of orderByDependencies(). That would make the new code even more independent. It's up to you.

One suggestion: Be careful with mkChange. The same name may appear in oldRecs and newRecs. For example if a byLabel replaces:

foo IN NS bar1
foo IN NS bar2

with

foo IN NS bar1
foo IN NS new1

That may appear in mkChange() with a .Msgs listing []string{"DELETE bar2", "ADD new1"}.

blackshadev commented 1 year ago

I like your suggestion to add the dependencies later on... I will first try the easiest way and reflect on it. Thanks for the input!

I do not fully understand your suggestion. To me this just seems like a change and should be ordered as such. If we saw it as a separate deletion and addition it should still work... but less obvious. Could you elaborate on your "one suggestion" , what do you suggest?

tlimoncelli commented 1 year ago

It's more of a warning than a suggestion :-). The warning is that the same change might include both a forward dependency and a backwards dependency. In the above example, new1 has to be created prior to the NS, but also bar2 has to be deleted after the request.

blackshadev commented 1 year ago

I still don't see how the change from foo IN NS bar -> foo IN NS new1 has a dependency. in dnscontrol code this would be a modify change and I don't see any dependencies on the change right now. Maybe I am missing something if a change cannot be executed atomically (like with TransIP where a recordset cannot be modified but needs to be added and deleted), we should always try to make it 'stable', add the new record first and remove the old one later.

Maybe I am missing something, so please tell me if I do.

tlimoncelli commented 1 year ago

I could be missing something too!

Let me rewrite the example to be more clear:

OLD:

bar1 IN A 1.2.3.4
bar2 IN A 4.5.6.7
foo IN NS bar1
foo IN NS bar2

NEW:

bar1 IN A 1.2.3.4
new1 IN A 8.9.0.1
foo IN NS bar1
foo IN NS new1

The changes would be:

  1. Delete "bar2 IN A 4.5.6.7"
  2. Change "foo IN NS bar2" to "foo IN NS new1"
  3. Add "new1 IN A 8.9.0.1"

The order would be: 3, 2, 1 (2 depends on 1, 3 depends on 2)

How does your code assure that 3 happens before 1?

The way I read the code (which I admit might be an older version of your code)... mkAdd and mkDelete wouldn't generate any dependencies because "A" records refer to IP addresses, not names. mkChange records dependencies for newRecs, which would be new1.

Tom

blackshadev commented 1 year ago

Ahh now I understand. I am not done with the actual implementation with the diff2 part. So the part you are refering to is still a WIP. I will add exactly this case to a test to ensure we solve it. I thought we could always perform deletes first, but you just provided a case where it isn't true. So I need a way to track dependencies between the delete changes en the "old" state of changes and the "add" and "new" state. Currently I am ordering on just NameFQDN and "new" state as dependencies.

Thanks for the example, it should eb solvable and than we have a generic approuch for delete, change and add .

tlimoncelli commented 1 year ago

Sounds good!

blackshadev commented 1 year ago

From a directed graph perspective we have this:

We should always resolve:

The dependencies should be considered using the oldRec for DELETE and CHANGE and newRec for ADD and CHANGE.

This does change how I should move forward. I made my sort algorithm without any direction and only the thew newRec. I will try and think about how I can fit it nicely.

graph TD;
    bar2-->CHANGE_foo;
    CHANGE_foo-->new1;
tlimoncelli commented 1 year ago

Yup! Sounds good!

blackshadev commented 1 year ago

Update time:

I just added a new sort function which is based on a graph structure we can interact with more easily. This way the sort will account for the dependency "direction" , since a record can have a dependency on another record. But a record can also be depend on, both are important when making changes.

What I still mis:

tlimoncelli commented 1 year ago

Thanks for the update!

blackshadev commented 1 year ago

Update time:

Done:

What I still need to do:

So we are close, you can already check out the code if you like, most things should be close to final. And functionally it is complete.

tlimoncelli commented 1 year ago

Fantastic! Sounds like you've covered a lot of ground!

I can't wait to get the PR link!

blackshadev commented 1 year ago

@tlimoncelli here you go: https://github.com/StackExchange/dnscontrol/pull/2419

I am looking forward to your feedback, as I modified allot I think you will have allot to say about it.

A few things I am really not sure about:

tlimoncelli commented 1 year ago

First, I want to say that this is looking really good so far. Thanks for all your work on this! I hope you're finding it to be fun!

(2nd bullet): I agree with your assessment about diffing on a recordset.

(1st bullet):

Adding .Origin is going to require a lot of extra memory if it isn't careful to always point to the same string. Maybe use a pointer to string and make sure they all point to the same place?

It is a shame that RecordConfig has no idea what the origin is. That was a design decision a long time ago. The problem is that the system literally doesn't know the origin when processing each item in a D().

I believe that the origin of everything passed to orderByDependencies() is the same. Everything should be either a FQDN (and not need to know the origin) or be a shortname of the same origin. I'm not sure if this would work but... what if you passed the origin to orderByDependencies() and it could pass it along to GetRecordsNamesForChanges() and so on.

Alternatively... what if we just guaranteed that all .targets were FQDNs? If pkg/zonerecs/zonerecords.go can models.Downcase() all the labels and targets, why can't it also turn the targets into FQDNs? Any code that touches records after that will be simplified. I've written https://github.com/StackExchange/dnscontrol/pull/2421 as a test.

tlimoncelli commented 1 year ago

P.S. https://github.com/StackExchange/dnscontrol/pull/2421 was easier than I assumed. Its now passing all tests.

blackshadev commented 1 year ago

Alternatively... what if we just guaranteed that all .targets were FQDNs

This seems like a proper way forward, if we can assume the record.target is always FQDN, than that would make it allot easier for me. And let's be honest, it should be FQDN at some point in the dnscontrol lifecycle anyway. So sooner seems easier for everybody. Do you intent to merge #2421 into main such that I can rebase on it? or do you want me to rebase onto #2421 directly?

Aside from this

Adding .Origin is going to require a lot of extra memory if it isn't careful to always point to the same string

For my own enlightenment: how is adding a field with the same string more memory usage than appending that same string to the targets? Keeping a reference seems as a less memory intensive way, I also saw origin being sent to a boat load of other functions which would benefit from having the origin (or a reference to it) in the record config.

tlimoncelli commented 1 year ago

Sounds good! I've merged the pr. .target should now be a FQDN with a "." anytime a provider sees it.

blackshadev commented 1 year ago

I have pulled you changes in (by rebasing). But it doesn't seem to work for the analyze_test.go. If I understood you correctly all record.target fields should be FQDN at some point. But from what I see, this isn't the case for analyze_test. Should the test input be changed to include FQDNs as targets. Or should the test setup be changed to call some internal which isn't yet being called?

blackshadev commented 1 year ago

I just pushed a commit which adds your new FQDN function to the tests and also ensured they are really fqdn instead of just appending the origin. I did need to rewrite some tests because they now include FQDN in the targets. And in some tests the NS record was filled with an IP, I checked and according to various sources, the NS record should always be a reference to another record.

edit: I see i need to change more: the difftargets tests are also effected it seems. edit 2: fixed difftargets tests

tlimoncelli commented 1 year ago

Sounds good! (Both the updated tests, and NS targets that are IP addresses. BTW... I should probably add a validation warning for that)

Please update https://github.com/StackExchange/dnscontrol/pull/2419 when you have a chance.

blackshadev commented 1 year ago

Sounds good! (Both the updated tests, and NS targets that are IP addresses. BTW... I should probably add a validation warning for that)

Please update #2419 when you have a chance.

Thanks, I updated the PR.

tlimoncelli commented 1 year ago

The code is in the current release. Sounds like we can close this!

Thanks for all your hard work on this!