dns-violations / dnsflagday

DNS flag day
https://dnsflagday.net/

Flag Day 2020: The date #139

Closed oerdnj closed 4 years ago

oerdnj commented 4 years ago

This issue serves as a public, open to all, discussion forum for what the date should be for DNS Flag Day 2020.

(I will make a summary of the discussion below...)

oerdnj commented 4 years ago

I proposed 31. October 2020 during the DNS-OARC in Austin and nobody objected. Therefore, I propose the DNS Flag Day 2020 should be 31. October 2020.

vixie commented 4 years ago

i do not and may never love calling these "flag days" which they manifestly are not, and the cost of the resulting confusion will not be zero. however, as to the date itself, i think it's exactly arbitrary enough.

mnordhoff commented 4 years ago

I'm loath to sound this reasonable, but I have concerns about 31 October 2020. That's near e-commerce companies' holiday shopping freeze period, the 51 weeks of the year when they get especially whiny and intransigent when people ask them to fix bugs.

Since buggy load balancers popular in industries like that are often the cause of DNS pain, it might be better to schedule it a few months earlier, when they might be more solicitous? (I would hate to schedule it a few months later.)

vttale commented 4 years ago

In addition to mnordhoff's entirely reasonable concern about peak e-commerce season, I have to wonder something in the opposite direction ... if we expect this to be largely inconsequential to actual operations, why delay it for a year? I'm in favour of doing it much sooner, like say in the spring. To just throw out a number for the sake of something specific: April 1.

vdukhovni commented 4 years ago

If a few weeks in October makes enough of a difference to make e-commerce sites less worried, then one of the biggest "flag days" in history was the introduction of the Gregorian calendar on 4 Oct 1582 (followed by 15 Oct 1582), the DNS "flag day" could pay homage to that date.

Or if we wanted to move it back (earlier), Britain adopted the new calendar on 2 Sep 1752 (followed by 14 Sep 1752). The Sep 2nd is a Wednesday in 2020, just as it was 1752.

vdukhovni commented 4 years ago

Another historical flag day is the introduction of the metric system in France on 30 Mar 1791. That fits with the proposals to schedule it early in the year (Northern Hemisphere spring).

vttale commented 4 years ago

With Viktor's comment I hereby amend my April 1st suggestion to March 31. That's 03/037/03744 in octal which must mean something somehow too.

franklouwers commented 4 years ago

Why not Feb 1st, as that historical day marked the first DNS Flag Day?

Unless there are very good reasons (eg: some big noncompliant vendor told us they could make October, but not February), why wait almost another year?

oerdnj commented 4 years ago

the 51 weeks of the year

Did you mean days? ;)

We’ve been told that business would like to have a period closer to 1 year rather than couple of months. What about 1. October 2020 then?

vttale commented 4 years ago

We’ve been told that business would like to have a period closer to 1 year rather than couple of months. What about 1. October 2020 then?

I guess I'd like to hear more about which businesses and why. Maybe they could actually participate in this thread. Given that y'all have already been telegraphing the next "flag day" since at least -- what, late spring? -- then spring 2020 is a year too.

wtoorop commented 4 years ago

Last time, the results of impact studies were not available to operators before the flag day. I think it would be nice if impact study results were available before the new flag day this time. Realistically, this means results will be available in spring. So, to allow operators time to process and respond to those results, the flag day should IMHO be in fall.

letoams commented 4 years ago

The day picked is arbitrary because there isn't actually a flag day because of the delays between upstream implementers and downstream vendors.

If you would write a BCP document for DNS implementers, then that RFC document's publication date would be what you need, AND you wouldn't need to confuse people with "flag date" as a term or confuse people about who is "DNS violations". And this information would remain available even after "dns-violations" website and github repository have perished.

pspacek commented 4 years ago

Let me point out that this actually is a flag day, because 1/4 of the Internet user population is behind cloud-resolvers. We know that cloud-resolvers are able to roll out changes in short periods of time (as opposed to slow software upgrades elsewhere), so the date actually matters.

For that reason we need to coordinate with cloud-resolver operators, let's see what they can tell us.

letoams commented 4 years ago

On Nov 18, 2019, at 15:47, Petr Špaček notifications@github.com wrote:

Let me point out that this actually is flag day for 1/4 of Internet user population is behind cloud-resolvers.

Either you will break 1/4 of the Internet and then you shouldn’t or you don’t break 1/4 of the Internet and then it is not a flag day?

For that reason we need to coordinate with cloud-resolver operators, let's see what they can tell us.

If you can coordinate that operators shouldn’t run broken / old software, you can tell them to upgrade at any time? If you cannot reach these operators then you are going to break things and I don’t see how synchronizing upstream vendors does anything for downstream operators actually running DNS. It will still be out of sync across the world and there would still be a last-movers advantage.

Annual flag days are a hype that’s detrimental to the DNS community.

Paul

vcunat commented 4 years ago

I believe the main point of a coordinated date is to lower the first mover's disadvantage, like:

My site is broken with your resolver but works with (almost) everyone else's, so it's "obviously" your fault.

Last movers might still have some shorter-term advantage, but I don't think that really matters, as we should now have enough critical mass to force fixing TCP in basically all cases that care to keep running (if I simplify it). Having a doomsdate also helps with marketing, i.e. making them fix it in advance and dismissing complaints that they couldn't have known this would come.

Either you will break 1/4 of the Internet and then you shouldn’t or you don’t break 1/4 of the Internet and then it is not a flag day?

Depends on your point of view. If you're a user, almost nothing should break. I hear that many million users already are behind a post-flag resolver config. If you're a badly setup service, you will experience problems from a large fraction of internet... but if you get warned in advance, are given testing tools, etc.

vcunat commented 4 years ago

So... are we waiting for something in particular? I've heard no real objections against the beginning of October so far. If some claim they need to know long in advance, we shouldn't take too long to set the date.

oerdnj commented 4 years ago

1. October 2020 seems like a good date then. Let's settle on that.

@Habbie @bjovereinder @wtoorop @pspacek @ralphdolmans ok with that?

amsowellman commented 4 years ago

Greetings! I wanted to check in regarding the recent world events. Since the recent COVID-19 pandemic has led to global unrest, should the next DNS Flag Day (i.e. DNS Flag Day 2020) be moved out in time to accommodate? Enforcing stricter rules during a period of unrest could result in more pain and counter productivity.

jelu commented 4 years ago

@amsowellman

Enforcing stricter rules...

I don't think you understand what we are trying to do... it's about removing workarounds and making things better by, sometimes, actually following the rules.

puneetsood commented 4 years ago

Hi Ondrej,

Is October 2020 official now? And is it the beginning or end of Oct?

pspacek commented 4 years ago

Hi @puneetsood! Thanks for asking. It was meant as October 1st 2020 aka 2020-10-01.

As far as I can see, it received a positive response from ISC, CZ.NIC, NLnet Labs and PowerDNS as well, so now Google et al. are missing from the list.

(GitHub auto-formatting changed the obsolete pre-ISO date notation, see https://en.wikipedia.org/wiki/Date_and_time_notation_in_Europe, into a numbered list with a single item in it.)

SvenVD-be commented 4 years ago

It is still unclear to me whether 2020-10-01 is now the official date or whether we are waiting for Google et al. to confirm this date.

oerdnj commented 4 years ago

Ok, let's go with 2020-10-01 officially. The impact of this DNS Flag Day will be minimal anyway; removing the EDNS0 workarounds was a much bigger deal and it was handled gracefully.

vixie commented 4 years ago

On Thursday, 16 July 2020 12:25:35 UTC Ondřej Surý wrote:

Ok, let's go with 2020-10-01 officially. The impact of this DNSFlagDay will be minimal anyway, removing the EDNS0 was much bigger deal and it was handled gracefully.

i'd like to discuss the change, which doesn't qualify as a "flag day", but could (avoid-fragmentation draft). curtailing message size via hardcoding has never ended well.

-- Paul

puneetsood commented 4 years ago

Hi @puneetsood! Thanks for asking. It was meant as October 1st 2020 aka 2020-10-01.

As far as I can see it received positive response from ISC, CZ.NIC, NLnet Labs and PowerDNS as well, so now Google et. al are missing in the list.

(Github auto-formatting changed the obsolete pre-ISO https://en.wikipedia.org/wiki/Date_and_time_notation_in_Europe into a numbered list with a single item in it.)

We are doing some experimentation to quantify impact for our users. Will have an update in 2 weeks.

pspacek commented 4 years ago

That's great news, thank you.

FYI I did a quick scan over the CZ TLD, and the number of domains which work over UDP and do not support TCP at all is around 0.05 %. This methodology does not take into account that some domains which do not support TCP will never send a big enough answer to require TCP, i.e. it is an upper bound.

[Unfortunately I'm swamped, so I do not have enough time to optimize the code for the com. zone, and so I do not see world-wide results.]
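For readers who want to reproduce this kind of check, here is a minimal sketch of a "works over UDP but not over TCP" probe for a single name server. It is not the code used for the CZ scan; the function names, the 3 second timeout, and the choice of an NS query without EDNS are all assumptions made for illustration, and it uses only the Python standard library.

```python
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 2, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query without EDNS (qtype 2 = NS, class IN)."""
    # Header: ID, flags (all zero: plain query), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def _recv_exact(sock: socket.socket, count: int) -> Optional[bytes]:
    """Read exactly `count` bytes, or return None if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            return None
        data += chunk
    return data


def answers_over_udp(server_ip: str, query: bytes, timeout: float = 3.0) -> bool:
    """True if the server sends any reply with a matching ID over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]


def answers_over_tcp(server_ip: str, query: bytes, timeout: float = 3.0) -> bool:
    """True if the server sends any reply with a matching ID over TCP."""
    try:
        with socket.create_connection((server_ip, 53), timeout=timeout) as sock:
            # DNS over TCP prefixes each message with a two-byte length.
            sock.sendall(struct.pack("!H", len(query)) + query)
            prefix = _recv_exact(sock, 2)
            if prefix is None:
                return False
            (length,) = struct.unpack("!H", prefix)
            reply = _recv_exact(sock, length)
    except OSError:  # refused, unreachable, or timed out
        return False
    return reply is not None and len(reply) >= 12 and reply[:2] == query[:2]


def is_udp_only(server_ip: str, domain: str) -> bool:
    """Server answers the same query over UDP but not over TCP."""
    query = build_query(domain)
    return answers_over_udp(server_ip, query) and not answers_over_tcp(server_ip, query)
```

Calling `is_udp_only("192.0.2.53", "example.cz")` (a documentation address, used here only as a placeholder) would classify one server. As noted above, counting such servers gives an upper bound on breakage, since a server that never sends large answers may never need TCP.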

dkg commented 4 years ago

@puneetsood wrote:

We are doing some experimentation to quantify impact for our users. Will have an update in 2 weeks.

@puneetsood, any results from these experiments?

pspacek commented 4 years ago

FYI recent research results about EDNS buffer size values were presented at DNS-OARC 32b: https://indico.dns-oarc.net/event/36/contributions/776/

Keep in mind this is academic research. A practical implementation will have to take into account additional complexity from the real world, e.g. that a resolver does not know whether the "other end" of the communication lies in the same network or on the "other side" of the Internet, etc.

puneetsood commented 4 years ago

@puneetsood wrote:

We are doing some experimentation to quantify impact for our users. Will have an update in 2 weeks.

@puneetsood, any results from these experiments?

Our experiments show that the change will work well for our users.

Summary of our plan below. I will be posting a similar message to the dns-operations@ list today.

We plan to deploy this change incrementally over a period of 4-6 weeks starting on the flag day. We will start with a low percentage of queries using the new value for EDNS0 bufsize on the flag day and increase to 100% coverage by the end of the period. In case of significant problems, we will pause or rollback the changes and communicate this to the community.
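An incremental rollout of this kind could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any operator's implementation: the linear ramp, the 1% starting share, the 35-day ramp length, the hashing scheme, and all names are hypothetical. Only the 4096 and 1232 buffer sizes and the flag day date come from this thread.

```python
import hashlib
from datetime import date

LEGACY_BUFSIZE = 4096  # baseline EDNS0 buffer size in the experiments
NEW_BUFSIZE = 1232     # value being rolled out

FLAG_DAY = date(2020, 10, 1)


def rollout_percent(today: date, start: date = FLAG_DAY,
                    ramp_days: int = 35, initial: float = 1.0) -> float:
    """Share of queries (0-100) that should use the new buffer size.

    Assumed shape: `initial` percent on the start date, growing
    linearly to 100 percent after `ramp_days` days.
    """
    if today < start:
        return 0.0
    elapsed = (today - start).days
    return min(100.0, initial + (100.0 - initial) * elapsed / ramp_days)


def bufsize_for(query_key: str, percent: float) -> int:
    """Pick a buffer size for one query, deterministically per key.

    `query_key` is a hypothetical stable identifier (for example the
    upstream server address), hashed into one of 10000 buckets.
    """
    digest = hashlib.sha256(query_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10000
    return NEW_BUFSIZE if bucket < percent * 100 else LEGACY_BUFSIZE
```

Pausing or rolling back then amounts to freezing or lowering the percentage, which returns the affected buckets to the old value without any other change.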

This behavior has been working well for our users. We do not plan to make any changes on the client side.

| Configuration | Queries with UDP truncation | Queries with TCP retry failure |
| --- | --- | --- |
| Baseline queries with bufsize = 4096 | 0.345% | 0.115% |
| With bufsize = 1232 | 0.367% | 0.116% |

| Configuration | Queries with UDP truncation | Queries with TCP retry failure |
| --- | --- | --- |
| Baseline queries with bufsize = 4096 | 0.238% | 0.072% |
| With bufsize = 1400 | 0.259% | 0.071% |

pspacek commented 4 years ago

If I read the table above correctly, dropping the EDNS buffer size down to 1232 adds additional fallback to TCP for roughly 0.022 % of queries. That seems negligible to me. Out of curiosity, do you have some error bounds for these measurements?

Also, the increase in TCP failure rate of roughly 0.001 % seems almost like measurement error.
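The differences can be checked directly from the figures quoted above. The short sketch below only redoes that subtraction; the dictionary layout and names are made up for illustration, and `Decimal` is used to avoid floating-point noise in the last digit.

```python
from decimal import Decimal

# (baseline with bufsize 4096, experiment) in percent of queries,
# keyed by the experimental buffer size; values copied from the thread.
MEASUREMENTS = {
    1232: {
        "udp_truncation": (Decimal("0.345"), Decimal("0.367")),
        "tcp_retry_failure": (Decimal("0.115"), Decimal("0.116")),
    },
    1400: {
        "udp_truncation": (Decimal("0.238"), Decimal("0.259")),
        "tcp_retry_failure": (Decimal("0.072"), Decimal("0.071")),
    },
}


def deltas(measurements):
    """Experiment minus baseline, in percentage points, per metric."""
    return {
        bufsize: {metric: after - before
                  for metric, (before, after) in rows.items()}
        for bufsize, rows in measurements.items()
    }


for bufsize, row in deltas(MEASUREMENTS).items():
    for metric, delta in row.items():
        print(f"bufsize {bufsize}: {metric} changes by {delta:+} percentage points")
```

For bufsize 1232 this gives +0.022 for truncation and +0.001 for TCP retry failure, matching the numbers above; for 1400 it gives +0.021 and -0.001.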

Thank you very much for these measurements, it is much appreciated.

puneetsood commented 4 years ago

If I read the table above correctly dropping EDNS buffer size down to 1232 adds additional fallback to TCP for roughly 0.022 %. That seems negligible to me. Out of curiosity, do you have some error bounds for these measurements?

Also the increased TCP failure rate roughly 0.001 % seems almost like measurement error.

There is variability across different metros and we do see a decrease in the truncation and TCP retry failure rate in some metros. The TCP failure rate with the experiment ranges between 0.0046% and 0.0104%.

| Metric | Min (all) | Max (all) | Average (1232) | Average (1400) | Average (all) |
| --- | --- | --- | --- | --- | --- |
| Truncation increase, in experiment | -0.0474% | 0.0628% | 0.0330% | 0.0123% | 0.0218% |
| TCP retry failure increase, in experiment | -0.0354% | 0.0009% | -0.0014% | -0.0065% | -0.0042% |

Thank you very much for these measurements, it is much appreciated.

pspacek commented 4 years ago

Apparently we forgot to close the issue even though the date is set, let me fix this mistake!