chromium / hstspreload.org

:lock: Chromium's HSTS preload list submission website.
https://hstspreload.org
BSD 3-Clause "New" or "Revised" License

Automate preloading domains for .gov (in a way that can be used for other eTLDs in the future) #78

Open lgarron opened 7 years ago

lgarron commented 7 years ago

Per the Jan. 19 announcement by @konklone and @smarina04, the plan is to automatically preload all new .gov domains going forward.

The strawman idea right now is for an authoritative .gov URL to host a list of .gov domains to be preloaded. hstspreload.org would regularly check that URL and add any new .gov domains on that list. (Once I figure out how to run successful cron jobs: #35)
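Roughly, the check could look something like this (just a sketch; the feed URL, the newline-delimited format, and how new domains get queued are all placeholders, not anything we've agreed on):

```go
// Hypothetical sketch of the periodic .gov sync; the feed URL, the
// newline-separated format, and what happens to new domains are all
// assumptions, not hstspreload.org internals.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// newDotGovDomains fetches the authoritative .gov list and returns any
// domains not already present in the preload database.
func newDotGovDomains(feedURL string, known map[string]bool) ([]string, error) {
	resp, err := http.Get(feedURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d from %s", resp.StatusCode, feedURL)
	}

	var added []string
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		domain := strings.ToLower(strings.TrimSpace(scanner.Text()))
		if domain == "" || known[domain] {
			continue
		}
		added = append(added, domain)
	}
	return added, scanner.Err()
}

func main() {
	// Placeholder URL; the real location and format haven't been decided yet.
	domains, err := newDotGovDomains("https://example.gov/preload-feed.txt", map[string]bool{})
	if err != nil {
		log.Fatal(err)
	}
	for _, d := range domains {
		fmt.Println("would submit for preloading:", d)
	}
}
```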

We have yet to figure out details like the URL and the format.

Also, the sites will still need to send headers to be preloaded in all browsers, unless other browsers switch to using #76 without freshness checks.

konklone commented 7 years ago

The strawman idea right now is for an authoritative .gov URL to host a list of .gov domains to be preloaded. hstspreload.org would regularly check that URL and add any new .gov domains on that list.

One other approach I think is worth discussing is whether we could potentially automate (e.g. via API) the filing of patch requests to the Chrome source code. This would have some upsides in that it wouldn't require our agency to host security-sensitive data (since malicious changes could brick sites), and would mean you wouldn't need to add a new backend batch process.

But the downsides are that it would feed the submissions into your team's review queue and involve manual work on our part -- and you would still need some kind of authentication from our team that the patch request came from GSA. There would also need to be some sort of service, whether GSA-hosted or Chrome-hosted, to translate the domain names into an appropriate diff and patch request.

Tradeoffs abound!

cc @davidillsley

lgarron commented 7 years ago

and would mean you wouldn't need to add a new backend batch process.

Yeah, that would certainly be nice.

But the downsides are that it would feed the submissions into your team's review queue and involve manual work on our part

We have CLs regularly submitted (and even auto-landed) by bots. In the near future, I would hopefully be able to LGTM and land preload list JSON updates manually, with low overhead. I don't know how hard it is to set up a bot but there are people we could easily ask.

However, I'm looking into decoupling the "source of truth" from Chromium so that our release processes don't delay updates to other browsers: #76. In order to do this safely, I'm thinking about how to have a lightweight immutable-history transparency log for the state of the preload list (e.g. a GitHub repo containing a .json file); perhaps the main list can use the same mechanism to pull from a .gov list.
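As a rough illustration of the git-backed idea, appending one snapshot to such a log might look like the following (the repo path, the "preload-list.json" file name, and the use of the git CLI are all assumptions, not a design):

```go
// A minimal sketch of appending one preload-list snapshot to a
// git-backed, append-only log, assuming a local clone of the
// (hypothetical) log repository and the git CLI on PATH.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

func appendSnapshot(repoDir string, snapshot []byte) error {
	path := filepath.Join(repoDir, "preload-list.json")
	if err := os.WriteFile(path, snapshot, 0644); err != nil {
		return err
	}
	// Each update becomes a commit, so the repo history doubles as a
	// lightweight, immutable record of every state the list has been in.
	for _, args := range [][]string{
		{"add", "preload-list.json"},
		{"commit", "-m", "Update preload list snapshot"},
	} {
		cmd := exec.Command("git", args...)
		cmd.Dir = repoDir
		if out, err := cmd.CombinedOutput(); err != nil {
			return fmt.Errorf("git %v failed: %v\n%s", args, err, out)
		}
	}
	return nil
}

func main() {
	if err := appendSnapshot("/path/to/log-repo", []byte(`{"entries": []}`)); err != nil {
		log.Fatal(err)
	}
}
```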

davidillsley commented 7 years ago

Tradeoffs abound!

Indeed. I think the tradeoffs around having a simple file which hstspreload.org could read from an authoritative URL depend on how quickly changes would be picked up, and how easily and quickly a problem could be detected and fixed. If it's still going to take weeks for the changes to apply, then I'd be less concerned about the bricking possibilities - we should be able to notice and regain control of a domain within that time.

So if we're willing to have delays, 'pending' states, etc., I think there are a bunch of options.

But I suspect we all want it to speed up. So maybe in order to lower security concerns and avoid bricking, we could allow the file to specify excludes - domains which can't be included by an update to this list, but which would need to be preloaded using a header in the traditional way (possibly with an age-related expiry to this rule for consistency)?
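To make that concrete, here's a hedged sketch of what an excludes-aware feed and lookup might look like (the field names and the expiry handling are just guesses at a format, nothing agreed):

```go
// A sketch of an excludes-aware .gov feed; the structure, field names,
// and the age-related expiry rule are assumptions for illustration only.
package main

import (
	"fmt"
	"time"
)

type DotGovFeed struct {
	Domains  []string // domains eligible for automatic preloading
	Excludes []string // domains that must preload via the traditional header flow
	// Hypothetical age-related expiry: excludes stop applying after this time.
	ExcludesExpireAt time.Time
}

// AutoPreloadable reports whether domain may be added by the automated
// .gov process rather than by a traditional header-based submission.
func (f DotGovFeed) AutoPreloadable(domain string, now time.Time) bool {
	if now.Before(f.ExcludesExpireAt) {
		for _, ex := range f.Excludes {
			if ex == domain {
				return false
			}
		}
	}
	for _, d := range f.Domains {
		if d == domain {
			return true
		}
	}
	return false
}

func main() {
	feed := DotGovFeed{
		Domains:          []string{"example.gov", "legacy.gov"},
		Excludes:         []string{"legacy.gov"},
		ExcludesExpireAt: time.Now().Add(90 * 24 * time.Hour),
	}
	fmt.Println(feed.AutoPreloadable("example.gov", time.Now())) // true
	fmt.Println(feed.AutoPreloadable("legacy.gov", time.Now()))  // false until the exclude expires
}
```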

lgarron commented 7 years ago

Eric and I are starting a doc at https://docs.google.com/document/d/1fngkzHVBRRzYKWgiKDiUrOqWDUkDBbbTXAbo4BHEAoI/edit#heading=h.4y6h6fq2j9e2

konklone commented 7 years ago

If it's still going to take weeks for the changes to apply, then I'd be less concerned about the bricking possibilities - we should be able to notice and regain control of a domain within that time.

@davidillsley In Chrome and other browsers, this is currently the case. However, there's been discussion amongst browsers about moving the preload list to be dynamically delivered through an out-of-band channel, not through full application updates. I've heard Mozilla staff discuss the possibility of deploying preload list updates within hours.

There may be a good argument to intentionally maintain some delay, to allow time for mistakes to be discovered and resolved before deployment occurs. But it's not safe to rely on weeks-long delays into the indefinite future.

prefixtitle commented 7 years ago

Another approach to the problem is for the GSA to host an internal preload list, similar in format to the official preload list, and every week Google would pull the list from a dedicated URL (e.g. hsts.gov/get/list) to get the latest registered .gov domains.

prefixtitle commented 7 years ago

It's ~5,650 -- GSA posts a copy of this list here: https://github.com/GSA/data/blob/gh-pages/dotgov-domains/current-full.csv

(Though relying on that repository for the official to-be-preloaded data file would be a significant thing -- the repository is not currently used for security-critical work.)

Around ~1,100 of those domains are used by the federal government's executive branch. (Most are state/local.) The announced .gov preloading plan covers newly issued domains (going forward) for the federal government's executive branch, and the rate of issuance in that subset probably ranges from a handful of domains per month up to maybe 20 domains a month at maximum.
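For what it's worth, counting the domains in that CSV takes only a few lines; this sketch assumes the first row is a header and derives the raw URL from the repo link above (the column layout isn't specified in this thread):

```go
// Rough sketch: count the domains listed in GSA's current-full.csv.
// The raw URL is derived from the GitHub link above and is an assumption.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"net/http"
)

func main() {
	const url = "https://raw.githubusercontent.com/GSA/data/gh-pages/dotgov-domains/current-full.csv"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	records, err := csv.NewReader(resp.Body).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d .gov domains listed\n", len(records)-1) // minus the header row
}
```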

At first, I thought about suggesting that the whole `.gov` TLD just be preloaded, but then I realised that's impossible (except if all newly issued domains were issued under a subdomain of `.gov`).

There will be a lot of PRs every month, but it's manageable on GitHub for now, and it seems to be the most appropriate and sensible solution we have right now (to my knowledge).

What a pity it's not like the UK government's single domain, www.gov.uk, for all services and information.

lgarron commented 7 years ago

@bifurcation, @mozmark, @marumari:

@konklone and I have discussed options and considerations in this doc, but one important consideration is whether other browsers are willing to preload classes of domains regardless of whether they send an HSTS header yet (but whose inclusion is authenticated in some other way).

Assuming we mark these domains in a clear way, are there any major concerns about allowing this in the Firefox preload list filtering script?

april commented 7 years ago

I can't speak for Firefox specifically, because I'm not part of Firefox Security Engineering, so @bifurcation and @mozmark will have to comment from their perspective.

That said, I do write tools that consume and process this data, so my personal opinion, for my personal tools, is that I am okay with carveouts and TLD preloads. However, I would generally like to avoid situations like this:

```
.gov            -> preloaded
bar.gov         -> carved out
foo.bar.gov     -> uncarved out
baz.foo.bar.gov -> carved out
```

Even though my code currently walks up the domain tree and could handle it, I feel like it's cognitively and administratively very complicated. The document covers this, but I certainly concur that we shouldn't do fancy tricks on carveouts.
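For concreteness, here's a toy version of that walk-up (not my actual tooling; the status map and the "nearest explicit answer wins" rule are just illustrative):

```go
// Toy illustration of resolving nested preload/carveout state by walking
// from the full host up toward the eTLD; the status map mirrors the
// example chain above and is purely illustrative.
package main

import (
	"fmt"
	"strings"
)

// status maps a domain to true (preloaded) or false (carved out).
var status = map[string]bool{
	"gov":             true,  // eTLD preloaded
	"bar.gov":         false, // carved out
	"foo.bar.gov":     true,  // uncarved out
	"baz.foo.bar.gov": false, // carved out again
}

// preloaded walks up the label tree and returns the first explicit answer.
func preloaded(host string) bool {
	labels := strings.Split(host, ".")
	for i := 0; i < len(labels); i++ {
		candidate := strings.Join(labels[i:], ".")
		if v, ok := status[candidate]; ok {
			return v
		}
	}
	return false
}

func main() {
	for _, h := range []string{"www.baz.foo.bar.gov", "a.foo.bar.gov", "x.bar.gov", "simple.gov"} {
		fmt.Printf("%-22s preloaded=%v\n", h, preloaded(h))
	}
}
```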

I will also add that I am mildly concerned about adding additional carveouts after a TLD is initially preloaded. It would require frequent updates and/or a service, and could result in leaving a site in a bricked state until the preload list catches up.

I say that specifically because there have been points in the past where the HSTS preload update process got broken. With the current system, that's not a problem (HTTPS still works, normal HSTS headers still work), but in a carveout system it would actually result in sites that don't support HTTPS not working. It sounds like not allowing additional carveouts is the plan, but it might be good to update the document to be clear on this point.

konklone commented 7 years ago

It sounds like not allowing additional carveouts is the plan, but it might be good to update the document to be clear on this point.

I added this to the Long Term section: "From then on, that list of carveouts would only be expected to decrease, and never increase."

And I added this at the top to make it very explicit: "In operationalizing this, the .gov eTLD is willing to have services on in-scope domains be rendered inaccessible by browsers if HTTPS is not supported by individual domains. However, the process must not risk impacting the availability of out-of-scope domains (non-executive domains, and existing domains)."

prefixtitle commented 7 years ago

@konklone @lgarron Is there already a process to include security-critical domains in the preload list privately? It may be useful for some security-critical domains.

lgarron commented 7 years ago

@konklone @lgarron Is there already a process to include security-critical domains in the preload list privately? It may be useful for some security-critical domains.

Do you mean "privately" as in "without revealing the site publicly"? No, and I don't think we should support that.

prefixtitle commented 7 years ago

Do you mean "privately" as in "without revealing the site publicly"?

Yes. I do think we should support it in very exceptional circumstances, and there are lots of use cases where it's appropriate not to reveal the site's domain publicly but still require static preloading and maybe public key pinning in Chrome.

This would be used rarely or never, but it should still be an option for security-sensitive or security-critical domains.

lgarron commented 7 years ago

Yes. I do think we should support it in very exceptional circumstances, and there are lots of use cases where it's appropriate not to reveal the site's domain publicly but still require static preloading and maybe public key pinning in Chrome.

I have never heard of a legitimate use case for this, and no one has ever asked for it.

I don't think we should store secret domains in a special way for Chrome, and I don't think it's appropriate to store domains in an obfuscated way (e.g. by hash) and expect other browsers to adapt their HSTS mechanism to accommodate those entries. Anyhow, this would mess up data structures, logic, and statistics based on the list. If a site wants to take up bits in browser binaries for every user, they should be willing to accept that their site is publicly listed.

lgarron commented 7 years ago

If a site wants to take up bits in browser binaries for every user, they should be willing to accept that their site is publicly listed.

(That said, if there's a site that comes to us with what they believe to be good reasons otherwise, I'm at least willing to listen.)