chromium / hstspreload.org

:lock: Chromium's HSTS preload list submission website.
https://hstspreload.org
BSD 3-Clause "New" or "Revised" License

Introduce a canonical preload list "source of truth" separate from the Chromium repo #76

Open lgarron opened 7 years ago

lgarron commented 7 years ago

e.g. an endpoint like /api/v2/preload-list

The goal would be for all major browsers, including Chrome, to pull the current list from that URL. That way, the Chromium process doesn't introduce additional lag for new additions and removals, which can currently add up to extra months for affected changes due to mismatched release cycles.

This needs to be done carefully. @sleevi tells me that https://github.com/publicsuffix/list can't change its format without risk of breaking stuff, because they don't even know who all the consumers are.

(We could, say, require consumers of the list to register for an API key to get access to the endpoint, but that seems like a little overkill.)
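A consumer of such an endpoint might look roughly like this. This is a minimal Python sketch: the `/api/v2/preload-list` URL is the hypothetical endpoint proposed in this thread (not a shipped API), and the entry shape is borrowed from Chromium's static JSON list (`name`, `mode`, `include_subdomains`):

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint name from this discussion; not a shipped API.
PRELOAD_LIST_URL = "https://hstspreload.org/api/v2/preload-list"

def parse_preload_entries(raw):
    """Filter a Chromium-style entry list down to forced-HTTPS hosts.

    Assumes entries shaped like Chromium's static list, e.g.
    {"name": "example.com", "mode": "force-https", "include_subdomains": true}.
    """
    data = json.loads(raw)
    return {
        e["name"]
        for e in data.get("entries", [])
        if e.get("mode") == "force-https"
    }

def fetch_preload_list(url=PRELOAD_LIST_URL):
    """Pull the current list from the canonical source of truth."""
    with urlopen(url) as resp:
        return parse_preload_entries(resp.read())
```

A browser vendor's update pipeline could call `fetch_preload_list()` on its own release cadence, decoupled from Chromium's.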

prefixtitle commented 7 years ago

> e.g. an endpoint like /api/v2/preload-list

Sounds good.

> The goal would be for all major browsers, including Chrome, to pull the current list from that URL. That way, the Chromium process doesn't introduce additional lag for new additions and removals, which can currently add up to extra months for affected changes due to mismatched release cycles.

With this new system you could automate additions to and removals from the list without manual interaction, which is better in the long term since the list is only going to get larger. Having said that, another endpoint like /api/v2/latest may be necessary: downloading the whole list every time is time-consuming and wastes resources when a consumer only needs the latest additions and removals.

> (We could, say, require consumers of the list to register for an API key to get access to the endpoint, but that seems like a little overkill.)

Having consumers register for an API key isn't overkill. It gives you an overview of who's using the endpoint, and having them register means you can inform consumers of any sudden or pending changes to the API.
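The delta idea could be as simple as a set diff between two full snapshots. A minimal sketch (the `/api/v2/latest` endpoint and its payload shape are hypothetical, taken from the suggestion above):

```python
def compute_delta(old_snapshot, new_snapshot):
    """Diff two full list snapshots into the additions and removals a
    hypothetical /api/v2/latest endpoint could serve incrementally,
    so consumers need not re-download the entire list."""
    old, new = set(old_snapshot), set(new_snapshot)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }
```

The server would compute this once per list revision; a consumer at revision N fetches only the deltas since N instead of the full list.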

prefixtitle commented 7 years ago

@lgarron This is going to involve a lot of work to implement the new system. Do you want some extra help?

lgarron commented 7 years ago

Maybe; I need to come up with a roadmap first. Ping me at the end of February if this hasn't gone anywhere?

lgarron commented 7 years ago

Another idea I mentioned in #78: Maintain a GitHub repository with the preload list. This provides an auditable transparency log, and allows pull requests for special cases.

prefixtitle commented 7 years ago

> Another idea I mentioned in #78: Maintain a GitHub repository with the preload list. This provides an auditable transparency log, and allows pull requests for special cases.

@lgarron If we went with this model, how would you protect the list from unauthorised tampering?

lgarron commented 7 years ago

> @lgarron If we went with this model, how would you protect the list from unauthorised tampering?

@konklone and I are discussing that right now. We'd need a way to know which accounts are authorized to submit pull requests for which eTLDs.

Automated submissions would still need to go through the normal process.

prefixtitle commented 7 years ago

So this may become a private, closed repo on GitHub?

konklone commented 7 years ago

No, it would mean that the maintainers of the repository would need to know, when a pull request is submitted, whether the submitters of the pull request are authorized to represent the eTLD whose hostnames are contained in the pull request.

prefixtitle commented 7 years ago

In addition to the above, would authorized persons hold a PGP key to verify pull requests for a specific eTLD?

konklone commented 7 years ago

Probably not, since PGP key management is unlikely to be viable for many eTLD operators.

prefixtitle commented 7 years ago

Unauthorized tampering can come in many forms, and I believe GitHub is not the most suitable place to host such a critical list.

lgarron commented 7 years ago

> Unauthorized tampering can come in many forms, and I believe GitHub is not the most suitable place to host such a critical list.

As a similar precedent, the public suffix list is hosted on GitHub. Git revisions are also authenticated, which prevents tampering with the historical log.

Trusted users are necessary anyhow, and GitHub is a fairly secure way to authenticate arbitrary users.

sleevi commented 7 years ago

The Public Suffix List simply requires the creation of a TXT record to indicate the GitHub PR.

We explicitly do not try to maintain a list of authorized user accounts, as those inevitably get stale. Instead, simple and practical demonstrations of authorization are sufficient.

This does mean more work for the domain holder, but avoids any ambiguity on authorization or commitment.
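This kind of demonstration-of-control check could be sketched as follows. The `_psl.<domain>` record name is modeled loosely on the Public Suffix List's TXT-record convention, and the `resolve_txt(name) -> list[str]` callback is an assumption for illustration, not a documented API:

```python
def txt_record_authorizes(domain, pr_url, resolve_txt):
    """Check whether a DNS TXT record demonstrates control of `domain`.

    Modeled loosely on the Public Suffix List convention of a TXT record
    (at a name like `_psl.<domain>`) containing the GitHub PR URL. The
    record name and the resolver callback are assumptions in this sketch.
    """
    records = resolve_txt("_psl." + domain)
    return any(pr_url in record for record in records)
```

Taking the resolver as a parameter keeps the policy testable without live DNS; a real deployment would plug in an actual DNS lookup.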

prefixtitle commented 7 years ago

The eTLD representative could effectively automate the entire process from start to finish. It would just involve adding a simple random-hash TXT record to the chosen domain's DNS; using a generic PR template, the domain could then be included in the preload list without fuss.

konklone commented 7 years ago

> The Public Suffix List simply requires the creation of a TXT record to indicate the GitHub PR.
>
> We explicitly do not try to maintain a list of authorized user accounts, as those inevitably get stale. Instead, simple and practical demonstrations of authorization are sufficient.

That seems like a pretty reasonable approach for a one-time transaction ("include me in the PSL"). For this, at least for the time being, we'd be doing regular PRs on a hopefully fairly frequent basis with fresh batches of domains. The TXT record approach, which would involve one-off modifications to the production .gov DNS, is likely to add significant friction to such a process.

The PSL is also already a large-scale project with participation from a high number of public suffixes. For now, only one eTLD has formally expressed interest in pursuing this approach, plus one more I'm aware of with informal interest, so it may be reasonable to pursue a less easily scalable approach at first and change it later.

In addition, in the .gov case, we may (hopefully) get to the point where all new domains are included (not just executive) and so we can reduce the number of transactions by first asking to preload *.gov except X,000 legacy domains. Future transactions would be about deleting batches of legacy domains (and at least some of those could be indicated through publishing an HSTS header, since these would be existing domains).
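The "preload *.gov except legacy domains" rule described above could be sketched as follows. All names here are hypothetical, and a real implementation would need proper eTLD handling rather than naive label splitting:

```python
def is_preloaded(host, etld="gov", legacy_exceptions=frozenset()):
    """Decide preload status under a hypothetical "*.gov except legacy
    domains" rule: every registration under the eTLD is preloaded unless
    its registered name appears on an explicit legacy exception list."""
    labels = host.lower().split(".")
    if labels[-1] != etld:
        return False
    registered = ".".join(labels[-2:])  # e.g. "example.gov"
    return registered not in legacy_exceptions
```

Under this model, removing a legacy domain from the exception list (or letting it publish its own HSTS header) is the only per-domain transaction left.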

konklone commented 7 years ago

> The eTLD representative could effectively automate the entire process from start to finish. It would just involve adding a simple random-hash TXT record to the chosen domain's DNS; using a generic PR template, the domain could then be included in the preload list without fuss.

In theory, that's definitely true. In practice, I think this would be unworkable for .gov and GSA, since DNS changes to .gov itself are managed with intense bureaucratic care, the relevant GSA program office does not have engineering capabilities in-house, and deploying a new in-house production system to automate this kind of task would require a substantial investment in compliance and authorization work.

Though the US government may be at the extreme end, I expect a variety of eTLD operators around the world to be in a similar position. The process will have broader applicability if it can work for participants with limited automation capabilities.

konklone commented 7 years ago

> Though the US government may be at the extreme end, I expect a variety of eTLD operators around the world to be in a similar position. The process will have broader applicability if it can work for participants with limited automation capabilities.

Though I should also say, hopefully most eTLD operators will be able to take a blunter hand than .gov and take the approach described above (*.etld except for X legacy domains), which makes a number of problems go away. So it could be that an immediate-term process for .gov ends up getting discarded in the long run no matter what.

prefixtitle commented 7 years ago

How many .gov domains are registered?

lgarron commented 7 years ago

btw the bug for preloading .gov/eTLDs is #78. ;-)

I think @konklone is planning to reply to @ByJamesBurton's last comment, but keeping the rest of the discussion in #78 will keep this bug cleaner for what we need it for. :-)

konklone commented 7 years ago

> How many .gov domains are registered?

It's ~5,650 -- GSA posts a copy of this list here: https://github.com/GSA/data/blob/gh-pages/dotgov-domains/current-full.csv

(Though relying on that repository for the official to-be-preloaded data file would be a significant thing -- the repository is not currently used for security-critical work.)

Around ~1,100 of those domains are used by the federal government's executive branch. (Most are state/local.) The announced .gov preloading plan covers newly issued domains (going forward) for the federal government's executive branch, and the rate of issuance in that subset probably ranges from a handful of domains per month up to maybe 20 domains a month at maximum.

@lgarron Moving to #78! =)

lgarron commented 7 years ago

Moving the source of truth is no longer a goal for me (as of a few months ago). It's not out of the question, but it's not necessary for anything now that the Chromium list can be updated cheaply.