Support S3-backed repositories

justinsb commented 9 years ago

It would be nice if it were easy to have ACI repositories in an S3 bucket. This should be simple, in that S3 can serve HTTP, except that HTTPS is a little tricky with S3.

An S3-backed website aci.example.com is reachable at http://aci.example.com, but not at https://aci.example.com. It is reachable over HTTPS at https://s3-<region>.amazonaws.com/aci.example.com.
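A minimal sketch of that mapping, for illustration only. The function name and the example region "us-west-2" are assumptions, not part of rkt or the spec; the URL shape is the s3-<region>.amazonaws.com form quoted above.

```python
from urllib.parse import quote


def path_style_https_url(bucket: str, key: str, region: str) -> str:
    """Build the HTTPS URL of an object using the s3-<region> host form.

    Hypothetical helper: the bucket name moves from the hostname into
    the first path segment, so the request uses Amazon's own hostname.
    """
    # Keep "/" separators in the key; percent-encode anything unsafe.
    safe_key = quote(key.lstrip("/"))
    return f"https://s3-{region}.amazonaws.com/{bucket}/{safe_key}"


if __name__ == "__main__":
    print(path_style_https_url("aci.example.com", "pubkeys.gpg", "us-west-2"))
```

The caller still has to know the bucket's region, which the plain http://aci.example.com form does not reveal.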

I see two options (and we may want to do both):

1. Add s3 support via an s3:// scheme, and add s3:// to the discovery sequence. This would also support private repositories.
2. Attempt to recognize sites that are hosted using S3 CNAMEs, and use the alternative (https://s3-<region>.amazonaws.com/bucket/...) method of accessing them. We could either detect the CNAME in DNS (hacky) or look for the "Server: AmazonS3" header on http (also pretty hacky, and with the security concerns of http).
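To make the second option concrete, here is a rough sketch of the header check. The function name is hypothetical, and it only inspects headers that some HTTP client has already fetched. Since those headers arrive over plain http they can be forged, which is the security concern mentioned above.

```python
from typing import Mapping


def looks_like_s3(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers contain "Server: AmazonS3".

    Hypothetical helper. Header names are matched case-insensitively,
    as HTTP requires; the value must match exactly.
    """
    for name, value in headers.items():
        if name.lower() == "server" and value.strip() == "AmazonS3":
            return True
    return False


if __name__ == "__main__":
    print(looks_like_s3({"Server": "AmazonS3"}))
    print(looks_like_s3({"Server": "nginx"}))
```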

Personally, I think the first option is much better; it feels much less hacky, it allows for private repos, and I can imagine that we make the discovery mechanism configurable in future, and this would tie in well with that.

justinsb commented 9 years ago

To be clear, the HTTP vs HTTPS problem really happens in two places: in rkt trust, which will not accept http-only sites in discovery, and when we download the "latest" version (which we should probably only do over https, although currently we tolerate http).

It would also be great to allow private S3 buckets, of course!

jonboulle commented 9 years ago

@justinsb Do you want to put up a proposal for 1) so we can talk through it?

justinsb commented 9 years ago

I did a simple PoC that allows rkt trust s3://<bucket>/pubkeys.gpg just to make sure this made some sense. Lots of work still to do (particularly around credentials): https://github.com/justinsb/rkt/commit/a249a764b59e1a0f29306ba960d658b1441b5c0e
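For readers who do not want to open the commit, this is roughly the parsing step any s3:// support needs. It is an independent sketch, not the code from the PoC, and the function name is made up for illustration.

```python
from urllib.parse import urlsplit
from typing import Tuple


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split s3://<bucket>/<key> into (bucket, key).

    Hypothetical helper; raises ValueError for anything that is not
    an s3:// URL with a bucket name.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name: {url!r}")
    # The bucket sits where a hostname normally would; the key is the path.
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    print(parse_s3_url("s3://aci.justinsb.com/pubkeys.gpg"))
```

Credentials are deliberately left out, since that is the part still to be worked out.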

What is the right way to make a concrete proposal? Code or words or both? Where do the words go (and do you have an example of a good proposal)?

jonboulle commented 9 years ago

@justinsb I realised I didn't quite grok your point earlier about adding s3 to the discovery sequence. Are you just talking about that being one of the possible schemes returned during meta-discovery (e.g. analogous to the hdfs example we mention in the spec today), or are you talking about a different kind of discovery process?

justinsb commented 9 years ago

I don't think I myself have fully grokked the discovery process, so this may not make sense!

Certainly, it would be great if we supported s3:// urls if we find them. i.e. we should honor this:

<meta name="ac-discovery" content="example.com s3://mybucket/{name}-{version}-{os}-{arch}.{ext}">

And this should support S3 credentials for private buckets (I would suggest this should happen by default).

I think this support and rkt trust s3://<bucket> should be uncontroversial (other than feature creep).
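As an illustration of what honoring that tag would involve, the sketch below splits the content attribute into its prefix and template and fills in the placeholders. The function name and the example values (name, version, os, arch, ext) are assumptions chosen for the demo, and this is not the spec's reference implementation.

```python
from typing import Dict, Tuple


def expand_discovery(content: str, values: Dict[str, str]) -> Tuple[str, str]:
    """Expand an ac-discovery content string into (prefix, url).

    Hypothetical helper: `content` is "<prefix> <template>", and each
    {placeholder} in the template is replaced from `values`.
    """
    prefix, template = content.split(None, 1)
    url = template
    for key, value in values.items():
        # Plain replacement avoids str.format tripping on stray braces.
        url = url.replace("{" + key + "}", value)
    return prefix, url


if __name__ == "__main__":
    content = "example.com s3://mybucket/{name}-{version}-{os}-{arch}.{ext}"
    values = {
        "name": "example.com/worker",  # illustrative image name
        "version": "1.0.0",
        "os": "linux",
        "arch": "amd64",
        "ext": "aci",
    }
    print(expand_discovery(content, values))
```

The resulting s3:// URL would then go to whatever fetcher handles that scheme, with credentials for private buckets.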

But, the missing link that I'd like to bridge is that it would be great to be able to store a private ACI repo in a private S3 bucket, and have that work (without having to set up SSL certificates for the discovery mechanism on S3 and without having to open the bucket up publicly). I'm not sure what the best way to achieve this is. I think we want the s3:// url added to the sequence of discovery URLs.

The problem is that S3 bucket names don't require DNS control (e.g. I could register a bucket justin.coreos.com). In theory, this doesn't matter because the trust mechanism should prevent any problems, but I can see this biting someone eventually.

Two options that might work:

Maybe we could let a user add a path to their discovery mechanism, and support s3:// (and hdfs:// etc.).
Maybe we could have a "scoped trust" that would achieve this, so I would do rkt trust --root s3://aci.justinsb.com/pubkeys.gpg --basedir s3://aci.justinsb.com/ and that would let me run any ACI, as long as it was signed by my key, but only if downloaded from s3://aci.justinsb.com. This would also implicitly extend the discovery path.
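The "scoped trust" idea boils down to a check like the one sketched here: a key trusted with a base directory only applies to images fetched from under that base. The --root and --basedir flags above are a proposal, not existing rkt options, and the function name is likewise hypothetical.

```python
from urllib.parse import urlsplit


def within_basedir(url: str, basedir: str) -> bool:
    """Return True if `url` lies under `basedir` (same scheme and bucket).

    Hypothetical helper. It compares path prefixes on a "/" boundary and
    does not normalize ".." segments.
    """
    u, b = urlsplit(url), urlsplit(basedir)
    if (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return False
    # Force a trailing "/" so that /images does not match /images-evil.
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    return u.path.startswith(base_path)


if __name__ == "__main__":
    base = "s3://aci.justinsb.com/"
    print(within_basedir("s3://aci.justinsb.com/app-1.0.0-linux-amd64.aci", base))
    print(within_basedir("s3://other-bucket/app-1.0.0-linux-amd64.aci", base))
```

Signature verification against the trusted key would still be required; this check only narrows where a trusted key is allowed to apply.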

appc / spec

Support S3-backed repositories #319