docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution
Apache License 2.0

Registry next generation #612

Closed dmp42 closed 9 years ago

dmp42 commented 9 years ago

Dear community (off the top of my head: @wking @bacongobbler @noxiouz @ncdc @proppy @vbatts and many others - also @shin- @samalba @jlhawn @dmcgowan ),

In a nutshell

Work is starting to design an entirely new "registry" - meaning new storage driver API, new image format, new http API, new architecture (eg: relation to other services), new docker engine code, and finally new technology for the service.

If you haven't seen it yet, there is a proposal for the new image format allowing "signing" there: https://github.com/docker/docker/issues/8093 that triggered and fuels this desire for change.

Reading it will give you a hint on the envisioned new image format from an engine perspective.

Below, I'll try to cover all bases in a Q&A fashion. Please comment if you have more questions. If you have ideas and suggestions, you can open tickets with a title like "NG: Fantastic Idea".

Holy c! What will happen to the registry as we know it?

As a part of the docker infrastructure, the existing "V1" registry will continue to be used on production servers for the foreseeable future, delivering V1 images to V1-only docker engines (< 1.4) and "both-ways" engines (>=1.4,<2?). It might eventually be replaced by a V2 registry with a reimplementation of V1 endpoints, but that remains to be seen, since that would be a rather dull task.

As for the open-source project, I'll continue to steward it and will merge interesting work and fixes from the community, and we will continue to provide security releases if need be, but it's unlikely major new features or changes will happen.

I feel that it has now reached its full "maturity" (for better or worse), and that the new extension mechanism we merged in 0.9 leaves enough room for everyone to keep doing interesting things with it while the core of it enters "maintenance" mode.

Thus registry 1.0 will be the last (and final, IMO) major release of "V1", that will likely be maintained (like I said) for at least a full year.

You said, "new technology"?

Yes. The new registry will be developed in go instead of python.

The reasons for that are:

Starting from scratch sure has its downsides, and I can't say I'm happy ditching the accumulated experience with V1/python (especially all the good work done on drivers), but in the end it's a reasoned choice, and I believe the benefits outweigh the downsides.

Why oh why change? ... the storage format

We want image signing capability. We believe we can't have it without an image format change (content addressable ids for a start).

Furthermore, the current storage format has terrible shortcomings:

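The new image format drastically simplifies the concepts:

  • an image is a json file, with a mandatory, namespaced name, a list of tarsums (eg: content-addressable layers ids), some opaque metadata, a signature
  • a layer is a binary blob, mapping to a tarsum

Exit "ancestry" (now implicit from the order of layers inside the image "manifest"). Exit "layers are images are layers". Exit "layer json" etc.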

Backward compatibility is a requirement, so, it's likely the V2 registry will be able to "generate" V1 content as well on the backend storage. Generating V2 content from V1 datastores should also be possible (might be provided by third-party scripts).

Why oh why change? ... the rest API

The current API ties to the format, and shares most of its defects (awkward, needlessly complex, not "static-able").

Also, I consider the authentication model and its relation to the other bricks "broken" (given how difficult it is to use/implement for most people).

The new API will be much simpler, with only a couple endpoints.

GET/PUT image manifest

PUT link layer into image

PUT layer

GET layer from image

GET list tags

And the GET part will make it super-easy to deliver the payload through a simple "static" http server.

We hence expect cache mirroring (for example) to be much simpler.

As far as authentication is concerned, the plan should be standardizing on JWT/OAuth.

Why oh why change? ... the technology - I mean, man, that really sucks, python is so cool and I barely started understanding the codebase

Change is good, man.

New things, new adventures! Be a part of it!

Why oh why change? ... the drivers API

The drivers API was never really "designed".

There was an initial interface that eventually grew organically, and was then ripped out of the registry to provide a basis for third-party driver implementors.

It does bear the scars of its history (eg: it's butt-ugly for one thing).

The new interface will likely be way more concise and clean.

What I can think of for now is something like:

write_stream
read_stream
put
get
list
mv

Given Go's nature, we need to figure out the best way to make drivers standalone (eg: without the need to recompile the registry to use a new one).

Also, I definitely want push-resume support in there (S3 does support that, though we don't exploit it right now).

These are the two challenges that face us.

Any other cool ideas on the driver side of things, please jump in (thinking specifically about you @noxiouz and @bacongobbler).

New extensions model?

It took us a year to finally come-up with a decent extension mechanism for the V1 registry (on top of signals).

I strongly believe that good extensibility is what will make the new registry cool, and I would love to have it, well thought out, from the very first version of registry V2.

Again, given Go's nature, we can't have dynamically (runtime) loaded standalone extensions, so we need to figure something out there as well.

HTTP based communication is fine by me (in a micro-services world), and also elegantly solves scalability and delegation problems.

Here as well, ideas are welcome :).

Do you say the previous registry was just crap entirely?

No. It did serve its purpose well, parts of it are really cool, I enjoyed stewarding it a lot, and I really think the most awesome part of it is the nascent community around it.

Now, it's not ready for the future, which is why we need to move on.

Wait! You have it all figured out?

No, not yet.

The vision is there.

We know the shortcomings.

And we made all the mistakes.

But it remains to be designed and built, and I want this process to happen with the community, capitalizing on the good vibe we had these past months.

So, how does this work?

I'll start a V2 (or next-gen) branch soon, so that development happens in the open and PRs can be merged, and will bring in more manpower to contribute the "foundations" (research is going on for S3 and filesystem drivers).

The plan is to figure out ASAP:

so that we can move on the actual implementation and let the community get crazy with extensions.

Also, if you have desires, wishes, ideas, please submit a ticket here, starting with "NG: " in the title. I don't think we need this to be too formal to start with - so let's see how this goes.

If you want to be more involved than that, then you can definitely help with answering / triaging said tickets, or go ahead with fully-fleshed proposals and PRs (proposals can be PRs themselves I guess? do we need to be formal on that?).

Thanks again community, for it has been a very good journey so far, and I'm confident the next one will be even more awesome!

wking commented 9 years ago

On Tue, Oct 07, 2014 at 12:14:09PM -0700, Olivier Gambier wrote:

Yes. The new registry will be developed in go instead of python.

The reasons for that are:

  • reduce the "gap" inside the community and build on a common technology, using common libraries (libtrust and @dmcgowan, I'm looking at you)
  • thus easing integration test with the rest of the platform, etc

I think this would make more sense if there was going to be more sharing of code between the registry and the daemon/client. However, I don't think we need any brains in the registry, since I see provenance as a contract between the builder and signer, and completely separate from the registry [1,2].

  • start with a clean slate

This is a benefit?

If this means we get transactional backends for free, then great :). Otherwise, I think the current implementation scales well (just add as many threads as you need), since there's no need to communicate between threads.

  • while python is a robust, mature and well established technology (stack), it really starts smelling funny in a number of places - some young blood / fresh air will do us all good :)

Where are the funny smells?

noxiouz commented 9 years ago

HTTP based communication is fine by me (in a micro-services world), and also elegantly solves scalability and delegation problems.

What about some kind of binary protocol with multiplexing of read/write streams? The draft of HTTP/2 looks good as a concept. We can take a look at some common binary serialization libraries (for example msgpack) and use one of them to communicate between core and plugins over a tcp/unix domain socket. It allows us to implement a fast, flexible, easy-to-extend protocol. This protocol should be bidirectional to provide full control over communication.

dmp42 commented 9 years ago

I think this would make more sense if there was going to be more sharing of code between the registry and the daemon/client. However, I don't think we need any brains in the registry, since I see provenance as a contract between the builder and signer, and completely separate from the registry [1,2].

Tarsum verification will have to occur also on the registry side. And I would expect the registry to verify image signatures as well.

These are the areas of "shared" code I'm thinking about (so, libtrust, some bits for tarsum, and probably some other "engine" code related to manipulation of the image format).

Common tooling is a plus as well. Common development guidelines, etc.

This is a benefit?

I do believe there is benefit there - yes, I know http://www.joelonsoftware.com/articles/fog0000000069.html

If this means we get transactional backends for free, then great :).

Why not? :)

Otherwise, I think the current implementation scales well (just add as many threads as you need)

It does scale.

Now, what about things breaking in not so subtle ways because of libevent minor version differences?

Or the need to call "magical" monkey patching "before" any other code, that doesn't always seem to fully do the job?

since there's no need to communicate between threads.

There is a need to communicate between threads, right now, or be bitten by eventual consistency (which is one of the reasons we use redis for that). But then one could argue this will disappear with the new drivers.

Where are the funny smells?

  • namespaces
  • packaging
  • gevent

Other things are rather a matter of taste - don't get me wrong on this though, I do like python.

wking commented 9 years ago

On Tue, Oct 07, 2014 at 02:16:58PM -0700, Olivier Gambier wrote:

I think this would make more sense if there was going to be more sharing of code between the registry and the daemon/client. However, I don't think we need any brains in the registry, since I see provenance as a contract between the builder and signer, and completely separate from the registry [1,2].

Tarsum verification will have to occur also on the registry side. And I would expect the registry to verify image signatures as well.

Why? It's going to have to happen in local Docker daemons after downloads, so I don't see much benefit in checking on the registry side too. And with clients doing verification, I doubt anyone will bother uploading broken signatures to the registry.

Common tooling is a plus as well. Common development guidelines, etc.

This makes sense, although I don't really see the need for much more development since the existing registry code is fairly stable.

Otherwise, I think the current implementation scales well (just add as many threads as you need)

It does scale.

Now, what about things breaking in not so subtle ways because of libevent minor version differences?

Or the need to call "magical" monkey patching "before" any other code, that doesn't always seem to fully do the job?

Can you link to the issues where these came up? gevent is not my favorite package, but I'd probably just pick a different Gunicorn worker (e.g. gaiohttp [1]) instead of rewriting this whole project from scratch ;).

since there's no need to communicate between threads.

There is a need to communicate between threads, right now, or be bitten by eventual consistency (which is one of the reasons we use redis for that). But then one could argue this will disappear with the new drivers.

“But when we rewrite it, we'll do a better job” is less convincing to me than “but when we use $TOOL, $PROBLEM will no longer be an issue because of $FEATURE [$LINK]” ;).

Where are the funny smells?

  • namespaces
  • packaging
  • gevent

I'm happy to drop our current Setuptools stuff (and gevent, see above) ;). What sort of packaging problems are you talking about? I'm still not sold on the whole docker-registry-core pull-out. Anyhow, I think these are things best solved incrementally.

dmp42 commented 9 years ago

And with clients doing verification, I doubt anyone will bother uploading broken signatures to the registry.

They will (broken tarsums).

And that will result in a DOS, at best (content-addressability comes at a cost).

Can you link to the issues where these came up? gevent is not my favorite package, but I'd probably just pick a different Gunicorn worker (e.g. gaiohttp [1]) instead of rewriting this whole project from scratch ;).

The most baffling ones are here:

Now, I'm not stating that go is a magic bug-free shiny new thing and solves everything - merely stating that concurrency is not a python core feature and that I expect a better situation on that front.

Finally, we are not rewriting from scratch because "X is so superior to Y" - we are rewriting from scratch because we need to break things - the fact that we are going to change language as well is a different issue IMO.

And oh, ultimately, I would love to see multiple diverse implementations of the V2 protocol (and this is what should be cool with it, in allowing to do that more easily).

I wouldn't be shocked if someone would do a nodejs registry, or... a python one.

The one I want to focus on though is this one here in go, along with this community :-)

“But when we rewrite it, we'll do a better job” is less convincing to me than “but when we use $TOOL, $PROBLEM will no longer be an issue because of $FEATURE [$LINK]” ;).

I expect the following features:

... to give us a more resistant codebase, easier to maintain, easier to contribute to.

I also expect our "design (a bit) more" and "think (a bit) before" approach to prove more fruitful and also easier to maintain than our previous organisation.

I'm still not sold on the whole docker-registry-core pull-out.

Well, it was ugly, but it did benefit us a lot:

Anyhow, I think these are things best solved incrementally.

I hear you. I do think there are indeed strong benefits in solving things incrementally, and I think we did a lot on that front already, from 0.6 up to 0.9.

But then defining what is an "increment" and what is "disruptive" is a matter of "scale".

And I do believe the important parts to preserve here are:

The rest is not so much if you ask me.

And I still enjoy chitchatting with you, and I hope you will keep that voice up during that new journey :-)

dmp42 commented 9 years ago

@noxiouz #613 for specific extension model discussion (your ideas seem close to @dmcgowan 's)

wking commented 9 years ago

On Tue, Oct 07, 2014 at 03:21:09PM -0700, Olivier Gambier wrote:

And with clients doing verification, I doubt anyone will bother uploading broken signatures to the registry.

They will.

And that will result in a DOS, at best (content-addressability comes at a cost).

You can't handle this the same way that you already presumably handle folks uploading other objectionable content?

And oh, ultimately, I would love to see multiple diverse implementations of the V2 protocol (and this is what should be cool with it, in allowing to do that more easily).

I was coming from more of a “I don't have time to learn idiomatic Go or work on untangling Gentoo's Go packaging” perspective, which is probably not compatible with “I'm going to chip in to an unfunded Python v2 registry without help from Docker, Inc. folks” ;).

I also expect our "design (a bit) more" and "think (a bit) before" approach to prove more fruitful and also easier to maintain than our previous organisation.

But I can still help with the design and thinking without needing to figure out a sane Go development system locally ;).

bacongobbler commented 9 years ago

Here's my two cents on the proposal:

I can totally understand the desire to completely start from scratch. Having the ability to share packages from docker is a huge benefit to the registry. Speaking from Deis's perspective, this will require us to migrate from the old registry to the newer version, which may take some time, so hopefully docker/docker will be able to maintain the v1 endpoints from an older registry for the foreseeable future, or we may be stuck on an older version until we update our infrastructure. That shouldn't affect the decision made here, but I wanted to give you a bit of insight on one of my concerns with this change. This is a big change, but I personally support the decision if you feel like it is a step in the right direction.

On that front...

The new image format drastically simplifies the concepts:

  • an image is a json file, with a mandatory, namespaced name, a list of tarsums (eg: content-addressable layers ids), some opaque metadata, a signature
  • a layer is a binary blob, mapping to a tarsum

Exit "ancestry" (now implicit from the order of layers inside the image "manifest"). Exit "layers are images are layers". Exit "layer json" etc.

So what happens to config/exposed ports/etc? Is that all going into the "some opaque metadata" format? Isn't this just an aggregation of all the concepts in the v1 API and just slapping on the v2 sticker?

Starting from scratch sure has its downsides, and I can't say I'm happy ditching the accumulated experience with V1/python (especially all the good work done on drivers), but in the end it's a reasoned choice, and I believe the benefits outweigh the downsides.

Most certainly. This change does not only affect the registry; it also kills off all of the current python storage driver implementations, which may be affected substantially. For example, support for obscure drivers that only a small subset of users rely on may be completely gone in a year or two because the maintainer has no time or experience to maintain a second Go implementation of the same driver. If there's a way we can somehow make the drivers easy to develop and maintain, I'm happy with that.

With regard to storage driver changes, I propose that these drivers be maintained separately from core docker-registry, but instead of being split into their own separate packages like last time, they should be maintained together as a single cloud-agnostic driver package offering a key/value store on top of various providers. That way, if someone wants support for an Openstack swift driver, a local filesystem driver, an in-memory key/value store like https://github.com/kelseyhightower/memkv or an S3-based driver, for example, they only have to go to one repository that supports these drivers. This means that the registry has a hard dependency on this package, but it keeps maintenance of this package outside the context of docker-registry, and other users can benefit from this package (e.g. someone else needs a cloud-agnostic filesystem driver for their backend). Thoughts on that?

All I can say is "specs, specs, specs". For my own needs, I'd like a way to contribute to the project in a way that would support this change. To facilitate that, we need a document so that external maintainers (i.e. contributors to docker/docker-registry who are not affiliated with Docker Inc.) like myself can contribute in some way, whether that be with the core API, the storage drivers, etc. This includes possibly some political discussion on the technology involved (do we handle API requests with something similar to Flask like Martini, or do we want to re-implement the world with the net/http package? Do we handle dependency management from third party libraries natively or do we use something like Godep?). I'm happy and comfortable with Go, so I'd definitely like to be in the loop if at all possible with the ongoing development of the v2 API. I assume that this issue is more of a "hey, we're doing this regardless but I wanted to give you a heads-up" more than an actual proposal. ;)

To note, the above point about discussing the technologies/frameworks used to implement registry v2 is completely optional if we don't want to open that can of worms. It could potentially end up in a flamewar over which practice is better. Still, some kind of heads-up, or a day on which we can discuss these changes in greater detail, would be very much appreciated :)

All I can say is... Docker Registry hack day? :D

smarterclayton commented 9 years ago

Regarding:

PUT link layer into image

is this to mutate an image into a new image? I.e. given image A, PUT link B -> Image B with new signature? While useful for simple clients, it also makes the registry a bit more complex to implement - might there be an advantage in only having GET/PUT images, GET layer, GET tags?

wking commented 9 years ago

On Tue, Oct 07, 2014 at 09:21:44PM -0700, Matthew Fisher wrote:

All I can say is "specs, specs, specs". For my own needs, I'd like a way to contribute to the project in a way that would support this change. To facilitate that, we need a document so that external maintainers (i.e. contributors to docker/docker-registry who are not affiliated with Docker Inc.) like myself can contribute in some way, whether that be with the core API, the storage drivers, etc.

+1. In fact, I'd be tempted to put the specs in a separate repository from the implementation, to encourage folks to not get bogged down in a particular implementation. Of course, linking out to an external implementation would be fine.

dmp42 commented 9 years ago

@wking

You can't handle this the same way that you already presumably handle folks uploading other objectionable content?

Right now, for most DOS or security issues, we get away with ownership verifications.

Now, content-addressable ids (vs. random ids) make the question of "ownership" more difficult.

Also, reducing the coupling to the auth. component is something I want.

Believe me, I hate the idea of computing tarsums on the server side - but for now I can't figure a way out...

About the dev env, I want to make this easy/easier to setup for contributors, so, efforts on that front are definitely worth it.

Now, about using gentoo, who am I to lecture you? :-))) http://www.motivationals.org/demotivational-posters/demotivational-poster-16518.jpg

dmp42 commented 9 years ago

@bacongobbler

So what happens to config/exposed ports/etc? Is that all going into the "some opaque metadata" format? Isn't this just an aggregation of all the concepts in the v1 API and just slapping on the v2 sticker?

Well, maybe it is :-)

Layers are still layers.

Images on the other hand are no longer "a specific layer". They are a chunk of json listing layers.

Content-addressability is a major change as well.

Indeed the per-layer config ends-up in the opaque part.

And yes, the engine itself will keep working as is - it's a transport-level format change - not (yet) an engine level change.

This change does not only affect the registry; it also kills off all of the current python storage driver implementations, which may be affected substantially.

This is the one thing that really bugs me. Now, maybe we can get creative on this? maybe some "special compat driver" that would let you use old drivers through a combination of (http?) socket communication magical wrap? -> let's move that discussion over here #616

I assume that this issue is more of a "hey, we're doing this regardless but I wanted to give you a heads-up" more than an actual proposal. ;)

Makes me think I should clarify things here.

I won't lie to you: in the end, I'm the one with write-powers on the repo :-) - and I will have to make some calls, veto some things, take the blame and suffer the insults :-)

Now, what I want to try here is not some BS open-source parody where I would just dump source code and tell you guys "live with it".

I want to build an open-design process that works for all of us:

I don't know how much we will succeed in making that open-design process mesh with the need to deliver and ship a usable product with strong time constraints, but I really, genuinely want to try to pull this off and end up with a stronger, better, more satisfied community (and less work for me :-)).

Any help here, I can definitely use.

I think the idea of committing proposals and architecture notes as PRs is a good one and will help manage the discussion.

dmp42 commented 9 years ago

@smarterclayton

is this to mutate an image into a new image? I.e. given image A, PUT link B -> Image B with new signature? While useful for simple clients, it also makes the registry a bit more complex to implement - might there be an advantage in only having GET/PUT images, GET layer, GET tags?

Ah, no.

Here it goes: since (layer) ids will now be content-addressable instead of random, there will no longer be clear ownership on a given layer (you AND me can legitimately generate it). Also, I want access control to be simpler and be "set" at push time rather than at pull time (right now, layers live flat in a non specific namespace, making auth lookup mandatory for every layer).

So, the idea would be to allow NOT pushing again something you already had access to and "linked" into another repository you have access to.
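A toy Go sketch of that "link" idea, under loud assumptions: SHA-256 over the raw bytes stands in for the tarsum, repositories are plain strings, and `store`, `PutLayer`, `LinkLayer`, `GetLayer` and `ErrNoAccess` are hypothetical names. Who is allowed to write to a given repository is deliberately left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrNoAccess is returned when a layer is not linked into the repository.
var ErrNoAccess = errors.New("layer is not linked into this repository")

// store keeps each layer once, keyed by its content-derived id, and records
// per repository which ids have been linked into it.
type store struct {
	blobs map[string][]byte          // id -> layer bytes
	links map[string]map[string]bool // repository -> set of linked ids
}

func newStore() *store {
	return &store{blobs: map[string][]byte{}, links: map[string]map[string]bool{}}
}

func (s *store) link(repo, id string) {
	if s.links[repo] == nil {
		s.links[repo] = map[string]bool{}
	}
	s.links[repo][id] = true
}

// PutLayer stores the bytes and links them into repo. The id is a SHA-256
// hex digest here, used only as a stand-in for a tarsum.
func (s *store) PutLayer(repo string, data []byte) string {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	s.blobs[id] = data
	s.link(repo, id)
	return id
}

// LinkLayer makes a layer visible in repository "to" without re-uploading
// it, provided it is already linked into repository "from".
func (s *store) LinkLayer(from, to, id string) error {
	if !s.links[from][id] {
		return ErrNoAccess
	}
	s.link(to, id)
	return nil
}

// GetLayer returns the layer only if it has been linked into repo, so the
// access decision was effectively made at push/link time.
func (s *store) GetLayer(repo, id string) ([]byte, error) {
	if !s.links[repo][id] {
		return nil, ErrNoAccess
	}
	return s.blobs[id], nil
}

func main() {
	s := newStore()
	id := s.PutLayer("team/base", []byte("layer bytes"))

	_, err := s.GetLayer("team/app", id)
	fmt.Println("before link:", err)

	_ = s.LinkLayer("team/base", "team/app", id)
	data, err := s.GetLayer("team/app", id)
	fmt.Println("after link:", len(data), err, "blobs stored:", len(s.blobs))
}
```

The point of the sketch is that the second repository gains access without a second upload, and that reads need only a per-repository lookup rather than a global ownership check.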

dmp42 commented 9 years ago

@wking @bacongobbler (and others) do you want to try an IRC hack session / meetup / gathering thing?

Or even a hangout?

bacongobbler commented 9 years ago

please feel free to contact me on IRC/email and get things started. I'm online from 8-4PST :)

wking commented 9 years ago

On Thu, Oct 09, 2014 at 12:11:42AM -0700, Matthew Fisher wrote:

please feel free to contact me on IRC/email and get things started. I'm online from 8-4PST :)

I'll likely be around for those hours as well, but I personally prefer planning this sort of thing via something less synchronous, since I usually have better ideas after sleeping on something overnight than I do five minutes after prompting ;).

proppy commented 9 years ago

/cc @govidiupl from Google.

shykes commented 9 years ago

What about some kind of binary protocol with multiplexing of read/write streams? The draft of HTTP/2 looks good as a concept. We can take a look at some common binary serialization libraries (for example msgpack) and use one of them to communicate between core and plugins over a tcp/unix domain socket. It allows us to implement a fast, flexible, easy-to-extend protocol. This protocol should be bidirectional to provide full control over communication.

@noxiouz you are describing libchan :) It uses msgpack for serialization and implements multi-plexing over http2. https://github.com/docker/libchan. @dmp42 @dmcgowan for communication with extensions, I strongly recommend using libchan, since that is the direction we're going for Docker extensions also. If one of the goals is cohesion with the rest of the Docker platform, this one is a no-brainer.

dmp42 commented 9 years ago

Thanks @shykes

We have a preliminary backend driver implementation using libchan here: https://github.com/docker/docker-registry/pull/630

and some discussion going on extensions there: https://github.com/docker/docker-registry/issues/613

and on drivers there: https://github.com/docker/docker-registry/issues/616

Drivers and extensions have different targets though, and different speed/reliability/deployment-strategy requirements, so we might end up with different solutions here - libchan is definitely a strong lead.

visualphoenix commented 9 years ago

+1 on a docker-registry ng hackday. Would love to help.

ovidiupl-g commented 9 years ago

Greetings from Google Kirkland! Not sure this is the best place for a minor suggestion, but I'd be happy to write up a more detailed proposal separately. It might be worth adding an extra knob to the client-server protocol for handling overload and transient failures.

I believe the current client-side logic uses linear retries up to a max number of failures. First, I and several others would be really happy to see that logic evolve into exponential back-off with jitter for timeouts, disconnects and other transient failures. Ideally, that logic should be "no back-off on 302, use exponential back-off on 500, 502 and 503".

Second, we'd be really happy if the registry responded with a Retry-After header on 503 (and possibly on 3XX, if it chooses so), and if the client honored the value given in the header.

The goal is to avoid self-inflicted denial-of-service states, where Docker clients in a large-scale deployment synchronize their retries after a transient issue (e.g. temporary network failure) and slam registries at the same time. With friendly clients, randomized exponential back-off is the first line of defense, and server-controlled retry delays are the bigger hammer. (With unfriendly clients, there's always the other first line of defense :) ).

dmp42 commented 9 years ago

@visualphoenix nice to have you in ;) @govidiupl definitely welcome!

IRC meetings every monday 10AM PST. Otherwise, have a look around at tickets with the next generation label.

shreyaskarnik commented 9 years ago

@dmp42 and other contributors to the project, I was curious whether the docker-registry next generation implementation will have an event stream similar to the docker daemon's, listing events like pull, push, delete, create tags and so on and so forth. This kind of event stream would be useful for monitoring events in the registry, and opens up the possibility of loosely connected consumers that monitor the events to create summaries of what is occurring in the registry. If this thread is not the right place for these kinds of requests/discussions I can open a new proposal as well; I wanted to do a temperature check first before opening a detailed proposal.

dmp42 commented 9 years ago

@shreyu86 that should certainly be part of the new extensions model: #613

shreyaskarnik commented 9 years ago

Thanks @dmp42

nathanleclaire commented 9 years ago

Just a random thought I've had lately: whatever form the v2 registry and mechanics around it take, it'd be really lovely to get rid of the round-tripping messages for "image layer already pushed, skipping". This seems to slow down pushes a ton and, though I'm sure the decision to do it that way was probably made for the right reasons at the time, baffles every single person that experiences it for the first time ("why can't it check them all at once?").

coolbrg commented 9 years ago

:+1: from my side for next generation docker registry :smile: Looking forward to it.

stevvooe commented 9 years ago

Closing this, citing the existence of docker/distribution.