appc / spec

App Container Specification and Tooling (archived, see https://github.com/rkt/rkt/issues/4024)
Apache License 2.0

spec: clarify purpose of labels #395

Open globalcitizen opened 9 years ago

globalcitizen commented 9 years ago

It's so damn generic, it's semantically empty. It's like writing namevaluepairs or variable.

Current child labels os and arch would seem groupable into something less generic, one that would see frequent enough use to warrant a more readable parent within the JSON structure (or just shift them each up a level).

Additional observed child label version has unclear meaning (re: difference with acVersion).

jonboulle commented 9 years ago

Well, yes, they are arbitrary name value pairs... I don't see how it's a semantic fail.

Evidently we just need a more verbose explanation of intended use. "Labels are arbitrary key-value pairs that are assigned at creation time and can be used during image discovery. For example, ...."

os and arch and version were originally top level but we wanted to be less opinionated since there are plenty of use cases where they do not make much sense (noarch, singleton, systems that generate unique name per build, environments that always use imageid, whatever). https://github.com/coreos/rkt/issues/186

The potential change I think we should perhaps make is to namespace them, e.g. under app.io, this is discussed somewhere else too.

is version really still unclear? @vbatts just tried to clear that up :-/
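To make the os/arch/version discussion concrete, here is a minimal sketch of how labels might sit in an image manifest and be read back. It assumes the list-of-name/value form for labels; the image name, the acVersion value, and the label_value helper are made up for illustration and are not taken from the spec text. The sketch also shows the distinction asked about above, under the assumption that acVersion names the spec version the manifest is written against while the version label describes the image itself.

```python
# Hypothetical image manifest fragment; the name and all values are illustrative.
manifest = {
    "acKind": "ImageManifest",
    "acVersion": "0.5.1",                       # assumed: version of the spec schema
    "name": "example.com/reduce-worker",
    "labels": [
        {"name": "version", "value": "1.0.0"},  # assumed: version of this image
        {"name": "os", "value": "linux"},
        {"name": "arch", "value": "amd64"},
    ],
}


def label_value(manifest, name):
    """Return the value of the named label, or None if the image does not set it."""
    for label in manifest.get("labels", []):
        if label["name"] == name:
            return label["value"]
    return None


print(label_value(manifest, "os"))       # linux
print(label_value(manifest, "variant"))  # None
```

An image that has no meaningful os or arch (the noarch case) would simply omit those entries, which is the flexibility that moving them out of the top level was meant to give.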

globalcitizen commented 9 years ago

The semantic fail is that labels is intrinsically meaningless. Label for what? What is a label? Creation time? Creation of what? Metadata? ACI? Pod? What is creation? Do you mean instantiation (Running the thing)?

It's about as vague as the other semantically winning structure, annotations, but slightly vaguer than the runner-up, dependencies.

About the version part .. yes, very unclear.

thockin commented 9 years ago

The meaninglessness is what makes them powerful. I don't - CAN'T - know what users want to do here. We have... let's just call it A LOT of experience with what people do with these things in real life and a big lesson we took away from Borg was that any structure you define will be insufficient for a large body of users.

Labels are a place to leave information so that you can retrieve it again later. Anything more semantic is less useful. There are some "well known" labels which you could argue should be first-class fields. I could buy that, maybe. But the utility of labels is not really up for debate, I think.

globalcitizen commented 9 years ago

I recognize the purpose of having a catch-all place to store random name-value pairs.

However, right now it's not clear if this data is used anywhere, how it is made available, and what the difference is with annotations (equally vague).

vbatts commented 9 years ago

The spec does describe that the manifest is available to the container via the metadata service.

globalcitizen commented 9 years ago

Oh I see ... so this is data that is set in the manifest for all instances of the ACI that are started up (potentially multiple at the same time?) and the resulting container can choose to use it for configuration or something?

Seems pretty obtuse given that you could just put it in a file.

Still not clear what the difference is with annotations?

vbatts commented 9 years ago

Obtuse perhaps. The spec is not prescribing its usage, and usage patterns are yet to surface.

globalcitizen commented 9 years ago

Well, if it's this unclear and it duplicates existing functionality, more than one school of software development (e.g. Unix) says you should "do one thing and do it well" and remove this stuff entirely. Personally I have architectural concerns about the metadata service, many of which I described over at #382; this issue seems to be some kind of outgrowth of that dubious additional complexity.

jonboulle commented 9 years ago

The Kubernetes definition of labels actually fits us almost perfectly, with minor modifications (in italics):

Labels are key/value pairs that are attached to ACIs. Labels are intended to be used to specify identifying attributes of ACIs that are meaningful and relevant to users, but which do not directly imply semantics to the core system. Labels can be used to organize and to select ACIs. Labels can be attached to ACIs at creation time, and are then immutable. Each ACI can have a set of key/value labels defined. Each Key must be unique for a given ACI.

I propose we a) add wording to the effect of the above to the spec; and b) adopt wording along the lines of Kubernetes' label selectors to describe labels in dependency references (discussed here)
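As a rough illustration of "organize and select" and of the unique-key rule, the following is a minimal sketch of equality-based label selection over a set of images. The function names, the image list, and the selector shape (a plain dict of required name/value pairs) are assumptions for the example, not wording from the appc spec or from Kubernetes.

```python
def labels_to_dict(labels):
    """Convert a list of {"name", "value"} pairs to a dict, rejecting duplicate names."""
    result = {}
    for label in labels:
        if label["name"] in result:
            raise ValueError(f"duplicate label name: {label['name']}")
        result[label["name"]] = label["value"]
    return result


def matches(image_labels, selector):
    """True if every name/value pair in the selector is present on the image."""
    have = labels_to_dict(image_labels)
    return all(have.get(name) == value for name, value in selector.items())


def select(images, selector):
    """Return the images whose labels satisfy the selector."""
    return [img for img in images if matches(img["labels"], selector)]


def make_labels(version, os_name, arch):
    return [
        {"name": "version", "value": version},
        {"name": "os", "value": os_name},
        {"name": "arch", "value": arch},
    ]


# Hypothetical builds of one image name, told apart only by their labels.
images = [
    {"name": "example.com/worker", "labels": make_labels("1.0.0", "linux", "amd64")},
    {"name": "example.com/worker", "labels": make_labels("1.0.0", "linux", "i386")},
    {"name": "example.com/worker", "labels": make_labels("1.1.0", "linux", "amd64")},
]

selected = select(images, {"os": "linux", "arch": "amd64"})
print([labels_to_dict(img["labels"])["version"] for img in selected])
```

An empty selector matches every image, and a selector naming a label the image does not carry matches nothing, which seems the least surprising behaviour for a dependency reference.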

globalcitizen commented 9 years ago

Aha! The origin and original purpose of labels becomes clear. Certainly, documentation would be a good partial fix.

However - at an architectural level, I'm still not sold on labels as described, particularly because the model implicitly supported by a post-creation-time-immutable labels notion is Google's traditional view of containers ("build 'em in house, deploy 'em in house") and that - it is easily argued - is not the emerging model.

According to the above description, the only purpose of a label outside of a container is to organize and select between multiple containers.

If this is really the case, then I would strongly suggest they should not be post-creation-time-immutable, because re-creating every packaged container in your collection just because you choose to manipulate your filing schema strikes me as a particularly high overhead modus operandi for ongoing container management.

The alleged use of a label internal to a container based upon metadata re-import is obtuse and duplicates the far more battle-tested (re: accessibility, compression, hashability, latency, access control, etc.) and pre-existing, de facto option of "just store the requisite data as a file within the container package".

This is particularly the case when labels are post-creation-time-immutable.

jonboulle commented 9 years ago

> the model implicitly supported by a post-creation-time-immutable labels notion is Google's traditional view of containers ("build 'em in house, deploy 'em in house") and that - it is easily argued - is not the emerging model.

I don't follow this reasoning at all. Certain labels (e.g. the well-known ones today, os/arch/version) are necessarily immutable precisely because images are "competing" in a global namespace and not completely controlled in-house or in a single monolithic repository. Consider the predominant Linux packaging systems today - labels are baked into the packages and e.g. any change to the source necessitates a new build and bump of these labels. This facilitates them being widely distributed and re-distributed by any variety of public/private mechanisms yet still have them be self-describing artefacts and be able to refer to consistently in a fairly sane way (obviously content addressing is necessary for complete determinism but 99% of the time the embedded name/labels will get you there).
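The point about baked-in labels and content addressing can be sketched as follows. This is a simplification under stated assumptions: the digest here covers only a canonical JSON encoding of the manifest, whereas a real image ID would be computed over the image contents, and manifest_digest and with_label are hypothetical helpers.

```python
import hashlib
import json


def manifest_digest(manifest):
    """Hash a canonical JSON encoding of the manifest (a stand-in for a real image ID)."""
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha512-" + hashlib.sha512(encoded).hexdigest()


def with_label(manifest, name, value):
    """Return a copy of the manifest with one label set; the original is left untouched."""
    labels = [l for l in manifest.get("labels", []) if l["name"] != name]
    labels.append({"name": name, "value": value})
    return {**manifest, "labels": labels}


# Hypothetical manifest; name and label values are illustrative.
original = {
    "name": "example.com/worker",
    "labels": [
        {"name": "version", "value": "1.0.0"},
        {"name": "os", "value": "linux"},
    ],
}
bumped = with_label(original, "version", "1.0.1")

# Changing a label yields a different artefact with a different digest.
print(manifest_digest(original) != manifest_digest(bumped))  # True
```

Under this model "editing a label" and "building a new image" are the same operation, which is why the name plus labels can serve as a stable, self-describing reference once an image has been distributed.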

Your concern seems to be that immutable labels are insufficiently flexible for bespoke filing schema/image inventory systems; I completely agree with you but that's not the purpose of labels and I don't think that kind of system is something we need to consider in the spec.

> The alleged use of a label internal to a container based upon metadata re-import

Not sure what this means. I don't expect labels to be generally introspected by applications if that's what it's getting at; that's what annotations are for.