freben commented 4 years ago

Background

A catalog entity is uniquely identified by the triplet kind, metadata.name and metadata.namespace, where the namespace is optional and implicitly "default" when not given.

The need to address a certain entity comes up in several places, without a clear standard on how it's done:

In Backstage URLs: These are currently of the form /catalog/:kind/:optionalNamespaceAndName/, a bespoke way of permitting the above semantics
In TechDocs: For constructing storage paths
In the Catalog backend API: These are currently of the form /entities/by-name/:kind/:namespace/:name, where namespace is mandatory, so the caller has to know how to generate the default value
Inside entities themselves: There are more and more use cases for entities referencing other entities - for example a Component pointing to an API that it implements, or (soon) a Component to the System that it belongs to.

For references in particular, it is not even entirely clear what the semantics should be when the namespace is omitted. It could either mean "the same namespace as myself", or "the default namespace".

There seems to exist a need to be able to clearly define a simple single-string representation of an Entity by-name identifier. Note that this is different to the metadata.uid field of entities, which also is a valid means of addressing entities in the catalog backend API - but that's an opaque random string that is not human friendly, and changes if the entity is unregistered/re-registered, and should not appear in interfaces that people interact directly with.

The identifier will be typed a lot by hand in yaml files, and therefore humans must be able to construct it easily.

The identifier will appear in human visible URLs, and should be readable with a minimum of fuss.

Proposal

We suggest that this shall be the canonical form for a reference by-name to an entity:

The string is a colon-separated triplet of the form <kind>:<namespace>:<name>.
The three strings are used verbatim without encoding.
The namespace may be empty, and is then considered as unspecified (rather than an empty string). The string is then <kind>::<name>; the colon may not be omitted.
The kind is case insensitive.

The end result is that a Backstage URL within the catalog may be for example /catalog/component::my-frontend/docs.
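To illustrate the rules above, here is a minimal sketch of parsing and formatting the proposed form. The names Triplet, parseTriplet and formatTriplet are made up for this example and are not the @backstage/catalog-model API. Lowercasing the kind is just one assumed way of getting case insensitivity. Only the first two colons are treated as separators, so a colon inside the name would survive.

```typescript
// Hypothetical shape, not the actual @backstage/catalog-model API.
interface Triplet {
  kind: string;
  namespace?: string; // undefined means "unspecified", not the empty string
  name: string;
}

function parseTriplet(ref: string): Triplet {
  const first = ref.indexOf(':');
  const second = first === -1 ? -1 : ref.indexOf(':', first + 1);
  if (second === -1) {
    // Both colons are mandatory, even when the namespace is empty.
    throw new TypeError(`Expected <kind>:<namespace>:<name>, got "${ref}"`);
  }
  // Assumption: case insensitivity is implemented by lowercasing the kind.
  const kind = ref.slice(0, first).toLowerCase();
  const namespace = ref.slice(first + 1, second);
  // Everything after the second colon is the name, colons included.
  const name = ref.slice(second + 1);
  if (kind === '' || name === '') {
    throw new TypeError(`Kind and name are required, got "${ref}"`);
  }
  return namespace === '' ? { kind, name } : { kind, namespace, name };
}

function formatTriplet(triplet: Triplet): string {
  // The middle colon is always kept, even when the namespace is unspecified.
  const namespace = triplet.namespace === undefined ? '' : triplet.namespace;
  return `${triplet.kind.toLowerCase()}:${namespace}:${triplet.name}`;
}
```

With this sketch, parseTriplet('Component::my-frontend') yields the kind component, no namespace and the name my-frontend, and formatting that result gives component::my-frontend again.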

We also suggest that there may be places where the nature of the reference makes the kind implicit, and where the namespace can be assumed to be the same as that of the source of the reference. One such example is a Component that wants to point to the System it belongs to, or a Group declaring its parent group. For those cases, the schema may define that only a single string with only the metadata.name of the target entity shall be used - and may not even support the full ref form.

Implementation

Parsing and construction of these identifiers will be added to the package @backstage/catalog-model.

Existing APIs and routes will be changed in a backwards-incompatible way to accommodate the new representation.

Drawbacks

If it is very common that the namespace is omitted, it will be a nuisance (and source of mistakes?) to always have the two colons. Another candidate would be to instead use <kind>:<name>[:[<namespace>]] where the last colon and/or part can be omitted entirely, but the other variant has already seen some use, and if there were ever a place where people might want to put a colon - it would be in the metadata.name.
Since the three parts are unencoded, they (or at least the first two) must not contain colons. This is the case in the default setup of the catalog, but it's a big assumption. However, having to enter parts in an encoded form would be prohibitively frustrating.
While being easy to type out by hand, the key can't easily be extended with more parts, filters, specifiers, variants, or versioning. This is a risk in the face of possibly wanting to extend or tweak it in the future.

Risks

There may be confusion on the distinction of by-name identifiers as per this proposal, vs. by-uid identifiers. We do not foresee that they will occur directly next to each other in writing or in interfaces, and people will generally only be aware of the by-name IDs unless they program directly against the catalog API.
The choice of making the kind case insensitive carries risk of introducing bugs in both frontend and backend code that makes string comparisons or selects data in persistent storage. We consider that the risk / inconvenience of typing out these IDs wrong may be more important.
There are ongoing discussions about whether there will be future need for further grouping/discrimination of entities, for example in a multi-tenancy situation to allow different organisations to collect entities under their own org. It is currently assumed that this will not have an effect on the keys described in this RFC. Those groupings tend to cut along auth/access control boundaries, and would therefore be limited by other mechanisms. As a purely hypothetical future example, orgA may have a kind+name+namespace collision with orgB but would still only see their own entity based on an auth token passed to the catalog request.
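The case insensitivity risk can be made concrete with a small sketch. It assumes that code comparing or storing refs first normalizes them to a key. The helper name comparisonKey is hypothetical, and lowercasing only the kind (while filling in "default" for a missing namespace) is an assumption rather than a settled rule.

```typescript
// Hypothetical helper: normalize a ref to a key that is safe to compare or store.
function comparisonKey(kind: string, namespace: string | undefined, name: string): string {
  // Only the kind is lowercased; namespace and name are kept verbatim.
  const ns = namespace === undefined || namespace === '' ? 'default' : namespace;
  return `${kind.toLowerCase()}:${ns}:${name}`;
}

const typedByHand: string = 'Component::my-frontend';
const storedValue: string = 'component::my-frontend';

// A naive string comparison treats these as two different entities...
const naiveMatch = typedByHand === storedValue;

// ...while the normalized keys are equal.
const normalizedMatch =
  comparisonKey('Component', undefined, 'my-frontend') ===
  comparisonKey('component', 'default', 'my-frontend');
```

Any place that skips the normalization step (a database query, a map lookup, a route match) would behave like the naive comparison.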

Rugvip commented 4 years ago

Haven't thought this through a lot, but another possible delimiter to use is ., since it can be used as is in URLs. Not sure how smart it is tho xP

dhenneke commented 4 years ago

Inside entities themselves: There are more and more use cases for entities referencing other entities - for example a Component pointing to an API that it implements, or (soon) a Component to the System that it belongs to.

With regard to a field such as implementsApis, do you think that the contents would always be referenced in the full <kind>:<namespace>:<name> form although this field only allows kind=API? We are also thinking about the system model and the not yet existing Domains and Systems, where target relationships will often be limited to certain kinds.

Rugvip commented 4 years ago

@dhenneke There's an upcoming RFC where we can discuss that, just haven't gotten around to writing it yet :grin:

Hopefully shows up today or tomorrow.

freben commented 4 years ago

That's one place where I'm pretty sure it can be just the name. But it's open for discussion! If we ended up having to reference APIs in a different namespace, it wouldn't suffice.

freben commented 4 years ago

@Rugvip yeah colon is "reserved" although not declared unsafe. One safe char is tilde ~ - a bit odd tho

Rugvip commented 4 years ago

Just leaving these here to see what they feel like :grin:

.

https://backstage.spotify.net/entity/Component.backstage.frontend/builds

metadata:
  owner: User..patriko

spec:
  dependencies:
    - ..database
    - .playlists.playlist-service

:

https://backstage.spotify.net/entity/Component%3Abackstage%3Afrontend/builds

metadata:
  owner: User::patriko

spec:
  dependencies:
    - ::database
    - :playlists:playlist-service

An idea for dealing with double delimiters would be to assume that they are supposed to be filled in first in the middle, and then prefixed. So a:b would expand to a::b, while :b and b would expand to ::b.

With that in mind:

.

metadata:
  owner: User.patriko

spec:
  dependencies:
    - database
    - .playlists.playlist-service

:

metadata:
  owner: User:patriko

spec:
  dependencies:
    - database
    - :playlists:playlist-service

It's less to type but feels a bit too magic
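For reference, the expansion rule described above (fill in the middle first, then prefix) could be sketched roughly like this. The function name expandShortRef is hypothetical, and the delimiter is a parameter only so that both the . and : variants can be tried.

```typescript
// Hypothetical helper illustrating the "fill in the middle first, then prefix" rule.
function expandShortRef(ref: string, delimiter: string = ':'): string {
  const parts = ref.split(delimiter);
  if (parts.length === 1) {
    // "b" -> "::b"
    return `${delimiter}${delimiter}${parts[0]}`;
  }
  if (parts.length === 2) {
    // "a:b" -> "a::b", and ":b" -> "::b" since the first part is then empty
    return `${parts[0]}${delimiter}${delimiter}${parts[1]}`;
  }
  if (parts.length === 3) {
    // Already a full triplet, e.g. ":playlists:playlist-service"
    return ref;
  }
  throw new TypeError(`Too many delimiters in "${ref}"`);
}
```

So User:patriko expands to User::patriko and database expands to ::database, matching the longer examples further up.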

freben commented 4 years ago

Just also dropping in here that k8s forces you to specify pluralized kinds too, so that they can form urls such as /users/freben (which doesn't address the namespace in and of itself). That's something we could conceivably do too, at a bit of a greater effort.

freben commented 4 years ago

Looking at internal usage, we have for example tons of pipelines and similar that have dots in their names. Probably no colons though.

freben commented 4 years ago

Looking at them as paths:

freben > implicitly same as ./freben and /users/default/freben (which incidentally would be the frontend URL as well)

Even though they are pathlike, they are treated as plain strings if entered like that in yaml

owner: /users/default/freben

Hmm.

Rugvip commented 4 years ago

@freben yeah .'s look a bit shite anyway, happy to not move forward with that idea 😁

Yeah paths could be a way to go, bit tricky to make namespace optional then though?

Using the previous examples:

https://backstage.spotify.net/entity/components/backstage/frontend/builds

metadata:
  owner: /users/patriko

spec:
  dependencies:
    - database
    - playlists/playlist-service

Rugvip commented 4 years ago

Gonna suggest that we settle on the following:

# absolute ref
<kind>:<namespace>:<name>

# namespace relative
<kind>:<name>

# kind and namespace relative
<name>

Parsing would look like this (not tested :grin:):

function parseEntityRef(ref) {
  let kind, namespace, name;

  const split = ref.split(':');
  // No component may be empty, e.g. "a::b" or ":b"
  if (split.some((part) => part === '')) {
    throw new Error('no empty!');
  }
  if (split.length === 1) {
    // <name>
    [name] = split;
  } else if (split.length === 2) {
    // <kind>:<name>
    [kind, name] = split;
  } else if (split.length === 3) {
    // <kind>:<namespace>:<name>
    [kind, namespace, name] = split;
  } else {
    throw new TypeError('meh');
  }

  return { kind, namespace, name };
}

Or described with words:

Entity references contain 1-3 components separated by :, where none of the components are allowed to be empty. The last component is always the name. If there is more than one component, the first one is the kind. If there are three components, the middle one is the namespace.

The reason for skipping the <kind>::<name> format is that we're not 100% certain we'll actually want to keep namespaces long-term. It also removes a bit of clutter from the most commonly used relative references, which I'm assuming to be <name> and <kind>:<name>.

The downside is that there is no kind relative reference, i.e. referencing the same kind in a different namespace. It is possible to open up for that format if we feel a need for it though, by allowing :<namespace>:<name>. The downside is that it may not be intuitive what the difference between <x>:<y> and :<x>:<y> is, which is why I think we should just stick to absolute refs instead of kind-relative ones.

Another downside of this format is that it requires slightly more parsing logic than [kind, namespace, name] = split(':'), but in reality I think we will at least also want to be checking whether name is non-empty, at which point it's worth extracting the parsing into a separate function exported from @backstage/catalog-model anyway.

Rugvip commented 4 years ago

aaaaaand revised version after some discussion with @freben:

# absolute ref
<namespace>/<kind>/<name>

# namespace relative
<kind>/<name>

# kind and namespace relative
<name>

Parsing would look like this (not tested :grin:):

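function parseEntityRef(ref) {
  const split = ref.split('/');
  // split() always yields at least one part, so only the upper bound needs checking
  if (split.some((part) => part === '') || split.length > 3) {
    throw new Error('bad!');
  }

  // Reversed order is name, kind, namespace; missing parts become undefined
  const [name, kind, namespace] = split.reverse();
  return { kind, namespace, name };
}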

Difference is that we use / as a delimiter to make it more URL-like, which is actually what kubectl does too, e.g. pod/proxy-7c5958c8fb-2ttdt.

We also flip the namespace around to be first, which simplifies the parsing logic and makes the relative forms a bit more intuitive.

We'd also switch to using the absolute format in the catalog URLs, i.e. /catalog/default/component/backstage. Remains to be decided what we do with the namespace there though, as it wouldn't be as straightforward to migrate away from namespaces.

hooloovooo commented 4 years ago

Is it really desirable to make it URL like? IMHO it makes it harder to see what part in a URL refers to the component and what's just part of the URL structure. It would also make it harder to abbreviate it if using default namespace.

Rugvip commented 4 years ago

@hooloovooo I was thinking we'd be comparing namespace%3Akind%3Aname and namespace%2Fkind%2Fname in that case, both of which are a bit shite. Found https://stackoverflow.com/questions/2053132/is-a-colon-safe-for-friendly-url-use#answer-14269897 though, which suggests that : is probably completely fine to use in URLs, even though they're intended to be encoded.

With that in mind I'd be very open to using <namespace>:<kind>:<name>, even in URLs, with the namespace being optional.

Ending up with something like https://backstage.spotify.net/catalog/component:backstage-frontend/builds, or https://backstage.spotify.net/catalog/backstage:component:frontend/builds with a namespace.

If we feel moving the namespace to the front is too weird I'd happily go with https://github.com/spotify/backstage/issues/1947#issuecomment-684891760 as well.

If we're happy to use unencoded colons in the URL I think that has a pretty big benefit over splitting the entity ref into 3 separate path components in the URL.

Rugvip commented 4 years ago

Ah, https://stackoverflow.com/questions/1737575/are-colons-allowed-in-urls#answer-43283492 describes it well, so a relative link from /catalog to component:backstage-frontend would be broken, but it's fixed easily by using ./component:backstage-frontend. That might actually be worth the hassle?
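The behaviour can be reproduced with the standard URL constructor. This is only an illustration using the example host from earlier in the thread; the trailing slash on the base is there so that relative references resolve underneath /catalog/.

```typescript
const base = 'https://backstage.spotify.net/catalog/';

// Without a leading "./", the text before the first colon is parsed as a URL scheme,
// so the result is an absolute URL that ignores the base entirely.
const broken = new URL('component:backstage-frontend', base);

// With "./" the reference is resolved relative to the base path,
// and the colon is kept as-is in the path.
const fixed = new URL('./component:backstage-frontend', base);
```

Here broken.protocol is "component:", while fixed.pathname is /catalog/component:backstage-frontend.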

hooloovooo commented 4 years ago

If that's the only downside it seems like a nice solution. It would probably be quite easy to get a lint rule in place to check it as well.

freben commented 4 years ago

Possible caveat that one doesn't always control the url encoding. If you pass the raw identifier to any kind of URL builder, you can be sure that it will uriencode it and that will encode the colons whether you want it or not :)

freben commented 4 years ago

I just don't want to end up in a situation where it's unclear if we risk ending up with double encoding or double decoding.

Rugvip commented 4 years ago

With the current character set I don't think we're really gonna get in that situation right? Since we're not allowing % we can essentially decode until it's a noop. Or in a more sane way, always decode, and if it wasn't encoded that's no biggie.
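A small sketch of the "always decode once" idea, using only the standard encodeURIComponent and decodeURIComponent functions. The helper name decodeOnce is hypothetical, and the approach relies on the assumption stated above that % never appears in a namespace, kind or name.

```typescript
// Hypothetical helper: decode a path segment exactly once.
function decodeOnce(segment: string): string {
  // Safe for unencoded input only because "%" is assumed to be disallowed in refs.
  return decodeURIComponent(segment);
}

const raw = 'component:my-service';

// URL builders typically encode the colon whether you want it or not.
const encoded = encodeURIComponent(raw);

// Encoding twice is the situation to avoid; a single decode no longer recovers the ref.
const doubleEncoded = encodeURIComponent(encoded);
```

Both decodeOnce(raw) and decodeOnce(encoded) give back component:my-service, whereas decodeOnce(doubleEncoded) only gets as far as component%3Amy-service.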

freben commented 4 years ago

decode until it's a noop

Believe me, that way lies madness :)

@hooloovooo for context, I am more and more leaning towards the idea that forming URLs, and forming identifiers to be typed by humans, are two fundamentally different problem spaces and should not be conflated at all.

I believe that we have two choices.

Either we pick a nice hand written format and use that verbatim in URLs too, and accept that this WILL mean they will be ugly-encoded, no two ways about it.

Or, we shape URLs to be nice for routing and human understanding (encoding the three parts as usual where necessary), and separately handle how to express identifiers in yaml.

As far as I understand it, k8s themselves chose the latter. They even pick pluralized versions of your kind, so your route is /jobs/.... And I think they made that decision for good reason.

So I voted that our frontend paths be made /catalog/default/component/my-id and no magic about it. And that the hand written refs could, by symmetry, be the same, if we decide that slashes will never, ever, for any reason, be allowed in namespaces, kinds or names :)

freben commented 4 years ago

For example:

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#list-statefulset-v1-apps

Here's an example of how resource definitions work

https://docs.okd.io/3.10/admin_guide/custom_resource_definitions.html

Rugvip commented 4 years ago

decode until it's a noop

Believe me, that way lies madness :)

I know, that's why I also proposed a sane way :p

freben commented 4 years ago

Attempting to nail down the two remaining options, as I understand them.

In both options, we retain the notion that strings are case insensitive from a consumer's point of view; for example, you may specify "component" instead of "Component".

Option 1 - Separate URL logic, use slashes

Frontend and backend service URLs that want to reference an entity by name will ALWAYS contain a triplet of namespace, kind, and name, in that order. If the source entity has no namespace, the equivalent string "default" is used. There are no special encoding rules, apart from the normal ones for URLs - each path segment is encoded according to web standards. Example:

# /catalog/<namespace>/<kind>/<name>/builds
/catalog/default/component/my-service/builds

YAML files that want to reference an entity by name will use a slash separated sequence of optional namespace, optional kind, and required name. If the namespace is left out, it is implied to be the same namespace as that of the referencing entity. If the kind is left out, it is implied to be contextual (e.g. if it is known from the context of the value that it points to an API kind entity, then it has the value "api"). Partial example:

metadata:
  name: my-service
spec:
  # All explicit
  owner: infra/group/my-team
  implementsApis:
    # Same as default/api/external-customer-api since this entity had no namespace given
    - external-customer-api

Values in YAML files are not URL encoded.

This assumes that namespaces, kinds and names never will contain slashes.
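A rough sketch of how option 1 might be implemented. All names here (FullRef, RefContext, resolveSlashRef, catalogUrlPath) are hypothetical. It assumes that a two-part ref means <kind>/<name>, that the kind is lowercased, and that a referencing entity without a namespace is treated as living in "default".

```typescript
// Hypothetical types and names for illustration only.
interface FullRef {
  namespace: string;
  kind: string;
  name: string;
}

interface RefContext {
  sourceNamespace?: string; // namespace of the entity that holds the reference
  defaultKind?: string; // kind implied by the field, e.g. "api" for implementsApis
}

function resolveSlashRef(ref: string, context: RefContext): FullRef {
  const parts = ref.split('/');
  if (parts.length > 3 || parts.some(part => part === '')) {
    throw new TypeError(`Invalid entity ref "${ref}"`);
  }
  // Read from the right: name, then kind, then namespace.
  const name = parts[parts.length - 1];
  const kind = parts.length >= 2 ? parts[parts.length - 2] : context.defaultKind;
  const namespace = parts.length === 3 ? parts[0] : (context.sourceNamespace || 'default');
  if (kind === undefined) {
    throw new TypeError(`No kind given or implied for "${ref}"`);
  }
  return { namespace, kind: kind.toLowerCase(), name };
}

function catalogUrlPath(entity: FullRef, subPath: string): string {
  // URLs always carry the full triplet; each path segment is encoded on its own.
  const segments = [entity.namespace, entity.kind, entity.name].map(s => encodeURIComponent(s));
  return `/catalog/${segments.join('/')}/${subPath}`;
}
```

With the example above, resolving external-customer-api with a defaultKind of api and no source namespace gives default/api/external-customer-api, and the URL for my-service becomes /catalog/default/component/my-service/builds.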

Option 2 - Unified logic, use colons

Frontend and backend service URLs that want to reference an entity by name will use a colon separated triplet of optional namespace, required kind, and required name. If the namespace is left out, it is implied to be default. The path segment as a whole MAY be encoded, so a reader will have to try to decode it once. Example:

# /catalog/[<namespace>:]<kind>:<name>/builds
/catalog/component:my-service/builds
# - or -
/catalog/component%3Amy-service/builds

YAML files that want to reference an entity by name will use a similar scheme, with the addition that the kind can be left out as well (but only when the namespace is left out), implying that it is contextual. Example:

metadata:
  name: my-service
spec:
  # All explicit
  owner: infra:group:my-team
  implementsApis:
    # Same as default:api:external-customer-api since this entity had no namespace given
    - external-customer-api

Values and colons in YAML files are not URL encoded.

This assumes that namespaces, kinds and names never will contain colons.
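For comparison, a sketch of reading the URL path segment under option 2. The function name parseColonSegment is hypothetical. It decodes the segment exactly once, which is only safe under the assumption that % is not allowed in any part, and it lowercases the kind as one possible way of handling case insensitivity.

```typescript
// Hypothetical helper for illustration only.
function parseColonSegment(segment: string): { namespace: string; kind: string; name: string } {
  // Decode exactly once; a segment that was never encoded passes through unchanged.
  const parts = decodeURIComponent(segment).split(':');
  if (parts.length < 2 || parts.length > 3 || parts.some(part => part === '')) {
    throw new TypeError(`Invalid entity segment "${segment}"`);
  }
  // Kind and name are required in URLs and are always the last two parts.
  const name = parts[parts.length - 1];
  const kind = parts[parts.length - 2].toLowerCase();
  // A missing namespace means "default" in URLs.
  const namespace = parts.length === 3 ? parts[0] : 'default';
  return { namespace, kind, name };
}
```

Both component:my-service and component%3Amy-service end up as the same triplet here, which is the "try to decode it once" behaviour described above.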

Rugvip commented 4 years ago

Just because this popped into my mind I'm throwing in

Option 3 - Explicit delimiters

Use different separators for kind and namespace, making it possible to omit either in a clear way, as required by different contexts.

For frontend and backend service URLs, the encoding is similar to options 1 and 2. Example:

# /catalog/<kind>/[<namespace>:]<name>/builds
/catalog/component/my-service/builds
# - or -
/catalog/component/default:my-service/builds

YAML files that want to reference an entity by name will use the same scheme, with the addition that the kind can be left out as well. Example:

metadata:
  name: my-service
spec:
  # All explicit
  owner: group/infra:my-team
  implementsApis:
    # Same as api/default:external-customer-api since this entity had no namespace given
    - external-customer-api
    # - or -
    - default:external-customer-api
    # - or -
    - api/external-customer-api
    # - or -
    - api/default:external-customer-api

Values and colons in YAML files are not URL encoded.

This assumes that namespaces, kinds and names never will contain colons or slashes.

freben commented 4 years ago

For option 3 my brain really wanted it to be api:default/external-customer-api because the delimiters match how I model them in my brain :) even though it may seem a bit more odd in a URL - but again, I argue that it's worthwhile to separate out how URLs are formed from how hand written refs are written, since they have conflicting goals and I see omissions in the URL as purely complicating things.

freben commented 4 years ago

For the YAML ref field type, I could even picture it being effectively

string | { namespace?: string; kind?: string; name: string }

plus future support for other selector fields.

Rugvip commented 4 years ago

Just to have this here then:

metadata:
  name: my-service
spec:
  owner:
    kind: group
    namespace: infra
    name: my-team
  implementsApis:
    - external-customer-api
    # - or -
    - namespace: default
      name: external-customer-api
    # - or -
    - kind: api
      name: external-customer-api
    # - or -
    - kind: api
      namespace: default
      name: external-customer-api

Meh, it is pretty verbose for something that might have to actually be typed out and read quite a bit. Ofc way more explicit though. Possibly a starting point where we can wrap things up in an API that is able to read that type, just so we kinda get that plumbing in place? Then in the future we might end up adding some short form if we feel it's needed.

Regarding URLs it looks like we might end up in a situation where they might be a bit easier to change down the line, so not too opinionated about that. The effort of this decision vs the possible value differences between the different outcomes is pretty high at this point xP - i.e. fine with just going for x/y/z

freben commented 4 years ago

But I meant that the string form would be the compound option 1 or 3 :)

So I would actually start with the short form, but yes, implementing that function in catalog-model which eats a Json value and spits out a namespace, kind, name triplet object is step uno.

freben commented 4 years ago

https://github.com/spotify/backstage/pull/2532

summary:

[<kind>:][<namespace>/]<name>

with a bonus extended form

kind: <kind>
namespace: <namespace>
name: <name>

and for urls,

:namespace/:kind/:name

For full transparency, this last one I am actually not too sure of - having kind before namespace occurs both in the frontend and backend already, so choosing to keep it that way would be less breaking. Let me know what you think! The benefit of having namespace first could be that it conceptually is a top level hierarchy in one's mind and also would be fitting to have as a terminal URL - /catalog/default would show all things in that "subcatalog" so to speak - everything under that namespace.
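As a rough illustration of the summary, here is a sketch that reads either the short string form or the extended object form into a (possibly partial) triplet. The names RefInput, ParsedRef and readRef are hypothetical and this is not the code from the linked pull request. Rejecting empty parts and parts that contain a delimiter is an assumption.

```typescript
// Hypothetical types and names for illustration; not the catalog-model API.
type RefInput = string | { kind?: string; namespace?: string; name: string };

interface ParsedRef {
  kind: string | undefined;
  namespace: string | undefined;
  name: string;
}

function readRef(input: RefInput): ParsedRef {
  if (typeof input !== 'string') {
    // Extended object form: fields are taken as given.
    return { kind: input.kind, namespace: input.namespace, name: input.name };
  }
  // Short string form: [<kind>:][<namespace>/]<name>
  const colon = input.indexOf(':');
  const kind = colon === -1 ? undefined : input.slice(0, colon);
  const rest = colon === -1 ? input : input.slice(colon + 1);
  const slash = rest.indexOf('/');
  const namespace = slash === -1 ? undefined : rest.slice(0, slash);
  const name = slash === -1 ? rest : rest.slice(slash + 1);

  // Assumption: no part may be empty or contain either delimiter.
  const parts = [kind, namespace, name];
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (part !== undefined && (part === '' || /[:\/]/.test(part))) {
      throw new TypeError(`Invalid entity ref "${input}"`);
    }
  }
  return { kind, namespace, name };
}
```

Parts left undefined would still need to be filled in from context by the caller, for example the kind implied by the field and the namespace of the referencing entity.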

freben commented 4 years ago

Closed by #2532

backstage / backstage

[RFC] Uniform Entity Refs #1947
