Azure / bicep-types

Shared libraries for reading and writing Bicep type definitions
MIT License

Issues with types.json and Bicep extensibility #153

Open rynowak opened 2 years ago

rynowak commented 2 years ago

This is a catalog of design issues/concerns found using the types.json format and tooling to teach Bicep about Kubernetes.

Individual items mentioned here represent design issues or challenges - basically all of the ways that the Bicep compiler is coupled end to end to ARM's types. Not all of these have to be solved; I'm erring on the side of including design issues even if there is a solution. In general, the coupling between Bicep and ARM is currently spread across many layers, from the types.json code to the compiler's semantic diagnostics and code generation.

Structure and generation of types.json

types.json is the FFI format for ingesting types into the Bicep compiler. It's designed as a non-human-readable/writable representation of the Bicep type system. In contrast, the APIs that Bicep code wants to interface with are generally described using JSON-Schema or the Swagger/OpenAPI variant of JSON-Schema.

There's therefore an impedance mismatch - JSON-Schema can express a wide variety of complex schemas that the Bicep compiler cannot understand. Generating a types.json for an API means resolving this impedance mismatch by mapping the set of JSON-Schema types for that API into the set of Bicep types. This process will likely be different for every API we want to map into Bicep.
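To make the mismatch concrete, here is a minimal sketch of such a mapping step. It is illustrative only: the function name `map_schema`, the `UnsupportedSchema` exception, and the simplified output shape are all assumptions made for this example and are not the actual types.json format or the converter in this repo.

```python
from typing import Any


class UnsupportedSchema(Exception):
    """Raised when a schema uses a construct this sketch cannot map."""


# Hypothetical mapping from JSON-Schema primitive names to simplified type names.
PRIMITIVES = {"string": "string", "integer": "int", "boolean": "bool"}


def map_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Map a JSON-Schema fragment to a simplified, illustrative type description."""
    # Composition keywords have no generic target here; a real converter
    # would need domain-specific rules for each API.
    for keyword in ("oneOf", "anyOf", "allOf", "not"):
        if keyword in schema:
            raise UnsupportedSchema(f"cannot map '{keyword}' without domain-specific rules")

    kind = schema.get("type")
    if kind in PRIMITIVES:
        return {"kind": PRIMITIVES[kind]}
    if kind == "array":
        return {"kind": "array", "items": map_schema(schema.get("items", {}))}
    if kind == "object":
        required = set(schema.get("required", []))
        properties = {
            name: {"type": map_schema(sub), "required": name in required}
            for name, sub in schema.get("properties", {}).items()
        }
        return {"kind": "object", "properties": properties}
    # Anything else falls back to an untyped value.
    return {"kind": "any"}
```

Even in this toy version, the branch that rejects `oneOf`/`anyOf`/`allOf` is where the per-API decisions would have to live.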

The converter/generator of types.json in this repo is built for ARM's REST API conventions, and has deep understanding of the patterns that appear in those OpenAPI documents. When I tried to use the autorest-based converter, I found that I had to re-implement almost all of it just to get to hello world. Dialects of OpenAPI used by different systems are drastically different in practice.

Such a conversion tool is responsible for many tasks:

In practice, for a non-ARM system, all of the steps named as domain specific will need to be written from scratch or highly customized. Systems extend OpenAPI in proprietary ways, or use feature sets that don't overlap with ARM's.

Some examples:

I'd strongly suggest turning types.json into a fully-documented and human-readable/writable format, and letting the extensibility author write their own generator/mapper. Another clickstop would be to allow Bicep to read JSON-Schema with annotations that are specific to Bicep's type system. Then the scope of a conversion tool is limited to trimming down the original document into just the relevant types and adding annotations to describe the Bicep behavior.

Coupling to ARM's conventions

types.json does not fully describe the behavior of a type once it gets inside the compiler. There are important semantic details and behaviors that are hardcoded in the compiler rather than data-driven by the format.

For reference there's a set of well-known properties defined:

These properties are given special status by the Bicep compiler, and compiler features rely on these properties appearing at the top level of an object for correctness. As a motivating example, the name property is required to be loop-variant, and required to be unique for a type in the same template. Many places in code will discard a resource that does not define name. However, in Kubernetes the actual property with those semantics is .metadata.name.

For each special behavior that can be granted to a property, it should be possible to indicate in types.json, via property flags, which properties that behavior applies to.
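As a rough sketch of what flag-driven behavior could look like, the snippet below locates the naming property by searching a type description for a flag instead of assuming a top-level name. The flag name `identifier`, the type-description shape, and both helper functions are hypothetical and chosen only for illustration; they are not part of types.json or the Bicep compiler.

```python
from typing import Any, Optional

# Hypothetical flag name; not claimed to exist in types.json today.
IDENTIFIER = "identifier"


def find_flagged_path(type_def: dict[str, Any], flag: str) -> Optional[list[str]]:
    """Depth-first search for the first property carrying `flag`."""
    for name, prop in type_def.get("properties", {}).items():
        if flag in prop.get("flags", []):
            return [name]
        nested = find_flagged_path(prop.get("type", {}), flag)
        if nested is not None:
            return [name] + nested
    return None


def read_path(body: dict[str, Any], path: list[str]) -> Any:
    """Follow `path` through nested dictionaries and return the value found."""
    value: Any = body
    for segment in path:
        value = value[segment]
    return value


# ARM-style type: the naming property sits at the top level.
arm_type = {"properties": {"name": {"flags": [IDENTIFIER], "type": {}}}}

# Kubernetes-style type: the naming property is nested under metadata.
k8s_type = {
    "properties": {
        "metadata": {
            "flags": [],
            "type": {"properties": {"name": {"flags": [IDENTIFIER], "type": {}}}},
        }
    }
}
```

With this shape, `find_flagged_path(k8s_type, IDENTIFIER)` yields the path to .metadata.name, and the same lookup on `arm_type` yields the top-level name, so compiler features that need the resource name would not have to hardcode where it lives.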

A short list of things that bake in ARM's conventions.

To detail this further, compare the property flags supported by types.json and the property flags supported by Bicep.

Features that are not extensible

Missing features in object structure

Kubernetes objects largely don't use discriminated unions in the same way ARM resources do - they use a different pattern (oneOf) to express a choice. You can read more about their conventions here.

Providing a good tooling experience for other systems means supporting some consensus set of features they rely on for validation/structure.

An example of this pattern is using different sources to set environment variables. Only one of either value or valueFrom can be used.

apiVersion: v1
kind: Pod
metadata:
  name: envar-demo
spec:
  containers:
  - name: envar-demo-container
    image: gcr.io/google-samples/node-hello:1.0
    env:
    - name: DEMO_GREETING
      value: "Hello from the environment" # using .value to set an environment variable
    - name: DEMO_FAREWELL
      valueFrom:                          # using .valueFrom to set an environment variable
        secretKeyRef:
          name: mysecret
          key: goodbye

In ARM, this would be expressed in a form similar to:

resource foo '...' = {
    ...
    properties: {
        ....
        env: [
            {
                name: 'DEMO_GREETING'
                value: {
                    kind: 'static'
                    value: 'Hello from the environment'
                }
            }
            {
                name: 'DEMO_FAREWELL'
                value: {
                    kind: 'secret'
                    name: 'mysecret'
                    key: 'goodbye'
                }
                }
            }
        ]
    }
}
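
The difference between the two shapes can be sketched as a small conversion. This is a minimal illustration, not part of any Bicep or Kubernetes tooling: the function name `to_discriminated` is hypothetical, the 'static' and 'secret' kinds are taken from the example above, and the sketch assumes exactly one of value or valueFrom is set and handles only the secretKeyRef source.

```python
from typing import Any


def to_discriminated(entry: dict[str, Any]) -> dict[str, Any]:
    """Convert a Kubernetes-style env entry into the discriminated shape above."""
    has_value = "value" in entry
    has_value_from = "valueFrom" in entry
    # Assumption for this sketch: exactly one of the two fields must be present.
    if has_value == has_value_from:
        raise ValueError("exactly one of 'value' or 'valueFrom' must be set")

    if has_value:
        return {
            "name": entry["name"],
            "value": {"kind": "static", "value": entry["value"]},
        }

    ref = entry["valueFrom"].get("secretKeyRef")
    if ref is None:
        # Other valueFrom sources are out of scope for this sketch.
        raise ValueError("only 'secretKeyRef' is handled in this sketch")
    return {
        "name": entry["name"],
        "value": {"kind": "secret", "name": ref["name"], "key": ref["key"]},
    }
```

In the Kubernetes shape the choice is implied by which field is present, while the ARM-style shape carries an explicit kind; the exactly-one check is the piece a oneOf-aware type system would need to express.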
rynowak commented 2 years ago

/cc @anthony-c-martin @majastrz

jeskew commented 1 year ago

I'd strongly suggest turning types.json into a fully-documented and human-readable/writable format

I would be interested in picking this idea back up, especially since there is some additional metadata we would like to include in provider-supplied types that cannot be captured by flags (specifically, type refinements and ARM resource ID metadata).

rynowak commented 1 year ago

happy to chat if you want to brainstorm here.