Standardize how we work with (cluster) app schema

marians commented 1 year ago

In Cluster API we create workload clusters via apps (as in the Giant Swarm app platform). We have https://github.com/giantswarm/roadmap/issues/1181 to provide a web UI for cluster creation, and with this we will rely on the cluster app schema to generate a UI.

This brings new requirements for our work with app schema. This issue exists to track our efforts towards defining these requirements, developing our tooling and enabling teams to work towards these requirements.

Tasks outline

### General JSON Schema knowledge sharing
- [x] #1846
- [x] Establish Slack channel
- [ ] Create handbook page

### Re-usable schema
- [x] Establish a repository and server for reusable schema
- [x] Establish workflows, share knowledge

### Define requirements for cluster app values schema
- [ ] RFC: https://github.com/giantswarm/rfc/pull/55

### Development and CI tooling
- [x] Provide tooling to normalize JSON schema as a developer
- [x] Check normalization of cluster app values schema in CI
- [x] Let `schemalint verify` rule set for cluster apps to match our [RFC](https://github.com/giantswarm/rfc/pull/55)
- [x] Validate cluster app default values against schema in CI

### Workflows
- [ ] How to deal with incompatible schema changes in cluster apps (RFC)
- [x] Provide tooling to migrate values/config to newer schema versions
- [ ] Introduce `additionalProperties: false` to all objects
- [ ] Establish rules for defaulting
- [ ] Define workflow regarding valid values for the kubernetes version

### Alignment
- [x] #2093
- [ ] https://github.com/giantswarm/roadmap/issues/2515

#### Azure
- [x] #1990
- [ ] https://github.com/giantswarm/roadmap/issues/2582

#### AWS
- [x] #2095
- [x] #2127
- [x] #2139
- [x] #2168
- [x] #2142
- [x] Update cluster-aws repository config in giantswarm/github for flavour clusterapp
- [ ] https://github.com/giantswarm/roadmap/issues/2581

#### Cloud-Director
- [x] #2096
- [ ] #2132
- [ ] https://github.com/giantswarm/roadmap/issues/2328
- [x] Normalize schema using schemalint v2
- [ ] https://github.com/giantswarm/roadmap/issues/2510
- [ ] Generate Markdown documentation for cloud-director schema

#### VSphere
- [x] #2098
- [ ] Make VSphere schema compliant with cluster-app rule set
- [ ] Move VSphere default values into schema
- [ ] Update cluster-vsphere repository config in giantswarm/github for flavour clusterapp
- [ ] Move VSphere values properties into common schema structure

Details and Goals

General JSON Schema knowledge sharing

It seems as if it's impossible to find a good talk on Youtube. So we'll do it internally.

Reusable schema

Goal: create synergies, avoid duplication in schema development between provider implementations.

In addition to establishing a repository, we will have to decide which reusable schemas to create and how to work towards using them. This may be an iterative, ongoing effort.

Requirements for cluster app schema

Goal: have clearly documented requirements for cluster app schema, which are understood by the KaaS teams, and which can be validated against.

Development and CI tooling

Goal: increase confidence and speed, reduce friction in the schema development process.

Schema generation

Goal: Simplify the creation of schema, compared to the helm schema-gen process, which creates only a scaffold.

Normalization

Goal: produce minimal diffs in schema changes by avoiding purely cosmetic changes (think go fmt).

Validation

Goal: Be certain that values.yaml validates against values.schema.yaml.

Workflows

Goal: Establish ways how to deal with common challenges across teams.

Schema alignment

Goal: Users working with one provider should be able to transfer their knowledge to other providers easily. Also create synergies and avoid efforts.

mcharriere commented 1 year ago

Hi, I've been considering adapting https://github.com/norwoodj/helm-docs to produce schemas.

Basically, it parses the yaml and generates a table with docs. Descriptions are pulled from comments and types are inferred or also pulled from comments.

Regarding validation, I think we have that in place already, abs runs ct lint on the chart, which checks values against the schema.

marians commented 1 year ago

@mcharriere Regarding generating schema. I think the approach of generating schema based on values.yaml is great for the beginning, but will get more and more complicated as the schema evolves. Having syntax in comments sounds like replicating JSON schema capabilities into YAML comments. But for what benefit? It may be simpler to

Generate barebones schema from values using helm schema-gen
Move all comments and annotations into values.schema.json as property annotations (title, description).
Use values.yaml purely for defaults (not for documentation of all config options)
Maintain and improve values.schema.yaml by adding properties, adding annotations (e. g. examples).

We should see what tooling we can get (or develop) to support this workflow.

marians commented 1 year ago

@mcharriere Regarding validation: yes, ABS validates values against schema.

When working more actively on our schemas, we need something more sophisticated. We need

Checks that the schema is actually valid (against the JSON schema spec)
Check that our own requirements for cluster app schema are met (e. g. must provide title for each property, must provide items shema for array properties.)

mcharriere commented 1 year ago

But for what benefit?

I'd say that not having to maintain 2 files (3 if you add the README) is one. And the lazy me would add not dealing with JSON.

For me, having just 1 source of truth is desirable. With this I mean that we could do it the other way around, generate values.yaml and the readme from the schema.

marians commented 1 year ago

Of course, the goal should be to not have to maintain the same info in several places. Documentation (README) can be generated from JSON schema, too. And values.yaml does not need descriptive comments if the README explains everything nicely.

marians commented 1 year ago

Proposal towards re-usable schema:

giantswarm/schema as the repository to maintain schema in. See the README for details.
giantswarm/schema-server as the component which serves schema under nice URLs like https://schema.giantswarm.io/image/v0.0.1
https://schema.giantswarm.io/image/v0.0.1 as an example of a re-usable schema. Here is an example showing how such re-usable schema can be used in an app’s values.schema.json: https://github.com/giantswarm/dex-app/pull/236

marians commented 1 year ago

Regarding schema linting, we have https://github.com/giantswarm/schemalint now to verify whether a JSON schema is valid.

This is already in use in https://github.com/giantswarm/schema (PR actions). Schema developers can also quickly install it via go install github.com/giantswarm/schemalint and use it via schemalint PATH.

The tool does not yet test against our own schema quality standards. It only ensures validity.

marians commented 1 year ago

On normalization: schemalint has the normalize command now to normalize the whitespace and property sorting of a JSON schema file.

With https://github.com/giantswarm/schemalint/pull/7 we are adding the check for normalization to the schemalint verify command.

marians commented 1 year ago

Here is a slide deck which I've started creating today to help explain: https://docs.google.com/presentation/d/16McEeTiDPyVPglnvqguC3K98LxDB-R-S8mk2uhmAzkg/edit#slide=id.g1197062f044_3_0

marians commented 1 year ago

Some updates

Talked to Clippy about the cluster-azure schema design. The plan is to carefully design the schema starting January, as a collaboration between Rainbow and Clippy.
Added more content to the slides. Will summarize here, too.

Added slide on schema main sections

This is a proposal for the high-level structure.

Edit: outdated proposal

More tooling notes

Generating values documentation based on schema
Migration of values between schema versions
Detecting breaking changes between two schema versions

erkanerol commented 1 year ago

I liked the idea behind .experimental.

marians commented 1 year ago

I found a PR (https://github.com/giantswarm/cluster-aws/pull/192) by @AndiDog which replaces a values property and deprecates the old one. This is the first time I see this (in our context).

marians commented 1 year ago

For clarification of the use of multiple types in a property I have created https://github.com/giantswarm/roadmap/issues/1892 for Rocket.

marians commented 1 year ago

In talking about the proposed /experimental category, we came to the conclusion that the category should also fit config that is internal (Giant Swarm only, not for customer use, no UI support), but not experimental in nature. So I'm renaming this to /internal.

marians commented 1 year ago

Planning note for this sprint

Adapt schemalint rule set for cluster apps to reflect our RFC (most important parts)
Start restructuring cluster-azure schema into desired form

marians commented 1 year ago

In the common schema proposal, /provider got renamed to /providerSpecific, as we found a naming conflict. Details

marians commented 1 year ago

We just learned that there are side effects when using subcharts (aka library charts, dependencies).

If a cluster app schema has "additionalProperties": false on the root level, and it is supposed to use a library chart like cluster-shared, then there has to be a property /cluster-shared in the schema of the cluster app.

This upstream issues explains some context: https://github.com/helm/helm/issues/10392

For our UI we most likely have to ignore/exclude /cluster-shared again.

marians commented 1 year ago

We just learned that ~app-operator~ cluster-apps-operator injects additional values into the schema. See https://github.com/giantswarm/cluster-azure/pull/78 for details. The properties needed are

/managementCluster
/baseDomain
/provider

All three are type string.

For now, this means:

Validation should ensure that these properties exist in the schema. Otherwise, with additionalProperties: false on the root level, cluster creation fails.
The RFC should include these requirements
Our cluster creation UI must exclude these properties, as users are not supposed to set them.

In the mid term I would prefer to move these properties into one of the other main sections, perhaps /internal. I'm sure there will be many effects to take care of. We'll have to talk to Honeybadger about this.

marians commented 1 year ago

@mogottsch With https://github.com/giantswarm/cluster-aws/pull/239 I am adding temporary CI checks to cluster-aws, to be replaced by devctl once it's ready.

marians commented 1 year ago

While I work on cluster app schemas in detail, I'm taking some more notes for future alignment steps here. Subject to change.

Image repositories
- Should be an object with string properties
- VSphere and VCD have two string props (repository, tag)
- AWS has three (registry, repository, tag)
- We have https://github.com/giantswarm/schema/tree/main/image for all these
Cluster name (https://github.com/giantswarm/schema/tree/main/clustername)
Volume sizes
Subnets and CIDR blocks
Node taints (https://github.com/giantswarm/schema/pull/14)
Labels. Sometimes array, where objects would make more sense. Sometimes two string properties, sometimes one with format key=value.
Control plane node configuration should get sub section in .controlPlane.node, including:
- volume sizes
- health check
- instance type
Comma-separated values should be constructed from array (instead of a string)

marians commented 1 year ago

Road block appeared. Obviously the KaaS teams have decided that going forward they want to use objects for node pools, with user-defined keys, using the JSON Schema feature patternProperties.

Our UI library (react-json-schema-form) does not support this currently.

Internal discussion: https://gigantic.slack.com/archives/C03LV8E0RL7/p1679309519421109

weatherhog commented 1 month ago

@marians do you want to come up with an idea how to proceed here?

marians commented 1 month ago

I think the main goals of this issue have been accomplished. We do have an agreement how to deal with the schema of cluster apps (it's mostly the app named cluster by now).

However, there are still some ways to improve our processes in here which I think haven't been addressed yet. Probably we should bring this to KaaS sync or SIG architecture and discuss whether we should pursue those, and who.

weatherhog commented 1 month ago

@marians will you take this to KaaS Sync or SIG Architecture?

giantswarm / roadmap