chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

The semantics for schema versions must be documented #404

Closed brianraymor closed 1 year ago

brianraymor commented 1 year ago

Design

Schema versioning

The CELLxGENE schema MUST use Semantic Versioning.

Major version is incremented when schema updates are incompatible with the AnnData and Seurat data encodings or CELLxGENE API(s). Examples include:

Minor version is incremented when schema updates may require changes only to the cellxgene-schema CLI or the curation process. Examples include:

Patch version is incremented for editorial updates to the schema.

Changes MUST be documented in the schema Changelog.
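As a rough illustration of the rules above, the mapping from change type to version bump could be sketched as follows. The `ChangeKind` enum and `bump_version` helper are hypothetical names invented for this example; they are not part of the cellxgene-schema CLI.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical categories mirroring the three rules in this issue."""
    INCOMPATIBLE = "major"   # incompatible with AnnData/Seurat encodings or CELLxGENE API(s)
    TOOLING_ONLY = "minor"   # may only require changes to the CLI or the curation process
    EDITORIAL = "patch"      # editorial updates to the schema text


def bump_version(version: str, change: ChangeKind) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is ChangeKind.INCOMPATIBLE:
        # A major bump resets minor and patch, per Semantic Versioning.
        return f"{major + 1}.0.0"
    if change is ChangeKind.TOOLING_ONLY:
        # A minor bump resets patch.
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(bump_version("3.0.0", ChangeKind.TOOLING_ONLY))
```

The version strings used here are placeholders; the sketch assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.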

Context

The dataset schema does not document the semantics for versions because there have only been major updates in the past. With the introduction of ontology-only updates, it must be possible for downstream services and consumers to interpret the meaning of the version number.

See @atolopko-czi's comments in single-cell-four:

it should have precise semantics and be well-documented. Downstream systems/consumers of datasets should be able to rationally filter by schema versions (or ranges) that are "compatible" per semver-like guarantees. E.g., and off the cuff: a major version change implies that the consuming system will likely require code changes to accommodate new/removed schema fields; a minor version change should be robustly handled by consuming systems, though new data attributes may be unhandled and thus (temporarily) missing in the downstream system; and so on.
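One way a downstream consumer might filter datasets by schema version under these guarantees is sketched below. It assumes that "compatible" means "same major version", which is one off-the-cuff reading of the comment above, not a documented CELLxGENE rule; `parse_version`, `is_compatible`, and `filter_compatible` are hypothetical helpers.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(dataset_version: str, supported_version: str) -> bool:
    """Assumption: a consumer built for one major version can read any
    dataset with the same major version; newer minor versions may carry
    attributes the consumer does not yet handle."""
    return parse_version(dataset_version)[0] == parse_version(supported_version)[0]


def filter_compatible(versions: List[str], supported_version: str) -> List[str]:
    """Keep only the schema versions the consumer can rationally accept."""
    return [v for v in versions if is_compatible(v, supported_version)]


if __name__ == "__main__":
    print(filter_compatible(["2.0.0", "3.0.0", "3.1.0", "4.0.0"], "3.0.0"))
```

A stricter consumer could also compare minor versions; the same-major rule is simply the loosest policy consistent with the semver-like guarantees described in the comment.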


This is how the Cell Census schema models semver:

Cell Census Schema versioning

The Cell Census Schema follows Semver for its versioning:

Changes MUST be documented in the schema Changelog at the end of this document.

pablo-gar commented 1 year ago

Under this description:

Major version is incremented when schema updates are incompatible with the AnnData and Seurat data encodings or CELLxGENE API(s). Examples include:

I'm not sure I would consider this example as falling under it:

Adding metadata fields

Adding new fields would not break anything downstream AFAIK. A new field may not be readily visible or available to the Census, for example, but that won't make the schema incompatible.

brianraymor commented 1 year ago

I moved "Adding metadata fields" to the minor version examples.