Design

Note: I prefer the hand-wavy term "metadata fields" to spelling out keys or column names for each AnnData section in question. And obviously, some AnnData sections do not allow duplicates. I could create a table per AnnData section if that would be preferred.

General Requirements

...

Reserved Names. The names of metadata fields MUST NOT start with "__". The names of the metadata fields specified by the schema are reserved for the purposes and specifications described in the schema.
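A minimal sketch of how the Reserved Names rule might be checked, assuming field names are available as plain strings. The function `find_reserved_violations` and the set `SCHEMA_FIELDS` are hypothetical and not part of the schema or any existing validator; the single entry in `SCHEMA_FIELDS` is only illustrative.

```python
# Hypothetical subset of schema-defined names; the real list lives in the schema.
SCHEMA_FIELDS = {"feature_biotype"}


def find_reserved_violations(submitter_fields, schema_fields=SCHEMA_FIELDS):
    """Return (name, reason) pairs for submitter fields that use reserved names."""
    violations = []
    for name in submitter_fields:
        if name.startswith("__"):
            # Names with a leading double underscore are reserved outright.
            violations.append((name, "starts with '__'"))
        elif name in schema_fields:
            # Names specified by the schema are reserved for the schema's purposes.
            violations.append((name, "reserved by the schema"))
    return violations


result = find_reserved_violations(["__internal", "feature_biotype", "my_cluster"])
print(result)
```

Returning the reason alongside the name is a design choice for this sketch, so that a report could distinguish the two kinds of reservation.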

Unique Names. The names of schema and data submitter metadata fields in obs and var MUST be unique. For example, duplicate "feature_biotype" keys in AnnData var are not allowed.
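The uniqueness rule could be checked along the following lines. This is a sketch only: `find_duplicate_names` is a hypothetical helper, and `my_field` is a placeholder submitter column, not a schema field.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


# Schema and submitter names are checked together, since the rule covers both.
var_columns = ["feature_biotype", "my_field", "feature_biotype"]
duplicates = find_duplicate_names(var_columns)
print(duplicates)
```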

Note: I will also be remodeling the Annotator row in all metadata field tables from:

Key	myKey
Annotator	Curator
Value	`numpy.ndarray`

to something like:

Key	myKey
Annotator	Curator MUST annotate.
Value	`numpy.ndarray`

Context

See #single-cell-four.

Reserved Names in General Requirements has been previously reviewed, but needs further clarification for new readers.

"Metadata keys" is imprecise when referring to schema fields that could be dict keys in .uns or pandas.core.frame.DataFrame columns in .obs.

Note: when I revisited the rationale for using "Key" as the standard name in the schema field tables, I was reminded that keys() is a method common to both obs (a DataFrame) and uns (a dictionary):


adata.obs.keys()
adata.uns.keys()

In some cases, the underlying data type in AnnData already enforces uniqueness, but that is not always true. See "A Few Times, I’ve Broken Pandas".
Uniqueness must be enforced for both author-provided and schema-defined fields. As @jahilton suggested: "Apply the uniqueness requirement to both schema and author metadata (we can’t think of any valid use of non-unique column names & Jim has noted the issues they introduce)."
All reserved names are unavailable to submitters, but there are fields that are reserved for curators to annotate and fields that are reserved for CELLxGENE Discover to annotate.
The rationale for requirements such as "Names starting with '__' must be reserved" should be retroactively captured in the Change Log for the day when @brianraymor is no longer available to remember "Why?".
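The point that the underlying data type does not always enforce uniqueness can be illustrated with a small sketch. It uses plain pandas and a plain dict as stand-ins for obs and uns; the column name `cluster` and the key `title` are made up for the example.

```python
import pandas as pd

# pandas accepts repeated column labels at construction time.
obs = pd.DataFrame([[1, 2], [3, 4]], columns=["cluster", "cluster"])

columns_unique = obs.columns.is_unique
repeated = list(obs.columns[obs.columns.duplicated()])
print(columns_unique, repeated)

# Selecting a repeated label returns a DataFrame with both columns, not a Series.
selected = obs["cluster"]
print(type(selected).__name__, selected.shape)

# A dict cannot hold a repeated key: the later value silently replaces the earlier one.
uns = {"title": "first", "title": "second"}
print(list(uns.keys()), uns["title"])
```

The dictionary case enforces uniqueness by construction, while the DataFrame case needs an explicit check such as `columns.is_unique`.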

chanzuckerberg / single-cell-curation

Reserved Names and Uniqueness requirements must be clarified #641
