Note: I prefer the hand-wavy term "metadata fields" over getting into the details of keys versus column names, which depend on the AnnData section in question. Some AnnData sections obviously do not allow duplicates. I could create a table per AnnData section if that would be preferred.
General Requirements
...
Reserved Names. The names of metadata fields MUST NOT start with "__". The names of the metadata fields specified by the schema are reserved for the purposes and specifications described in the schema.
Unique Names. The names of schema and data submitter metadata fields in obs and var MUST be unique. For example, duplicate "feature_biotype" keys in AnnData var are not allowed.
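As a rough illustration of the two naming requirements above, the following sketch checks a list of field names for the reserved "__" prefix and for duplicates. The helper name check_field_names and every sample name other than "feature_biotype" are hypothetical and not part of the schema. It assumes the names have already been collected from obs or var as plain strings.

```python
from collections import Counter

RESERVED_PREFIX = "__"  # names starting with this prefix are not allowed


def check_field_names(names):
    """Return a list of problems found in a sequence of metadata field names."""
    names = list(names)
    problems = []
    # Requirement 1: no name may start with the reserved prefix.
    for name in names:
        if name.startswith(RESERVED_PREFIX):
            problems.append(f"reserved prefix: {name!r}")
    # Requirement 2: every name must appear exactly once.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate name: {name!r} appears {count} times")
    return problems


# Hypothetical var column names; only "feature_biotype" comes from the text above.
var_columns = ["example_field", "feature_biotype", "feature_biotype", "__internal"]
problems = check_field_names(var_columns)
```

A validator could run a check like this over both obs and var and reject the dataset when the returned list is not empty.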
Note: I will also be remodeling the Annotator in all metadata fields from:
Reserved Names in General Requirements has been previously reviewed, but needs further clarification for new readers.
"Metadata keys" is imprecise when referring to schema fields that could be dict keys in .uns or pandas.core.frame.DataFrame columns in .obs.
Note: when I revisited the rationale for using "Key" as the standard name in the schema field tables, I was reminded that keys is a common operation for both obs (DataFrame) and uns (dictionary).
adata.obs.keys()
adata.uns.keys()
In some cases, the underlying data type in AnnData already enforces uniqueness, but that is not always true. See "A Few Times, I've Broken Pandas".
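A minimal sketch of the point about keys and uniqueness, using plain pandas and dict objects as stand-ins for the obs/var and uns sections rather than a real AnnData object. The key name "example_key" is hypothetical. It shows that keys() works on both types, that a dict cannot hold a repeated key, and that a DataFrame can hold a repeated column name.

```python
import pandas as pd

# Stand-in for a var DataFrame: pandas accepts the repeated column name.
var = pd.DataFrame(
    [["gene", "spike-in"]],
    columns=["feature_biotype", "feature_biotype"],
)

# Stand-in for uns: a repeated key in a dict literal is silently overwritten.
uns = {"example_key": "first", "example_key": "second"}

# keys() is available on both; for a DataFrame it returns the column labels.
var_keys = list(var.keys())
uns_keys = list(uns.keys())

# The DataFrame needs an explicit uniqueness check; the dict does not.
has_unique_columns = var.columns.is_unique
duplicated = sorted(set(var.columns[var.columns.duplicated()]))
```

Here the dict ends up with a single key, while the DataFrame keeps both "feature_biotype" columns, which is why the uniqueness requirement has to be checked explicitly for DataFrame-backed sections.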
Uniqueness must be enforced for both author-provided and schema-defined fields. As @jahilton suggested: "Apply the uniqueness requirement to both schema and author metadata (we can't think of any valid use of non-unique column names & Jim has noted the issues they introduce)."
All reserved names are unavailable to submitters, but there are fields that are reserved for curators to annotate and fields that are reserved for CELLxGENE Discover to annotate.
The rationale for requirements such as "Names starting with '__' must be reserved" should be retroactively captured in the Change Log for the day when @brianraymor is no longer available to remember "Why?".
Context
See #single-cell-four.