chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

Add description to schema #700

Closed rcannood closed 6 months ago

rcannood commented 11 months ago

Motivation

I'm working on aligning the data formats used in OpenProblems with those of the CELLxGENE census corpus, because that will allow us to pull a lot of datasets directly from CELLxGENE census. I'm adding documentation for my code and for the schema of the datasets we use in this project, but the CELLxGENE schema specified in this repo doesn't give much of a description for each of its keys.

For instance, for cell_type_ontology_term_id, the schema contains:

cell_type_ontology_term_id

Key: cell_type_ontology_term_id
Annotator: Curator
Value: categorical with str categories. This MUST be a CL term.

And for cell_type:

cell_type

Key: cell_type
Annotator: CELLxGENE Discover
Value: categorical with str categories. This MUST be the human-readable name assigned to the value of cell_type_ontology_term_id.

Definition of Done

Add a description to each of the elements in the schema.md file.

For example:

Is the schema.md file somehow automatically generated from the schema_definition.yaml file? I looked through the code but couldn't immediately find a script to do this.

Tasks

Detail the specific tasks that can be used to accomplish the desired changes. If detailed steps cannot be provided at this time, please file a Tech Proposal instead.

If it helps, I'm more than happy to create a PR with proposed changes to the schema.md/schema_definition.yaml files.
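
To make the proposal concrete, here is a minimal sketch of how an entry with an added description could be rendered. It is only an illustration under stated assumptions: the `description` field and the `render_entry` helper are hypothetical and do not exist in this repo, the dict layout simply mirrors the Key / Annotator / Value rows quoted above, and the description text is a placeholder rather than proposed wording.

```python
# Hypothetical sketch: render one schema entry as plain text, adding an
# optional "Description" row. Neither render_entry() nor the "description"
# field is part of the single-cell-curation repo; both are assumptions.

def render_entry(entry: dict) -> str:
    """Return the text for one schema entry, with a description if present."""
    lines = [
        entry["key"],
        "",
        f"Key: {entry['key']}",
        f"Annotator: {entry['annotator']}",
        f"Value: {entry['value']}",
    ]
    description = entry.get("description")
    if description:
        # Only emitted when a description has been supplied.
        lines.append(f"Description: {description}")
    return "\n".join(lines)


# Entry copied from the cell_type rows quoted earlier in this issue.
cell_type = {
    "key": "cell_type",
    "annotator": "CELLxGENE Discover",
    "value": (
        "categorical with str categories. This MUST be the human-readable "
        "name assigned to the value of cell_type_ontology_term_id."
    ),
    # Placeholder only; the actual wording would come from the schema editor.
    "description": "<non-normative description goes here>",
}

print(render_entry(cell_type))
```

Entries without a `description` render exactly as they do today, so the field could be filled in one key at a time.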

brianraymor commented 11 months ago

> Is the schema.md file somehow automatically generated from the schema_definition.yaml file?

Hi @rcannood,

I'm the editor for the schema. It's not automagically generated.

brianraymor commented 6 months ago

Reviewed in small group. Closing.

The schema is focused on minimal normative requirements for a primary audience of the curation and engineering teams.

For these audiences, additional non-normative descriptive text is not helpful. It increases review time and invites extended bikeshedding over the descriptions, driven by differing opinions.