HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Add gene_annotation_version field to analysis_protocol #1543

Closed. arschat closed this issue 5 months ago.

arschat commented 6 months ago

For which schema is a change/update being suggested?

I would like to request an update to the analysis_protocol.json schema.

What should the change/update be?

I would like to add a new field, gene_annotation_version, to this schema so that data contributors can record the Ensembl release version accession number or NCBI RefSeq assembly version used for gene annotation.

This update constitutes a minor change to the schema it affects.

What new field(s) need to be changed/added?

* Field name: gene_annotation_version
* Field description: The Ensembl release version accession number or NCBI RefSeq assembly version used for gene annotation.
* Field type: string
* Required: no
* Examples: v110; GCF_000001405.40; GCF_000001635.27
* CV or enum: REGEX
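
The request says the field is constrained by a regex but does not state the pattern. As a rough sketch only, the check could look like the following. The pattern is an assumption inferred from the three examples above (a `v` followed by digits for Ensembl releases, or `GCF_` followed by nine digits and a dotted version for RefSeq assemblies). The names `GENE_ANNOTATION_VERSION_RE`, `GENE_ANNOTATION_VERSION_PROPERTY` and `is_valid_gene_annotation_version` are hypothetical and not part of the schema repo.

```python
import re

# Hypothetical pattern: the issue only says "REGEX" and gives no expression.
# It is inferred from the examples v110, GCF_000001405.40, GCF_000001635.27.
GENE_ANNOTATION_VERSION_RE = re.compile(r"^(v[0-9]+|GCF_[0-9]{9}\.[0-9]+)$")

# Hypothetical sketch of how the property might be declared in
# analysis_protocol.json, written here as a Python dict. The key layout is
# an assumption, not copied from the actual schema.
GENE_ANNOTATION_VERSION_PROPERTY = {
    "description": (
        "The Ensembl release version accession number or NCBI RefSeq "
        "assembly version used for gene annotation."
    ),
    "type": "string",
    "pattern": GENE_ANNOTATION_VERSION_RE.pattern,
    "example": "v110; GCF_000001405.40; GCF_000001635.27",
}


def is_valid_gene_annotation_version(value: str) -> bool:
    """Return True if value matches the assumed pattern in full."""
    return GENE_ANNOTATION_VERSION_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["v110", "GCF_000001405.40", "110", "GCF_123.1"]:
        print(candidate, is_valid_gene_annotation_version(candidate))
```

A stricter or looser pattern may be preferable in practice, for example if Ensembl releases are expected without the `v` prefix; that choice is left to the schema maintainers.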

Why is the change requested?

This field is needed to record the specific Ensembl release version accession number or NCBI RefSeq assembly version. It is one of the upcoming Tier 1 metadata fields proposed by the HCA Integration Teams & Bionetworks.

The gene annotation version is also a possible source of batch effects and a confounder in some biological analyses.