Add alignment_software & alignment_software_version field in analysis_protocol

arschat commented 12 months ago

For which schema is a change/update being suggested?

I would like to request an update to the ~~analysis_file.json~~ analysis_protocol.json schema.

What should the change/update be?

I would like to add two new fields:

* alignment_software to allow data contributors to record the software used for the alignment of the FASTQ files' reads to a reference genome
* alignment_software_version to allow data contributors to record the version of the software used for the alignment of the FASTQ files' reads to a reference genome

This update constitutes a major change to the schema(s) it affects.

What new field(s) need to be changed/added?

* Field name: alignment_software
* Field description: Name of alignment software used to map the FASTQ files to the reference genome
* Field type: string
* Required: yes 
* Examples: cell ranger; kallisto bustools; GSNAP; STAR; Not Applicable
* CV or enum: no

* Field name: alignment_software_version
* Field description: Version of alignment software used to map the FASTQ files to the reference genome
* Field type: string
* Required: yes
* Examples: v1.0.1; 2.4.2a; Not Applicable
* CV or enum: no
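
As a rough illustration of the two field definitions above, here is a minimal Python sketch of how they might look as schema properties, with a small hand-written check for the "Required: yes" and "Field type: string" constraints. The dictionary layout (`description`, `type`, `example` keys), the `REQUIRED_FIELDS` list and the `validate_protocol` helper are assumptions made for this example. They are not taken from the actual analysis_protocol.json schema or from the PR.

```python
# Hypothetical sketch of the two proposed fields; key names are assumptions,
# not copied from the real analysis_protocol.json schema.
PROPOSED_FIELDS = {
    "alignment_software": {
        "description": "Name of alignment software used to map the FASTQ files to the reference genome",
        "type": "string",
        "example": "cell ranger; kallisto bustools; GSNAP; STAR; Not Applicable",
    },
    "alignment_software_version": {
        "description": "Version of alignment software used to map the FASTQ files to the reference genome",
        "type": "string",
        "example": "v1.0.1; 2.4.2a; Not Applicable",
    },
}

# Both fields are proposed as required.
REQUIRED_FIELDS = ["alignment_software", "alignment_software_version"]


def validate_protocol(record):
    """Return a list of error messages for the two proposed fields."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in record:
            errors.append(f"missing required field: {name}")
        elif not isinstance(record[name], str):
            errors.append(f"{name} must be a string")
    return errors


# A record that fills in both fields passes the check.
ok_record = {"alignment_software": "STAR", "alignment_software_version": "2.4.2a"}
print(validate_protocol(ok_record))

# A record that omits the version is reported as incomplete.
print(validate_protocol({"alignment_software": "STAR"}))
```

Because neither field uses a controlled vocabulary or enum, the sketch only checks presence and type. Free-text values such as "Not Applicable" are accepted as ordinary strings.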

Why is the change requested?

These fields are needed to record the specific software, and its version, used to align FASTQ files to a reference genome. They are among the upcoming Tier 1 metadata fields proposed by the HCA Integration Teams & Bionetwork.

The choice of alignment software affects which cells are filtered per dataset, and which reads (introns and exons, or only exons) are counted as part of the reported transcriptome. This can introduce batch effects.

arschat commented 11 months ago

PR HumanCellAtlas/metadata-schema#1534

arschat commented 11 months ago

alignment_software & alignment_software_version have been moved to the analysis_protocol schema instead of analysis_file.

HumanCellAtlas / metadata-schema

Add alignment_software & alignment_software_version field in analysis_protocol #1533