HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Add alignment_software & alignment_software_version field in analysis_protocol #1533

Closed arschat closed 11 months ago

arschat commented 12 months ago

For which schema is a change/update being suggested? I would like to request an update to the analysis_protocol.json schema (the request originally targeted analysis_file.json).

What should the change/update be? I would like to add two new fields

  1. alignment_software to allow data contributors to record the software used for the alignment of the FASTQ files' reads to a reference genome
  2. alignment_software_version to allow data contributors to record the version of the software used for the alignment of the FASTQ files' reads to a reference genome

This update constitutes a major change to the schema(s) it affects.

What new field(s) need to be changed/added?

* Field name: alignment_software
* Field description: Name of alignment software used to map the FASTQ files to the reference genome
* Field type: string
* Required: yes 
* Examples: cell ranger; kallisto bustools; GSNAP; STAR; Not Applicable
* CV or enum: no

* Field name: alignment_software_version
* Field description: Version of alignment software used to map the FASTQ files to the reference genome
* Field type: string
* Required: yes
* Examples: v1.0.1; 2.4.2a; Not Applicable
* CV or enum: no
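
For illustration only, the two field definitions above could be sketched as a schema fragment plus a minimal check. The keyword layout (`required`, `properties`, `description`, `type`, `example`) is an assumption about how the fields might be written in analysis_protocol.json, not the content of the merged schema, and `PROPOSED_FRAGMENT` and `check_protocol` are hypothetical names introduced for this sketch.

```python
import json

# Hypothetical fragment: one way the two proposed properties might be expressed.
# Descriptions, types and examples are copied from the field list in this issue;
# the surrounding keyword layout is an assumption.
PROPOSED_FRAGMENT = {
    "required": ["alignment_software", "alignment_software_version"],
    "properties": {
        "alignment_software": {
            "description": "Name of alignment software used to map the FASTQ files to the reference genome",
            "type": "string",
            "example": "cell ranger; kallisto bustools; GSNAP; STAR; Not Applicable",
        },
        "alignment_software_version": {
            "description": "Version of alignment software used to map the FASTQ files to the reference genome",
            "type": "string",
            "example": "v1.0.1; 2.4.2a; Not Applicable",
        },
    },
}


def check_protocol(doc, fragment=PROPOSED_FRAGMENT):
    """Return a list of problems; an empty list means both fields are present strings."""
    problems = []
    for name in fragment["required"]:
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], str):
            problems.append(f"{name} must be a string")
    return problems


if __name__ == "__main__":
    # Print the fragment as JSON, then check a sample analysis protocol record.
    print(json.dumps(PROPOSED_FRAGMENT, indent=2))
    sample = {"alignment_software": "STAR", "alignment_software_version": "2.4.2a"}
    print(check_protocol(sample))
```

Because both fields are required and there is no CV or enum, a dataset with no alignment step would record the free-text value "Not Applicable" rather than omit the fields.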

Why is the change requested?

These fields are needed to record the specific software used to align FASTQ files to a reference genome. They are among the upcoming Tier 1 metadata proposed by the HCA Integration Teams & Bionetwork.

The choice of alignment software affects which cells are filtered per dataset, and which reads (introns and exons, or only exons) are counted as part of the reported transcriptome. This can introduce batch effects.

arschat commented 11 months ago

PR HumanCellAtlas/metadata-schema#1534

arschat commented 11 months ago

alignment_software & alignment_software_version were moved to the analysis_protocol schema instead of analysis_file.