HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Added alignment_software & alignment_software_version fields to analysis_protocol (fixes #1533, #1534)

Closed: arschat closed this pull request 11 months ago

arschat commented 11 months ago

Release notes

For the analysis_protocol.json schema:

Why are these changes needed?

These fields are needed to record the specific software, and its version, used to align FASTQ files to a reference genome. They are among the upcoming Tier 1 metadata fields proposed by the HCA Integration Teams and HCA Bionetworks.

arschat commented 11 months ago

@hannes-ucsc Hello Hannes! My initial thought was to keep the aligner together with the genome assembly version, and since the genome_assembly_version field is recorded in the analysis file schema, I modeled it there. I can see the advantages of modeling it in the analysis protocol, and if you agree on separating alignment_software and genome_assembly_version I will proceed with the proposed change.

hannes-ucsc commented 11 months ago

and if you agree on separating alignment_software and genome_assembly_version I will proceed with the proposed change

Did you mean "alignment_software and alignment_software_version"?

And by "proposed change", do you mean moving alignment_software and alignment_software_version to the analysis_protocol schema?

arschat commented 11 months ago

Exactly. If you agree on separating alignment_software and alignment_software_version from genome_assembly_version, I will proceed with moving the two new fields into the analysis_protocol type. Sorry for not being clear before.
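To make the agreed modeling concrete, here is a minimal Python sketch of where each field would live after the move. The property descriptions, the example values (STAR, 2.7.10a, GRCh38) and the variable names are illustrative assumptions, not the contents of the merged schema; other required fields of both types are omitted.

```python
import json

# Illustrative property definitions for the two new analysis_protocol fields.
# Descriptions and examples are assumptions, not copied from the merged schema.
NEW_ANALYSIS_PROTOCOL_PROPERTIES = {
    "alignment_software": {
        "description": "Name of the software used to align reads to a reference genome.",
        "type": "string",
        "example": "STAR",
    },
    "alignment_software_version": {
        "description": "Version of the alignment software.",
        "type": "string",
        "example": "2.7.10a",
    },
}

# Hypothetical instances after the move (other required fields omitted):
# the aligner and its version sit on the analysis protocol, while
# genome_assembly_version stays on the analysis file.
analysis_protocol = {
    "alignment_software": "STAR",
    "alignment_software_version": "2.7.10a",
}
analysis_file = {
    "genome_assembly_version": "GRCh38",
}

print(json.dumps({"analysis_protocol": analysis_protocol,
                  "analysis_file": analysis_file}, indent=2))
```

One possible benefit of this split is that the aligner is described once per protocol instead of being repeated on every output file that shares it.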

hannes-ucsc commented 11 months ago

Great, that's the approach I find most reasonable.

hannes-ucsc commented 11 months ago

Note step 9 of https://github.com/HumanCellAtlas/dcp2/blob/main/docs/dcp2_operating_procedures.rst:

  9. Start another cycle by requesting a review from the reviewers currently on the PR, even those that already approved the PR or just commented on it. Proceed to step 3.
arschat commented 11 months ago

Applied the dependencies keyword instead of dependentRequired, as discussed above, and fixed update_log.csv. Please review!
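For context, dependencies is the draft-07 JSON Schema keyword; draft 2019-09 split it into dependentRequired (the array form) and dependentSchemas (the schema form). The Python sketch below assumes, hypothetically, that alignment_software_version may only be supplied together with alignment_software; the direction of the dependency, SCHEMA_FRAGMENT and the helper missing_dependencies are illustrative. The helper checks only the array form of dependencies and is not a full validator.

```python
from typing import Dict, List

# Hypothetical fragment: assumes alignment_software_version may only be
# given together with alignment_software (the direction is an assumption).
SCHEMA_FRAGMENT = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "alignment_software": {"type": "string"},
        "alignment_software_version": {"type": "string"},
    },
    # Draft-07 array form: if the key is present, every listed property must
    # also be present. Draft 2019-09 and later spell this "dependentRequired".
    "dependencies": {
        "alignment_software_version": ["alignment_software"],
    },
}


def missing_dependencies(instance: Dict, schema: Dict) -> List[str]:
    """Return messages for violated array-form 'dependencies' entries.

    Schema-form dependencies and all other keywords are ignored in this sketch.
    """
    errors = []
    for trigger, required in schema.get("dependencies", {}).items():
        if trigger in instance and isinstance(required, list):
            for name in required:
                if name not in instance:
                    errors.append(f"'{name}' is required when '{trigger}' is present")
    return errors


print(missing_dependencies({"alignment_software": "STAR"}, SCHEMA_FRAGMENT))  # → []
print(missing_dependencies({"alignment_software_version": "2.7.10a"}, SCHEMA_FRAGMENT))
# → ["'alignment_software' is required when 'alignment_software_version' is present"]
```

Under this assumption, giving the software name without a version is still valid, while a version without a software name is rejected.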