cidgoh / GRDI_AMR_One_Health

A data specification for harmonizing One Health AMR pathogen genomics contextual data. The specification provides standardized (ontology-based) fields and terms which are implemented via a spreadsheet collection template, supported by field and reference guides as well as different curation and new term request SOPs.
MIT License

11.1.1 - release note tracking #81

Closed cbarcl01 closed 3 months ago

cbarcl01 commented 5 months ago

Template Fixes:

Specification Changes:

Field Change
experimental_protocol_field New field
experimental_specimen_role_type New field, new picklist IDs
nucleic acid extraction method New field
nucleic acid extraction kit New field
sample_volume_measurement_value New field
sample_volume_measurement_unit New field, new picklist IDs
residual_sample_status New field, new picklist IDs
sample_storage_duration_value New field
sample_storage_duration_unit New field, new picklist IDs
nucleic_acid_storage_duration_value New field
nucleic_acid_storage_duration_unit New field, new picklist IDs
DNA fragment length New field
genomic target enrichment method New field, new picklist IDs
genomic target enrichment method details New field
amplicon pcr primer scheme New field
amplicon size New field
sequencing flow cell version New field
quality control method name New field
quality control method version New field
quality control determination New field, new picklist IDs
quality control issues New field, new picklist IDs
quality control details New field
raw sequence data processing method New field
dehosting method New field
sequence assembly software name New field
sequence assembly software version New field
consensus sequence software name New field
consensus sequence software version New field
breadth of coverage value New field
depth of coverage value New field
depth of coverage threshold New field
genome completeness New field
number of base pairs sequenced New field
number of total reads New field
number of unique reads New field
minimum post-trimming read length New field
number of contigs New field
percent Ns across total genome length New field
Ns per 100 kbp New field
N50 New field
percent read contamination New field
sequence assembly length New field
consensus genome length New field
reference genome accession New field
deduplication method New field
bioinformatics protocol New field
read mapping software name New field
read mapping software version New field
taxonomic reference database name New field
taxonomic reference database version New field
taxonomic analysis report filename New field
taxonomic analysis date New field
read mapping criteria New field

Version Tracking:

Excel Template, Reference Guides, Curation SOP

11 = # = New fields added to support bioinformatics and taxonomic identification
1 = # = New picklist values
1 = # = Changes to structure by creating new modules for environmental conditions and measurements
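As a minimal sketch of how the three-part version number could be read programmatically, the snippet below splits a version string such as `11.1.1` into its components. The component names (`field_changes`, `picklist_changes`, `structural_changes`) and the function `parse_template_version` are hypothetical, and the mapping of positions to meanings is an assumption based on the order of the descriptions above.

```python
from typing import NamedTuple


class TemplateVersion(NamedTuple):
    # Assumed order, following the descriptions in this comment:
    # fields, then picklist values, then structural changes.
    field_changes: int
    picklist_changes: int
    structural_changes: int


def parse_template_version(version: str) -> TemplateVersion:
    """Split an 'x.y.z' version string into three integer components."""
    parts = version.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three dot-separated integers, got {version!r}")
    return TemplateVersion(*(int(p) for p in parts))


if __name__ == "__main__":
    v = parse_template_version("11.1.1")
    print(v.field_changes, v.picklist_changes, v.structural_changes)
```

Comparing two parsed versions component by component would then show which kind of change a release introduced.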

New Term SOP N/A unless indicated.

Template To-Dos

cbarcl01 commented 5 months ago

Structural changes to the template, which now includes the following modules/sections:

cbarcl01 commented 5 months ago

No new fields have been added for Environmental conditions and measurements; however, the following fields, which were initially in Sample Collection and Processing, have been moved into this section:

Field | ID
-- | --
water_depth | GENEPIO:0100440
water_depth_units | GENEPIO:0101025
sediment_depth | GENEPIO:0100697
sediment_depth_units | GENEPIO:0101026
air_temperature | GENEPIO:0100441
air_temperature_units | GENEPIO:0101027
water_temperature | GENEPIO:0100698
water_temperature_units | GENEPIO:0101028
weather_type | GENEPIO:0100442

cbarcl01 commented 5 months ago

The following new fields have been added to Sample Collection and Processing:

Field ID
experimental_protocol_field GENEPIO:0101029
experimental_specimen_role_type GENEPIO:0100921
nucleic acid extraction method GENEPIO:0100939
nucleic acid extraction kit GENEPIO:0100772
sample_volume_measurement_value GENEPIO:0100768
sample_volume_measurement_unit GENEPIO:0100769
residual_sample_status GENEPIO:0101090
sample_storage_duration_value GENEPIO:0101014
sample_storage_duration_unit GENEPIO:0101015
nucleic_acid_storage_duration_value GENEPIO:0101085
nucleic_acid_storage_duration_unit GENEPIO:0101086
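
When copying ontology IDs like the ones above into a template, a quick format check can catch dropped digits or mistyped prefixes. The sketch below is only an illustration: the pattern (a prefix of `GENEPIO`, `EFO`, or `UO` followed by a colon and seven digits) is an assumption inferred from the IDs listed in this issue, and `looks_like_term_id` is a hypothetical helper, not part of the specification.

```python
import re

# Assumption: IDs follow the shape seen in this issue,
# e.g. GENEPIO:0101029, EFO:0002090, UO:0000101.
TERM_ID_PATTERN = re.compile(r"^(GENEPIO|EFO|UO):\d{7}$")


def looks_like_term_id(value: str) -> bool:
    """Return True if the value matches the assumed PREFIX:7-digit shape."""
    return TERM_ID_PATTERN.match(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["GENEPIO:0101029", "GENEPIO:101029", "UO:0000101"]:
        print(candidate, looks_like_term_id(candidate))
```

This only checks the shape of an ID; it does not confirm that the term exists in the ontology.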
cbarcl01 commented 5 months ago

New sample collection and processing field picklist values:

experimental_specimen_role_type ID
Positive experimental control GENEPIO:0101018
Negative experimental control GENEPIO:0101019
Technical replicate EFO:0002090
Biological replicate EFO:0002091
residual_sample_status ID
Residual sample remaining (some sample left) GENEPIO:0101087
No residual sample (sample all used) GENEPIO:0101088
Residual sample status unknown GENEPIO:0101089
sample_volume_measurement_unit ID
microliter (uL) UO:0000101
milliliter (mL) UO:0000098
liter (L) UO:0000099
sample_storage_duration_unit ID
Second UO:0000010
Minute UO:0000031
Hour UO:0000032
Day UO:0000033
Week UO:0000034
Month UO:0000035
Year UO:0000036
nucleic_acid_storage_duration_unit ID
Second UO:0000010
Minute UO:0000031
Hour UO:0000032
Day UO:0000033
Week UO:0000034
Month UO:0000035
Year UO:0000036
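
The unit picklists above can be represented as simple label-to-ID lookups. The following is a minimal sketch, not the template's actual implementation: the dictionary layout and the `lookup_term_id` helper are hypothetical, exact case-sensitive matching is an assumption, and only the unit picklists from this comment are included.

```python
from typing import Optional

SAMPLE_VOLUME_UNITS = {
    "microliter (uL)": "UO:0000101",
    "milliliter (mL)": "UO:0000098",
    "liter (L)": "UO:0000099",
}

# Both duration-unit fields list the same values and IDs in this comment.
DURATION_UNITS = {
    "Second": "UO:0000010",
    "Minute": "UO:0000031",
    "Hour": "UO:0000032",
    "Day": "UO:0000033",
    "Week": "UO:0000034",
    "Month": "UO:0000035",
    "Year": "UO:0000036",
}

PICKLISTS = {
    "sample_volume_measurement_unit": SAMPLE_VOLUME_UNITS,
    "sample_storage_duration_unit": DURATION_UNITS,
    "nucleic_acid_storage_duration_unit": DURATION_UNITS,
}


def lookup_term_id(field: str, label: str) -> Optional[str]:
    """Return the ontology ID for a picklist label, or None if not listed."""
    return PICKLISTS.get(field, {}).get(label)


if __name__ == "__main__":
    print(lookup_term_id("sample_storage_duration_unit", "Week"))
    print(lookup_term_id("sample_volume_measurement_unit", "mL"))
```

A `None` result signals either an unknown field or a label that is not in that field's picklist, which is the case a curator would want to review.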
cbarcl01 commented 5 months ago

New sequence information field picklist values:

genomic target enrichment method ID
Hybrid selection (bait-capture) GENEPIO:0001950
rRNA depletion GENEPIO:0101020
cbarcl01 commented 5 months ago

New Bioinformatics and QC metrics field picklist values:

quality control determination ID
No quality control issues identified GENEPIO:0100562
Sequence passed quality control GENEPIO:0100563
Sequence failed quality control GENEPIO:0100564
Minor quality control issues identified GENEPIO:0100565
Sequence flagged for potential quality control issues GENEPIO:0100566
Quality control not performed GENEPIO:0100567
quality control issues ID
Low quality sequence GENEPIO:0100568
Sequence contaminated GENEPIO:0100569
Low average genome coverage GENEPIO:0100570
Low percent genome captured GENEPIO:0100571
Read lengths shorter than expected GENEPIO:0100572
Sequence amplification artifacts GENEPIO:0100573
Low signal to noise ratio GENEPIO:0100574
Low coverage of characteristic mutations GENEPIO:0100575
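
For illustration, the two quality control picklists above could be used to check a record before submission. This is a sketch under stated assumptions: `check_qc_values` is a hypothetical helper, labels are matched exactly, and passing the issues as a list assumes that more than one issue may be recorded per sample, which this issue does not confirm.

```python
from __future__ import annotations

QC_DETERMINATIONS = {
    "No quality control issues identified": "GENEPIO:0100562",
    "Sequence passed quality control": "GENEPIO:0100563",
    "Sequence failed quality control": "GENEPIO:0100564",
    "Minor quality control issues identified": "GENEPIO:0100565",
    "Sequence flagged for potential quality control issues": "GENEPIO:0100566",
    "Quality control not performed": "GENEPIO:0100567",
}

QC_ISSUES = {
    "Low quality sequence": "GENEPIO:0100568",
    "Sequence contaminated": "GENEPIO:0100569",
    "Low average genome coverage": "GENEPIO:0100570",
    "Low percent genome captured": "GENEPIO:0100571",
    "Read lengths shorter than expected": "GENEPIO:0100572",
    "Sequence amplification artifacts": "GENEPIO:0100573",
    "Low signal to noise ratio": "GENEPIO:0100574",
    "Low coverage of characteristic mutations": "GENEPIO:0100575",
}


def check_qc_values(determination: str, issues: list[str]) -> list[str]:
    """Return a list of problems; an empty list means all values are in the picklists."""
    problems = []
    if determination not in QC_DETERMINATIONS:
        problems.append(f"unknown quality control determination: {determination!r}")
    for issue in issues:
        if issue not in QC_ISSUES:
            problems.append(f"unknown quality control issue: {issue!r}")
    return problems


if __name__ == "__main__":
    print(check_qc_values("Sequence failed quality control", ["Sequence contaminated"]))
    print(check_qc_values("Passed", ["Low quality sequence", "Too short"]))
```

Returning a list of messages rather than raising on the first error lets a curator see every out-of-picklist value in one pass.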